CISC 3130
Data Structures
Array Implementations
Overview
In this lecture, we will concentrate on independent implementations of sequences using an underlying array container.
- We concentrate on sequences, i.e., position-oriented containers, such as stack, queues, deques, and lists
- The main focus is therefore on insert/remove
- Membership testing is NOT a core operation
Arrays, Sequences, and Shifting
The main challenge for achieving an efficient implementation is the shifting of elements when inserting/removing anywhere other than the end (O(n) vs O(1)).
An insertion anywhere other than the end involves a shift to the right; similarly a removal a shift to the left:
- As an aside, the order (left-to-right or right-to-left) in which you shift the elements is crucial to it working correctly and
is different for two operations.
Using a Single-Index, 0-based Array
Container has first element at element 0 of the array; end of container is denoted by an index variable.
- insert/remove operations at the end are O(1)
- insert/remove operations at the beginning are O(n)
- This 'eliminates' FIFO queue (requires either insert or remove from the beginning) and deque
- This also 'eliminates' stack with top-of-stack at location 0
- insert/remove operations in the middle are O(n)
As a result, the only efficient container is the stack.
Stack
Have a top-of-stack (tos) index/pointer that acts similarly to size
for Vector
- By convention, tos indexes the actual top of the stack, not the next element
- Remember,
size
represents both the number of elements as well as the next empty position in the array to be filled by the sequence)
- Working 'backwards' from a single-element stack produces the initial (empty) state of tos being -1.
- This is a useful technique for determining the proper initial value of an index of a container: start with the single-element case, then
do a remove and see what happens.
- increment tos for push; decrement it for pop
- isEmpty is true when tos equal to -1
class ArrayStack<E> {
void push(E value) {array[++tos] = value;}
E pop() {
if (isEmpty()) throw …
return array[tos--];
}
boolean isEmpty() {return tos == -1;}
private E [] array = (E)new Object[CAPACITY];
private int tos = -1;
private static int CAPACITY = 100;
}
- removing from 'efficient' end; not much to improve
- variations
- downward-growing stack
- pair of stacks growing towards each other
Using a Resizeable Class
We might consider replacing the built-in array with the resizeable Array
class we discussed, providing capacity management
class ArrayStack<E> {
void push(E value) {array.set(++tos, value;}
}
E pop() {
if (isEmpty()) throw …
array.get(tos--];
}
boolean isEmpty() {return tos == -1;}
private Array arr = new Array<E>();
private int tos = -1;
}
- We can do this for all our array-based implementations without loss of functionality and the gain of
automatic capacity management. The only downside is the loss of expressively of subscripting
Using an Array with Two-Indices
- LIFO retrieval has all operations occurring at one end of the container; as seen above append and remove from end provide O(1) operations for this
- However, FIFO and other sequences require more than simply one-ended operation.
- We'll start with FIFO as that is the next simplest set of operations: insert at one end and remove from the other
- regardless of which operation is performed at the 'front', a shift will be necessary
- we typically insert at the end of the array and remove from the front, because of the analogy with 'standing on line' (get on at the
back and get off the front).
Linear Queue
The basic idea is to have two index variables: front and rear indicating first and last elements in the container.
- The whole point of the second index it to avoid having to constantly shift the array to maintain the first element at location 0.
- To avoid shifting, have a front as well as a rear index/pointer. Again, by convention, front points at first (oldest) element, rear to last (newest).
- Increment rear to add; increment front for remove
- Working 'backwards' from a populated queue produces the initial (empty) state of front being 0, and rear being -1.
- isEmpty generalizes to front 'crossing' rear, i.e. front > rear
- insert/remove operations at the end are O(1)
- remove operations at the beginning are O(1)
- insert operation at front is still O(n)
- but we don't need an insert at front for FIFO queue
- but it eliminates deque
- insert/remove operations in the middle are still O(n)
As a result, the only efficient container is the FIFO queue
FIFO Queue
To avoid shifting, have a front as well as a rear index/pointer. Again, by convention, front points at first (oldest) element, rear to last (newest).
- Increment rear to add; increment front for remove
- Working 'backwards' from a populated queue produces the initial (empty) state of front being 0, and rear being -1.
- isEmpty generalizes to front 'crossing' rear, i.e. front > rear
class ArrayFifoQueue {
void add(int value) {
array[++rear] = value);
}
int remove() {
if (isEmpty()) throw …
return array[front++];
}
boolean isEmpty() {return front > rear;}
private int [] array = new int[CAPACITY];
private int front = 0, rear = -1;
private static int CAPACITY = 100;
}
- under the above implementation, the logical queue proper moves inexorably towards the right end of the array
- eventually the right end of the queue ends up at the right end of the array
- this should trigger a potential overflow which needs to be addressed
- however there may be unused array elements to the left of the logical queue
- if we are dealing with a fixed capacity array, there is nowhere to insert an element and the container is thus 'full'
- otoh, if we are dealing with a resizeable array, this condition will trigger a resize, which is wasteful, given the available space in the array
- this situation can be taken a step further, and result in a paradoxical empty queue that's reported to be full (rear is at last element of array, and all
elements have been removed).
- realignment (i.e., left-shifting the entire queue back to the beginning of the array) allows continued use of the queue
- amortization of housekeeping operations
- averaging out a slow (worst-case_ operation over the sequence of all operations
- e.g. the cost of resizing a container (the allocation and copy to the new underlying container) can
be thought to be spread out over all the operations that did not require a resize.
Circular Queues / Ring Buffers
View the array as a circular entity (again, logical vs physical) … circular/ring buffer (Wiki entry)
class ArrayFifoQueue {
void add(int value) {
rear++; // can be replaced with
if (rear >= CAPACITY) rear = 0; // (rear = rear + 1) % CAPACITY
array[rear] = value);
}
int remove() {
int hold = array[front];
front++; // can be replaced with
if (front >= CAPACITY) front = 0; // (front = front + 1) % CAPACITY
return hold;
}
boolean isEmpty() {return front > rear;}
private int [] array = new int[CAPACITY];
private int front = 0, rear = CAPACITY-1;
private static int CAPACITY = 100;
}
- introduces unexpected semantics; in particular
isEmpty
breaks down; same test as an isFull
would be were we testing for that
- and, in actuality, we are always testing for that; even in the context of a resizeable array, we need to know when to resize, i.e., when we are at capacity, i.e.
when the current array is full.
- either use a size, or keep one element unused
- size means updating with each operation (insertion and removal);
- in particular removal requires updating both the starting index as well as the size
- unused element has slightly more involved logic
- the updating of the front and rear pointers occur at different times wrt the placement/removal of the element
Deque
A simple-minded, non-circular approach would be to place the deque in the center of array to avoid shifting
- and as with a FIFO queue, requires occasional re-shifting back to the center
But now that we have the notion of a ring buffer, that is a much more efficient approach to deque as well
- there are now decrement operations (for
addFirst
and removeLast
) paralleling the increments above
- being an extremely efficient implementation, it is also used used to implement (via leveraging) LIFO (stack) and FIFO (queue) containers
- coupled with a resizeable underlying array, we have the 'ultimate' implementation:
- O(1) for insertion/removal from either end
- O(1) amortized resizing … assuming a doubling of the capacity
List
Can't avoid the shifting here for internal adds/removes
- one extremely specific optimization is to swap last element with deleted element to avoid opening a gap
- but this only works for a very specific case; i.e., remove, and only if order is of no consequence
- in general, we need a different type of underlying container than an array
Priority Queue and the Associatives
- Priority Queue and the associatives suffer from the search overheard due to their being value-based.
- One 'simple-minded' approach is to used sorted lists,
- but that merely shifts the O(n) cost burden from the remove to the insertion
- to get really efficient performance we again have to move away from a simple array-based implementation, and
as with lists, move to a different sort of underlying container.
- that will come much later
Summary
- The intuitive, straightforward way to implement the containers is using an array as the underlying container
- The primary challenge to using an array for the underlying container of a sequence is the shifting necessary
- stacks do not suffer from a shifting issue — insertion and removal can be performed at the end of the array
- FIFO queue requires one operation (typically removal) be performed at the front of the array, necessitating a shift
- Deques, providing both insertion and removal from both ends have the same problem
- To eliminate the constant shifting, a second index representing the front of the data structure is introduced
- Having a second index introduces several issues
- proper initialization and maintenance of the indexes
- one way to figure it out is to look at the situation where there is one element and see what the removal code does to the indexes
(i.e., after removal the structure is now empty, i.e., at its initial state)
- situations where the rear index reaches the end of the array, yet there is space to the left (i.e. in front) of the front index
- The capacity issue can be addressed by:
- realigning the data structure back to its initial position; this involves shifting the elements and modifying the indexes
accordingly
- the cost of such an operation can be viewed as a worst-case slow operation that happens occasionally (no more than once per the capacity of the array)
- alternatively we can use amortization analysis to average out this cost on a per operation basis; this would not be appropriate for
a real-time system
- An alternative to realignment is to view the array as a circular (ring) buffer.
- A basic challenge to using a ring buffer is distinguishing between full and empty; the same condition involving the two indices (rear immediately before front) corresponds to both full and empty
- This can be resolved by either:
- keeping one element of the array unused (thus full would be rear being two elements in front of the front index)
- maintaining a size value; this would keeping both the index and a size updated
- When realignment is used with a deque, we typically place it initially in the center of the array to avoid shifting
- Since FIFO queue's is a direct subset of those of deque with no unnecessary overhear imposed by FIFO using deque's operations, is makes sense to
leverage deque to get FIFO queu
- The same argument goes for stack
- All of the above addresses operations at the ends of the data structure, and thus does not apply to list; to avoid the shift overhead, we will have to look elsewhere (or
live with the shifting)
- Priority queues and associatives have a similar problem with array and will require alternative underlying containers as well