CI3130 Introduction to Containers — the ADT's and Basic API's

CISC 3130
Data Structures
Array Implementations

Overview

In this lecture, we will concentrate on independent implementations of sequences using an underlying array container.

We concentrate on sequences, i.e., position-oriented containers, such as stack, queues, deques, and lists
The main focus is therefore on insert/remove
Membership testing is NOT a core operation

Arrays, Sequences, and Shifting

The main challenge for achieving an efficient implementation is the shifting of elements when inserting/removing anywhere other than the end (O(n) vs O(1)).

An insertion anywhere other than the end involves a shift to the right; similarly a removal a shift to the left:

As an aside, the order (left-to-right or right-to-left) in which you shift the elements is crucial to it working correctly and is different for two operations.

Using a Single-Index, 0-based Array

Container has first element at element 0 of the array; end of container is denoted by an index variable.

insert/remove operations at the end are O(1)
insert/remove operations at the beginning are O(n)
- This 'eliminates' FIFO queue (requires either insert or remove from the beginning) and deque
- This also 'eliminates' stack with top-of-stack at location 0
insert/remove operations in the middle are O(n)
- This 'eliminates' list

As a result, the only efficient container is the stack.

Stack

Have a top-of-stack (tos) index/pointer that acts similarly to size for Vector

By convention, tos indexes the actual top of the stack, not the next element
- Remember, size represents both the number of elements as well as the next empty position in the array to be filled by the sequence)
- Working 'backwards' from a single-element stack produces the initial (empty) state of tos being -1.
  - This is a useful technique for determining the proper initial value of an index of a container: start with the single-element case, then do a remove and see what happens.

increment tos for push; decrement it for pop
isEmpty is true when tos equal to -1

class ArrayStack<E> {
	void push(E value) {array[++tos] = value;}
	E pop() {
		if (isEmpty()) throw …
		return array[tos--];
	}
	boolean isEmpty() {return tos == -1;}

	private E [] array = (E)new Object[CAPACITY];
	private int tos = -1;

	private static int CAPACITY = 100;
}

removing from 'efficient' end; not much to improve
variations
- downward-growing stack
- pair of stacks growing towards each other

Using a Resizeable Class

We might consider replacing the built-in array with the resizeable Array class we discussed, providing capacity management

class ArrayStack<E> {
	void push(E value) {array.set(++tos, value;}
	}
	E pop() {
		if (isEmpty()) throw …
		array.get(tos--];
	}
	boolean isEmpty() {return tos == -1;}

	private Array arr = new Array<E>();
	private int tos = -1;
}

We can do this for all our array-based implementations without loss of functionality and the gain of automatic capacity management. The only downside is the loss of expressively of subscripting

Using an Array with Two-Indices

LIFO retrieval has all operations occurring at one end of the container; as seen above append and remove from end provide O(1) operations for this
However, FIFO and other sequences require more than simply one-ended operation.
We'll start with FIFO as that is the next simplest set of operations: insert at one end and remove from the other
- regardless of which operation is performed at the 'front', a shift will be necessary
- we typically insert at the end of the array and remove from the front, because of the analogy with 'standing on line' (get on at the back and get off the front).

Linear Queue

The basic idea is to have two index variables: front and rear indicating first and last elements in the container.

The whole point of the second index it to avoid having to constantly shift the array to maintain the first element at location 0.
To avoid shifting, have a front as well as a rear index/pointer. Again, by convention, front points at first (oldest) element, rear to last (newest).
Increment rear to add; increment front for remove
Working 'backwards' from a populated queue produces the initial (empty) state of front being 0, and rear being -1.
isEmpty generalizes to front 'crossing' rear, i.e. front > rear

insert/remove operations at the end are O(1)
remove operations at the beginning are O(1)
insert operation at front is still O(n)
- but we don't need an insert at front for FIFO queue
- but it eliminates deque
insert/remove operations in the middle are still O(n)
- This eliminates list

As a result, the only efficient container is the FIFO queue

FIFO Queue

To avoid shifting, have a front as well as a rear index/pointer. Again, by convention, front points at first (oldest) element, rear to last (newest).

Increment rear to add; increment front for remove
Working 'backwards' from a populated queue produces the initial (empty) state of front being 0, and rear being -1.
isEmpty generalizes to front 'crossing' rear, i.e. front > rear

class ArrayFifoQueue {
	void add(int value) {
		array[++rear] = value);
	}
	int remove() {
		if (isEmpty()) throw …
		return array[front++];
	}
	boolean isEmpty() {return front > rear;}

	private int [] array = new int[CAPACITY];
	private int front = 0, rear = -1;

	private static int CAPACITY = 100;
}

under the above implementation, the logical queue proper moves inexorably towards the right end of the array
eventually the right end of the queue ends up at the right end of the array
- this should trigger a potential overflow which needs to be addressed
- however there may be unused array elements to the left of the logical queue
- if we are dealing with a fixed capacity array, there is nowhere to insert an element and the container is thus 'full'
- otoh, if we are dealing with a resizeable array, this condition will trigger a resize, which is wasteful, given the available space in the array
- this situation can be taken a step further, and result in a paradoxical empty queue that's reported to be full (rear is at last element of array, and all elements have been removed).
realignment (i.e., left-shifting the entire queue back to the beginning of the array) allows continued use of the queue
amortization of housekeeping operations
- averaging out a slow (worst-case_ operation over the sequence of all operations
- e.g. the cost of resizing a container (the allocation and copy to the new underlying container) can be thought to be spread out over all the operations that did not require a resize.

Circular Queues / Ring Buffers

View the array as a circular entity (again, logical vs physical) … circular/ring buffer (Wiki entry)

class ArrayFifoQueue {
	void add(int value) {
		rear++;										// can be replaced with
		if (rear >= CAPACITY) rear = 0;		// (rear = rear + 1) % CAPACITY
		array[rear] = value);
	}
	int remove() {
		int hold = array[front];
		front++;										// can be replaced with
		if (front >= CAPACITY) front = 0;	// (front = front + 1) % CAPACITY
		return hold;
	}
	boolean isEmpty() {return front > rear;}

	private int [] array = new int[CAPACITY];
	private int front = 0, rear = CAPACITY-1;

	private static int CAPACITY = 100;
}

introduces unexpected semantics; in particular isEmpty breaks down; same test as an isFull would be were we testing for that
- and, in actuality, we are always testing for that; even in the context of a resizeable array, we need to know when to resize, i.e., when we are at capacity, i.e. when the current array is full.
either use a size, or keep one element unused
- size means updating with each operation (insertion and removal);
  - in particular removal requires updating both the starting index as well as the size
- unused element has slightly more involved logic
  - the updating of the front and rear pointers occur at different times wrt the placement/removal of the element

Deque

A simple-minded, non-circular approach would be to place the deque in the center of array to avoid shifting

and as with a FIFO queue, requires occasional re-shifting back to the center

But now that we have the notion of a ring buffer, that is a much more efficient approach to deque as well

there are now decrement operations (for addFirst and removeLast) paralleling the increments above
being an extremely efficient implementation, it is also used used to implement (via leveraging) LIFO (stack) and FIFO (queue) containers
coupled with a resizeable underlying array, we have the 'ultimate' implementation:
- O(1) for insertion/removal from either end
- O(1) amortized resizing … assuming a doubling of the capacity

List

Can't avoid the shifting here for internal adds/removes

one extremely specific optimization is to swap last element with deleted element to avoid opening a gap
- but this only works for a very specific case; i.e., remove, and only if order is of no consequence
in general, we need a different type of underlying container than an array
- that's next lecture

Priority Queue and the Associatives

Priority Queue and the associatives suffer from the search overheard due to their being value-based.
One 'simple-minded' approach is to used sorted lists,
- but that merely shifts the O(n) cost burden from the remove to the insertion
- to get really efficient performance we again have to move away from a simple array-based implementation, and as with lists, move to a different sort of underlying container.
- that will come much later

Summary

The intuitive, straightforward way to implement the containers is using an array as the underlying container
The primary challenge to using an array for the underlying container of a sequence is the shifting necessary
stacks do not suffer from a shifting issue — insertion and removal can be performed at the end of the array
FIFO queue requires one operation (typically removal) be performed at the front of the array, necessitating a shift
Deques, providing both insertion and removal from both ends have the same problem
To eliminate the constant shifting, a second index representing the front of the data structure is introduced
Having a second index introduces several issues
- proper initialization and maintenance of the indexes
  - one way to figure it out is to look at the situation where there is one element and see what the removal code does to the indexes (i.e., after removal the structure is now empty, i.e., at its initial state)
- situations where the rear index reaches the end of the array, yet there is space to the left (i.e. in front) of the front index
The capacity issue can be addressed by:
- realigning the data structure back to its initial position; this involves shifting the elements and modifying the indexes accordingly
- the cost of such an operation can be viewed as a worst-case slow operation that happens occasionally (no more than once per the capacity of the array)
- alternatively we can use amortization analysis to average out this cost on a per operation basis; this would not be appropriate for a real-time system
An alternative to realignment is to view the array as a circular (ring) buffer.
A basic challenge to using a ring buffer is distinguishing between full and empty; the same condition involving the two indices (rear immediately before front) corresponds to both full and empty
This can be resolved by either:
- keeping one element of the array unused (thus full would be rear being two elements in front of the front index)
- maintaining a size value; this would keeping both the index and a size updated
When realignment is used with a deque, we typically place it initially in the center of the array to avoid shifting
Since FIFO queue's is a direct subset of those of deque with no unnecessary overhear imposed by FIFO using deque's operations, is makes sense to leverage deque to get FIFO queu
The same argument goes for stack
All of the above addresses operations at the ends of the data structure, and thus does not apply to list; to avoid the shift overhead, we will have to look elsewhere (or live with the shifting)
Priority queues and associatives have a similar problem with array and will require alternative underlying containers as well

CISC 3130 Data Structures Array Implementations