CISC 3130 Overview of Data Structures

Topics

Data structures
Containers
Static and dynamic fixed capacity
Arrays and vectors

Data Structures

From Wikipedia:

In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.

Is an `int` or `double` a Data Structure?

They're composed of smaller bits in a format that is chosen for efficient access and operation
However, the actual access and operation is performed at the hardware level — as opposed to being designed/implemented by the programmer — so we don't consider them data structures per se.

Is a `Student`, `BankAccount`, or `Customer` Class a Data Structure?

Technically yes, and definitely more so than the above numeric types.
They are constructed using the class mechanism which allows for the 'packaging' of various pieces of data (attributes) of a modelled object, and thus form a structure of data, corresponding to the information maintained for a student, a bank account, and a customer respectively.
- As an aside, strictly speaking, the class is not a data structure, rather a specification for the creation of objects (instances) that form the structure.
  - We will often be pedantic in this manner; proper use of the terminology of classes and object is often important in our discussion
However, these structures are 'boring' in the sense that their specification and methods are 'one-offs' in the sense that they are for a particular application,rather than having universal application.

Containers

Again, from Wikipedia:

In computer science, a container is a class or a data structure whose instances are collections of other objects. In other words, they store objects in an organized way that follows specific access rules.

Containers are more interesting in that they have a universal application: storing (and subsequently retrieving) other objects.
Furthermore, efficient storage/retrieval (and iteration) are both key and lead to very interesting algorithms and structures
They are thus worthy of in-depth study

Thus, a course in data structures is typically an investigation into containers.

In some sense, this course is your first in studying computer science rather than simply a programming language. We will be taking most of the material you have covered in the previous courses and applying it to the study, design, and implementation of containers.

So an Array Must be a Container and a Data Structure

Yes it is, and indeed, it is the fundamental homogeneous container/data structure found in hardware and most programming languages
- Homogeneous means that all the elements of the array are of the same type
  - This is in contract with a heterogeneous container in which the various elements can be of different types (the class mechanism permits the creation of heterogeneous structures, e.g. a Person object with a string for a name and an integer for an age).
However, like the numeric examples at the beginning of this discussion, the operations upon the array (subscripting) are performed below the level of the programmer.
Thus, while definitely in the category of data structure/container, arrays are of little interest to us except as a foundational container upon which we will build other containers with various storage/retrieval properties.
- We will see that the combination of homogeneous (e.g. array) and heterogeneous (e.g. class definition) provides us with the basic mechanism for creating arbitrary containers.

Arrays

Before we move on from the built-in array, let's examine it's basic properties, operations, and limitations.

Element: an array, like all containers, holds other objects, known as the elements of the container
- Each element has a unique position, known as its index or subscript within the array; most modern languages use 0 as the first position, and thus the last element is at index #elements - 1
Homogeneous: the elements of an array are all of the same type; in Java the type is supplied at the time of declaration of the corresponding reference variable as well as when the array object is created., e.g. int [] arr = new int[100];
Fixed size: the number of elements of an array is specified at object creation time, and cannot change (although the value can be calculated at run time), again new int[100]
- however, the reference variable referring to the array can be made to refer to a different array object; this will be the basis of 'growable' containers, which we introduce in the next lecture.
Contiguous memory: the elements of the array are guaranteed to follow each other in memory, with element 0 first, followed by element 1, etc
Subscripting: the basic operation of storage and retrieval is subscripting, i.e., the specification of the element to be stored to and retrieved from. In most languages, this operator is written as [i] (upon occasion (i)) with i being the element in question.
- It is the contiguous memory property of the array that makes subscripting a highly efficient operation; accessing the millionth element of an array is essentially as efficient as the first.

Specific to Java:

length: the number of elements allocated to the array can be obtained via this integer value. Note this is not a method, but rather a read-only value field associated with the array
Bounds-checking: attempting to access an element whose index is outside the range of the fixed-size of the array (i.e., 0 .. length-1) causes an exception to be thrown.

Arrays in Java

The primitive array does not supply a length field, nor perform bounds-checking; these are both artifices provided by the Java runtime
- For example, C/C++ provides neither of those two facilities for their array type .
In fact, an array in Java is not considered a primitive type (as is the case for C/C++), but is rather an object, which implies runtime support.
In this sense, we could say Java's array type is more of a data structure (in the sense we are using the term) that a primitive array.
But again, as this support is coming below our radar (i.e., provided by the runtime rather than our own code, we will treat the array as a fundamental container
As an aside, the properties of the array (in particular the mechanics of subscripting) are covered in more detail in an architecture course.

A Class Version of Java's Array

Here is the Java built-in array, implemented as a class. The idea here is:

To present an example of a container class
- In this case we use a familiar container … the array
To emphasize the absence of operators when dealing with classes
- Java does not support the definition of operators for classes, thus we have to replace the subscript operator with get/set methods
To introduce the notion of capacity
To show a toString method for a container

class Array {
	Array(int capacity) {arr = new int[capacity];}
	Array() {this(DEFAULT_CAPACITY);}

	int get(int index) {return arr[index];}

	void set(int index, int value) {arr[index] = value;}

	int length() {return arr.length;}

	public String toString() {
		String result = "{";
		for (int i = 0; i < length(); i++)
			result += arr[i] + (i < length()-1 ? ", " : "");
		result += "}";
		return result;
	}

	int [] arr;
	static final int DEFAULT_CAPACITY = 10;
}

Notes

As mentioned above, this is simply an illustration of what a typical container class will look like
Note the use of the built-in array to store the elements of this container.
- The use of an object as an instance variable of another class is known as composition and we will have much more to say about this later.
- The container instance variable, i.e., the object that will hold the actual elements (in this case the built-in array) is often called the underlying container
Changes in behavior from the built-in array:
- Little is added in the way of functionality:
  - a toString method
- Some loss of expressivity:
  - subscripting is gone: no operator symbols for classes; replaced by get/set methods
  - length field replaced by length method
One can make an argument in either direction whether this is truly a data structure
- While it does package data within it, it's nothing more than a single item — the underlying array
- There's no additional state (variables) to provide additional logic or storage/retrieval semantics (as is the case i
- OTOH, there is the additional of the toString
Notice that get/set, and length all achieve their semantics by calling corresponding operations at the (built-in) array level.
- This is known as delegation, and we will have a lot to say about it later as well.
bounds checking still occurs (implicitly) at the array level
Note the use of the this constructor mechanism in the default, overloaded constructor that allows us to take advantage of the logic of the single-argument constructor
- Having one method exploit the logic of another method is a technique known as leveraging and is related to the notions of delegation mentioned above and and wrapper classes mentioned below. We will discuss them in detail an upcoming lecture.
The conditional expression (i < length()-1 ? ", " : "") in the String method is a technique sometimes called comma-logic that allows on to print a sequence of comma-separated valued, making sure no comma is output after the last value (or before the first)
Finally, note the static default CAPACITY value (and its naming convention of all caps)
- Quick overview of
  — stylistic conventions for handling multi-word identifiers
  - camelCase (lowerCamelCase): variables and methods in Java
  - PascalCase (UpperCamelCase): classes and interfaces in Java
  - snake_case: most identifiers in C/C++, file names in Linux
  - UPPER_SNAKE_CASE: constants (final) in Java
  - kebab-case: identifiers in HTML
Overall, there is little advantage (aside from the availability of a toString) to using this class version as opposed to the built-in array object
The overall effect is that we have encased the built-in array within a class. We sometimes call a class with this sort of functionality a wrapper class.
- This is somewhat related to the notion of delegation above; and again, we will discuss this soon
- There is also a notion of a wrapper method which you may have run across in the context of recursion and array processing.

A Demo App for the Array Class

Here is a Java application (i.e., a class with a main method that uses the Arrayy class defined above:

public class ArrayApp {
	public static void main(String [] args) {
		Array a = new Array();
		System.out.println("length: " + a.length());

		for (int i = 0; i < a.length(); i++)
			a.set(i, i * 10);
		System.out.println("a: " + a);
		System.out.println("length: " + a.length());

		for (int i = 0; i < a.length(); i++)
			a.set(i, a.get(i)+1);
		System.out.println("a: " + a);
	}
}

Notes

Again, this will be a template of sorts for writing applications that illustrate the use of a container
You can add an out-of bounds reference (e.g. a[-1] or a[a.length] to see the ArrayIndexOutOfBoundsException.
- Again, this is generated by the Java runtime, not our code
The above app simply illustrates (demos) the capabilities of our implementation
Each method/constructor was called to show it in action
In particular, there is no attempt to test the correctness of the container's logic nor to attempt to 'break it'
- We will address testing later on

Here is the output of the app:

length: 10
a: {0, 10, 20, 30, 40, 50, 60, 70, 80, 90}
length: 10
a: {1, 11, 21, 31, 41, 51, 61, 71, 81, 91}

A First Container: Vector (Partially Populated Array)

Arrays are typically populated by reading in (or calculating) data items and adding them to the end of the array.

Unless one knows in advance the number of items, one cannot allocate an array of the exact size.
- Furthermore, even if one knows the exact number of initial items, it is often the case that one might wish to subsequently add or remove items.

To solve this problem, we usually employ partially-populated arrays rather than an array of the exact number of elements (i.e., a fully-populated array).

A partially-populated array consists of an array together with an integer variable, often named size.
An additional operation — let's call it add — append adds an element to the next available index of the array.
- size is used to maintain the location of the next available element
  - size is also the number of elements placed into the array
The only elements that are considered part of the container are those in the range 0 .. size-1
- The semantics of the container thus uses both the array and the size variable. Packaging the two together in a class and supplying methods to achieve this semantics is what we mean by a data structure.

The above behavior is a subset of a more substantial container known as a vector, list, or sequence (In Java, the corresponding class is the ArrayList. We will be investigating it in more detail later.

Physical and Logical Size (Capacity vs Size)

capacity/physical size: number of elements the array was created with
size/logical size: number of elements assigned values
We sometime refer to the physical container (the array as created) vs the logical container (the filled portion of the array.
- The former has bounds 0 … arr.length-1; whereas the latter has bounds 0 … size-1
The only elements that contain meaningful data are those in the range 0 … size-1.
- Thus, one builds up the (logical) portion of the container by appending values (using add), and then is free to get and/or set anywhere in that range

Vector Version 1: Static (Compile-time) Capacity

We're actually going to take a step backwards here vis-a-vis the above Array class: we're going to start of with a container whose capacity is hardcoded into the class (rather than specified in the constructor as we did with Array).

We're doing this to allow us to initiate a discussion about capacity, i,.e, the allocated size / number of elements of a container.

class Vector {
	int get(int index) {return arr[index];}

	void set(int index, int value) {arr[index] = value;}

	void add(int value) {arr[size++] = value;}

	int size() {return size;}

	public String toString() {
		String result = "{";
		for (int i = 0; i < size; i++)
			result += arr[i] + (i < size-1 ? ", " : "");
		result += "}";
		return result;
	}

	int [] arr = new int[CAPACITY];
	int size = 0;
	static final int CAPACITY = 10;
}

Notes

Note the lack of any constructor; none is necessary since the array's capacity is fixed in the code
By the same token, there is no need for a capacity instance variable (nor a DEFAULT_CAPACITY VALUE)
- The introduction of aCAPACITY static (class) variable is simply good programming practice (avoid magic numbers in the code proper)
- In general, I will be choosing relatively small capacities in the notes to allow for manageable output of the containers.
The use of an increment (or equivalently a decrement) in the subscript in the add method is one of the few contexts in which this operator might be used without overly being criticized (though there are those who would criticize this usage as well).
- Recall the semantics of the increment/decrement operators:
  - x++first saves the value of x as the value of the operation, and then performs the increment
  - ++x first performs the increment and then uses the incremented value as the result of the operation
    - The way to remember this is to remember which appears first in the operation: if the variable appears first (x++) use it's value before the operation; if the operator is first, apply it first and then use the value.
- Normally, use of increment/decrement in a larger expression is heavily frowned upon; however given that this is basically a standalone expression within the subscript, many will consider its use in this acceptable
  - I like it because it allows me to reduce the method to a single line of code which further allows the entire method and header to be placed on a single line.
The standard Java conventions for containers is to use the name size for the method returning the number of elements (rather than length). As it turns out, Java permits the use of the same name for both an instance variable and instance method in the same class.
- Such usage is often frowned upon as well, but is accepted in the container classes because of legacy coniderations
This implementation is actually incorrect, i.e., it allows errors that can easily corrupt the container.
- We will address and correct this in the next lecture

Here is the app:

public class VectorApp {
	public static void main(String [] args) {
		Vector v = new Vector();
		System.out.println("size: " + v.size());

		for (int i = 0; i < Vector.CAPACITY; i++) {
			v.add(i * 10);
			System.out.println(v + " (" + v.size() + ")");
		}

		for (int i = 0; i < v.size(); i++)
			v.set(i, v.get(i)+1);
		System.out.println("v: " + v);
	}
}

Notes

Not much different than ArrayApp except we get to make use for the add method, which handles the append for us (in ArrayApp we needed to pass the proper index (in the form of i).
Note the access to the static CAPACITY variable in the app (in order to know how many elements we can add).
- We will soon discuss whether the capacity should be visible to the user
Note that the app restricts itself to only getting/setting those elements that were appended using add. This is in contrast to our Array (or the built-in array) where we had no such constraint (although we did not arbitrarily scatter values through the array, we could have).
- The next lecture will discuss how to enforce staying within the logical bounds of the vector.

size: 0
{0} (1)
{0, 10} (2)
{0, 10, 20} (3)
{0, 10, 20, 30} (4)
{0, 10, 20, 30, 40} (5)
{0, 10, 20, 30, 40, 50} (6)
{0, 10, 20, 30, 40, 50, 60} (7)
{0, 10, 20, 30, 40, 50, 60, 70} (8)
{0, 10, 20, 30, 40, 50, 60, 70, 80} (9)
{0, 10, 20, 30, 40, 50, 60, 70, 80, 90} (10)
v: {1, 11, 21, 31, 41, 51, 61, 71, 81, 91}

Notes

We output the vector as it grows together with the value of size
- Our demo apps should 'show off' the capabilities of our container; there was no need to display a capacity (it was fixed) or size (there was none) in the ArrayApp

Vector Version 2: Dynamic (Runtime) Capacity

class Vector {
	Vector(int capacity) {arr = new int[capacity];}
	Vector() {this(DEFAULT_CAPACITY);}

	int get(int index) {return arr[index];}

	void set(int index, int value) {arr[index] = value;}

	void add(int value) { arr[size++] = value;}

	int getCapacity() {return arr.length;}
	int size() {return size;}

	public String toString() {
		String result = "{";
		for (int i = 0; i < size; i++)
			result += arr[i] + (i < size-1 ? ", " : "");
		result += "}";
		return result;
	}

	int [] arr;
	int size = 0;
	static final int DEFAULT_CAPACITY = 10;
}

Notes

We now need a constructor to allow us to specify the capacity of our vector.
- The array now needs to be created within the body of the constructor
- We also include a default constructor to allow simple creation of a vector.
  - and as before, we use the this constructor language construct to leverage the single-argument constructor.
We (still) do not need an instance variable to hold the capacity of each Vector object, as we can obtain that information from the .length field of the arr instance variable
And again, the issue of whether capacity should be visible to the user raises its head. Note we provided a method getCapacity
- The main reason for this method at this point is to allow the user of tha class (e.g., me, when writing the CodeLab tests) to obtain the capacityfor test and debugging purposes

The app:

public class VectorApp {
	public static void main(String [] args) {
		doIt(new Vector(5));
		System.out.println();
		doIt(new Vector());
	}

	static void doIt(Vector v) {
		System.out.println("capacity: " + v.getCapacity());
		System.out.println("size: " + v.size());

		for (int i = 0; i < v.getCapacity(); i++) {
			v.add(i * 10);
			System.out.println(v + " (" + v.size() + ")");
		}

		for (int i = 0; i < v.size(); i++)
			v.set(i, v.get(i)+1);
		System.out.println("v: " + v);
	}

Notes

We've extracted the actual manipulation of the vector from main and placed it into a separate method
- This allows us to use that logic on multiple Vector objects, since we can now create vectors of different capacities.

And the output:

capacity: 5
size: 0
{0} (1)
{0, 10} (2)
{0, 10, 20} (3)
{0, 10, 20, 30} (4)
{0, 10, 20, 30, 40} (5)
v: {1, 11, 21, 31, 41}

capacity: 10
size: 0
{0} (1)
{0, 10} (2)
{0, 10, 20} (3)
{0, 10, 20, 30} (4)
{0, 10, 20, 30, 40} (5)
{0, 10, 20, 30, 40, 50} (6)
{0, 10, 20, 30, 40, 50, 60} (7)
{0, 10, 20, 30, 40, 50, 60, 70} (8)
{0, 10, 20, 30, 40, 50, 60, 70, 80} (9)
{0, 10, 20, 30, 40, 50, 60, 70, 80, 90} (10)
v: {1, 11, 21, 31, 41, 51, 61, 71, 81, 91}

CISC 3130 Data Structures Overview of Data Structures

Topics

Data Structures

Is an int or double a Data Structure?

Is a Student, BankAccount, or Customer Class a Data Structure?

Containers

So an Array Must be a Container and a Data Structure

Arrays

Arrays in Java

A Class Version of Java's Array

A Demo App for the Array Class

A First Container: Vector (Partially Populated Array)

Physical and Logical Size (Capacity vs Size)

Vector Version 1: Static (Compile-time) Capacity

Vector Version 2: Dynamic (Runtime) Capacity

CISC 3130
Data Structures
Overview of Data Structures

Is an `int` or `double` a Data Structure?

Is a `Student`, `BankAccount`, or `Customer` Class a Data Structure?