Topics
- Data structures
- Containers
- Static and dynamic fixed capacity
- Arrays and vectors
Data Structures
From Wikipedia:
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for
efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, and the functions
or operations that can be applied to the data.
Is an int
or double
a Data Structure?
- They're composed of smaller bits in a format that is chosen for efficient access and operation
- However, the actual access and operation is performed at the hardware level — as opposed to being designed/implemented by the
programmer — so we don't consider them data structures per se.
Is a Student
, BankAccount
, or Customer
Class a Data Structure?
- Technically yes, and definitely more so than the above numeric types.
- They are constructed using the class mechanism which allows for the 'packaging' of various pieces of data (attributes)
of a modelled object, and thus form a structure of data, corresponding to the information
maintained for a student, a bank account, and a customer respectively.
- As an aside, strictly speaking, the class is not a data structure, rather a specification for the creation of objects (instances)
that form the structure.
- We will often be pedantic in this manner; proper use of the terminology of classes and object is often important in our discussion
- However, these structures are 'boring' in the sense that their specification and methods are 'one-offs' in the sense that they are for
a particular application,rather than having universal application.
Containers
Again, from Wikipedia:
In computer science, a container is a class or a data structure whose instances are collections of other objects. In other words, they store objects in an organized way that follows specific access rules.
- Containers are more interesting in that they have a universal application: storing (and subsequently retrieving) other objects.
- Furthermore, efficient storage/retrieval (and iteration) are both key and lead to very interesting algorithms and structures
- They are thus worthy of in-depth study
Thus, a course in data structures is typically an investigation into containers.
In some sense, this course is your first in studying computer science rather than simply a programming language. We will be taking most of the
material you have covered in the previous courses and applying it to the study, design, and implementation of containers.
So an Array Must be a Container and a Data Structure
- Yes it is, and indeed, it is the fundamental homogeneous container/data structure found in hardware and
most programming languages
- Homogeneous means that all the elements of the array are of the same type
- This is in contract with a heterogeneous container in which the various elements can be of different types
(the class mechanism permits the creation of heterogeneous structures, e.g. a
Person
object with a string for a name and
an integer for an age).
- However, like the numeric examples at the beginning of this discussion, the operations upon the array (subscripting) are performed below the level of the programmer.
- Thus, while definitely in the category of data structure/container, arrays are of little interest to us except as a foundational container upon which
we will build other containers with various storage/retrieval properties.
- We will see that the combination of homogeneous (e.g. array) and heterogeneous (e.g. class definition) provides us with the basic mechanism
for creating arbitrary containers.
Arrays
Before we move on from the built-in array, let's examine it's basic properties, operations, and limitations.
- Element: an array, like all containers, holds other objects, known as the elements of the container
- Each element has a unique position, known as its index or subscript within the array; most modern
languages use
0
as the first position, and thus the last element is at index #elements - 1
- Homogeneous: the elements of an array are all of the same type; in Java the type is supplied at the time of declaration of the corresponding reference variable
as well as when the array object is created., e.g.
int [] arr = new int[100];
- Fixed size: the number of elements of an array is specified at object creation time, and cannot change (although the value can be calculated at run time), again
new int[100]
- however, the reference variable referring to the array can be made to refer to a different array object; this will be the basis of 'growable' containers, which
we introduce in the next lecture.
- Contiguous memory: the elements of the array are guaranteed to follow each other in memory, with element 0 first, followed by element 1, etc
- Subscripting: the basic operation of storage and retrieval is subscripting, i.e., the specification of the element to be stored to and retrieved from. In most
languages, this operator is written as
[i]
(upon occasion (i)
) with i
being the element in question.
- It is the contiguous memory property of the array that makes subscripting a highly efficient operation; accessing the millionth element of an array is
essentially as efficient as the first.
Specific to Java:
length
: the number of elements allocated to the array can be obtained via this integer value. Note this is not a method, but rather a read-only value field
associated with the array
- Bounds-checking: attempting to access an element whose index is outside the range of the fixed-size of the array (i.e.,
0 .. length-1
) causes an exception to
be thrown.
Arrays in Java
- The primitive array does not supply a
length
field, nor perform bounds-checking; these are both artifices provided by the Java runtime
- For example, C/C++ provides neither of those two facilities for their array type .
- In fact, an array in Java is not considered a primitive type (as is the case for C/C++), but is rather an object, which implies runtime support.
- In this sense, we could say Java's array type is more of a data structure (in the sense we are using the term) that a primitive array.
- But again, as this support is coming below our radar (i.e., provided by the runtime rather than our own code, we will treat the array
as a fundamental container
- As an aside, the properties of the array (in particular the mechanics of subscripting) are covered in more detail in an architecture course.
A Class Version of Java's Array
Here is the Java built-in array, implemented as a class. The idea here is:
- To present an example of a container class
- In this case we use a familiar container … the array
- To emphasize the absence of operators when dealing with classes
- Java does not support the definition of operators for classes, thus we have to replace the
subscript operator with
get
/set
methods
- To introduce the notion of capacity
- To show a
toString
method for a container
class Array {
Array(int capacity) {arr = new int[capacity];}
Array() {this(DEFAULT_CAPACITY);}
int get(int index) {return arr[index];}
void set(int index, int value) {arr[index] = value;}
int length() {return arr.length;}
public String toString() {
String result = "{";
for (int i = 0; i < length(); i++)
result += arr[i] + (i < length()-1 ? ", " : "");
result += "}";
return result;
}
int [] arr;
static final int DEFAULT_CAPACITY = 10;
}
- As mentioned above, this is simply an illustration of what a typical container class will look like
- Note the use of the built-in array to store the elements of this container.
- The use of an object as an instance variable of another class is known as composition and
we will have much more to say about this later.
- The container instance variable, i.e., the object that will hold the actual elements (in this case the built-in array)
is often called the underlying container
- Changes in behavior from the built-in array:
- Little is added in the way of functionality:
- Some loss of expressivity:
- subscripting is gone: no operator symbols for classes; replaced by
get
/set
methods
length
field replaced by length
method
- One can make an argument in either direction whether this is truly a data structure
- While it does package data within it, it's nothing more than a single item — the underlying array
- There's no additional state (variables) to provide additional logic or storage/retrieval semantics (as is the case i
- OTOH, there is the additional of the
toString
- Notice that
get
/set
, and length
all achieve their semantics by calling
corresponding operations at the (built-in) array level.
- This is known as delegation, and we will have a lot to say about it later as well.
- bounds checking still occurs (implicitly) at the array level
- Note the use of the
this
constructor mechanism in the default, overloaded constructor that allows us to take advantage of the logic of
the single-argument constructor
- Having one method exploit the logic of another method is a technique known as
leveraging
and is related to the notions of delegation mentioned above and
and wrapper classes mentioned below. We will discuss them in detail an upcoming lecture.
- The conditional expression
(i < length()-1 ? ", " : "")
in the String
method is a technique sometimes called comma-logic
that allows on to print a sequence of comma-separated valued, making sure no comma is output after the last value (or before the first)
- Finally, note the static default CAPACITY value (and its naming convention of all caps)
- Quick overview of — stylistic conventions for handling multi-word identifiers
- camelCase (lowerCamelCase): variables and methods in Java
- PascalCase (UpperCamelCase): classes and interfaces in Java
- snake_case: most identifiers in C/C++, file names in Linux
- UPPER_SNAKE_CASE: constants (
final
) in Java
- kebab-case: identifiers in HTML
- Overall, there is little advantage (aside from the availability of a
toString
) to using this class version as opposed to the built-in array object
- The overall effect is that we have encased the built-in array within a class. We sometimes call a class with this sort of functionality a
wrapper class.
- This is somewhat related to the notion of delegation above; and again, we will discuss this soon
- There is also a notion of a wrapper method which you may have run across in the context
of recursion and array processing.
A Demo App for the Array Class
Here is a Java application (i.e., a class with a
main
method that uses the
Array
y class defined above:
public class ArrayApp {
public static void main(String [] args) {
Array a = new Array();
System.out.println("length: " + a.length());
for (int i = 0; i < a.length(); i++)
a.set(i, i * 10);
System.out.println("a: " + a);
System.out.println("length: " + a.length());
for (int i = 0; i < a.length(); i++)
a.set(i, a.get(i)+1);
System.out.println("a: " + a);
}
}
- Again, this will be a template of sorts for writing applications that illustrate the use of a container
- You can add an out-of bounds reference (e.g.
a[-1]
or a[a.length]
to see the ArrayIndexOutOfBoundsException
.
- Again, this is generated by the Java runtime, not our code
- The above app simply illustrates (demos) the capabilities of our implementation
- Each method/constructor was called to show it in action
- In particular, there is no attempt to test the correctness of the container's logic nor to attempt to 'break it'
- We will address testing later on
Here is the output of the app:
length: 10
a: {0, 10, 20, 30, 40, 50, 60, 70, 80, 90}
length: 10
a: {1, 11, 21, 31, 41, 51, 61, 71, 81, 91}
A First Container: Vector (Partially Populated Array)
Arrays are typically populated by reading in (or calculating) data items and adding them to the end of the array.
- Unless one knows in advance the number of items, one cannot allocate an array of the exact size.
- Furthermore, even if one knows the exact number of initial items, it is often the case that
one might wish to subsequently add or remove items.
To solve this problem, we usually employ
partially-populated arrays rather than
an array of the exact number of elements (i.e., a
fully-populated array).
- A partially-populated array consists of an array together with an integer variable, often named
size
.
- An additional operation — let's call it
add
— append
adds an element to the next
available index of the array.
size
is used to maintain the location of the next available element
size
is also the number of elements placed into the array
- The only elements that are considered part of the container are those in the range
0 .. size-1
- The semantics of the container thus uses both the array and the size variable. Packaging the two together in a class and supplying methods
to achieve this semantics is what we mean by a data structure.
The above behavior is a subset of a more substantial container known as a
vector,
list, or
sequence (In Java,
the corresponding class is the
ArrayList
. We will be investigating it in more detail later.
Physical and Logical Size (Capacity vs Size)
- capacity/physical size: number of elements the array was created with
- size/logical size: number of elements assigned values
- We sometime refer to the physical container (the array as created) vs the logical container (the filled
portion of the array.
- The former has bounds
0
… arr.length-1
; whereas the latter has bounds 0
… size-1
- The only elements that contain meaningful data are those in the range
0 … size-1
.
- Thus, one builds up the (logical) portion of the container by appending values (using
add
), and then is free to get
and/or
set
anywhere in that range
Vector Version 1: Static (Compile-time) Capacity
We're actually going to take a step backwards here vis-a-vis the above
Array
class: we're going to
start of with a container whose capacity is hardcoded into the class (rather than specified in the constructor as we did with
Array
).
- We're doing this to allow us to initiate a discussion about capacity, i,.e, the allocated size /
number of elements of a container.
class Vector {
int get(int index) {return arr[index];}
void set(int index, int value) {arr[index] = value;}
void add(int value) {arr[size++] = value;}
int size() {return size;}
public String toString() {
String result = "{";
for (int i = 0; i < size; i++)
result += arr[i] + (i < size-1 ? ", " : "");
result += "}";
return result;
}
int [] arr = new int[CAPACITY];
int size = 0;
static final int CAPACITY = 10;
}
- Note the lack of any constructor; none is necessary since the array's capacity is fixed in the code
- By the same token, there is no need for a
capacity
instance variable (nor a DEFAULT_CAPACITY VALUE)
- The introduction of a
CAPACITY
static (class) variable is simply good programming practice (avoid magic numbers in the code proper)
- In general, I will be choosing relatively small capacities in the notes to allow for manageable output of the containers.
- The use of an increment (or equivalently a decrement) in the subscript in the
add
method is one of the few contexts in which this operator
might be used without overly being criticized (though there are those who would criticize this usage as well).
- Recall the semantics of the increment/decrement operators:
x++
first saves the value of x
as the value of the operation, and then performs the increment
++x
first performs the increment and then uses the incremented value as the result of the operation
- The way to remember this is to remember which appears first in the operation: if the variable appears first (
x++
) use it's value before the operation; if the operator is first,
apply it first and then use the value.
- Normally, use of increment/decrement in a larger expression is heavily frowned upon; however given that this is basically a standalone expression within the
subscript, many will consider its use in this acceptable
- I like it because it allows me to reduce the method to a single line of code which further allows the entire method and header to be placed on a single line.
- The standard Java conventions for containers is to use the name
size
for the method returning the number of elements (rather than length
). As it turns out,
Java permits the use of the same name for both an instance variable and instance method in the same class.
- Such usage is often frowned upon as well, but is accepted in the container classes because of legacy coniderations
- This implementation is actually incorrect, i.e., it allows errors that can easily corrupt the container.
- We will address and correct this in the next lecture
Here is the app:
public class VectorApp {
public static void main(String [] args) {
Vector v = new Vector();
System.out.println("size: " + v.size());
for (int i = 0; i < Vector.CAPACITY; i++) {
v.add(i * 10);
System.out.println(v + " (" + v.size() + ")");
}
for (int i = 0; i < v.size(); i++)
v.set(i, v.get(i)+1);
System.out.println("v: " + v);
}
}
- Not much different than
ArrayApp
except we get to make use for the add
method, which
handles the append for us (in ArrayApp
we needed to pass the proper index (in the form of i
).
- Note the access to the static CAPACITY variable in the app (in order to know how many elements we can add).
- We will soon discuss whether the capacity should be visible to the user
- Note that the app restricts itself to only
get
ting/set
ting those elements that were appended using add
. This is in
contrast to our Array
(or the built-in array) where we had no such constraint (although we did not arbitrarily scatter values through the array, we could have).
- The next lecture will discuss how to enforce staying within the logical bounds of the vector.
size: 0
{0} (1)
{0, 10} (2)
{0, 10, 20} (3)
{0, 10, 20, 30} (4)
{0, 10, 20, 30, 40} (5)
{0, 10, 20, 30, 40, 50} (6)
{0, 10, 20, 30, 40, 50, 60} (7)
{0, 10, 20, 30, 40, 50, 60, 70} (8)
{0, 10, 20, 30, 40, 50, 60, 70, 80} (9)
{0, 10, 20, 30, 40, 50, 60, 70, 80, 90} (10)
v: {1, 11, 21, 31, 41, 51, 61, 71, 81, 91}
- We output the vector as it grows together with the value of
size
- Our demo apps should 'show off' the capabilities of our container; there was no need to display a capacity (it was fixed) or size (there was none)
in the
ArrayApp
Vector Version 2: Dynamic (Runtime) Capacity
class Vector {
Vector(int capacity) {arr = new int[capacity];}
Vector() {this(DEFAULT_CAPACITY);}
int get(int index) {return arr[index];}
void set(int index, int value) {arr[index] = value;}
void add(int value) { arr[size++] = value;}
int getCapacity() {return arr.length;}
int size() {return size;}
public String toString() {
String result = "{";
for (int i = 0; i < size; i++)
result += arr[i] + (i < size-1 ? ", " : "");
result += "}";
return result;
}
int [] arr;
int size = 0;
static final int DEFAULT_CAPACITY = 10;
}
- We now need a constructor to allow us to specify the capacity of our vector.
- The array now needs to be created within the body of the constructor
- We also include a default constructor to allow simple creation of a vector.
- and as before, we use the
this
constructor language construct to
leverage the single-argument constructor.
- We (still) do not need an instance variable to hold the capacity of each
Vector
object, as we can obtain that information
from the .length
field of the arr
instance variable
- And again, the issue of whether capacity should be visible to the user raises its head. Note we
provided a method
getCapacity
- The main reason for this method at this point is to allow the user of tha class (e.g., me, when writing the CodeLab tests)
to obtain the capacityfor test and debugging purposes
The app:
public class VectorApp {
public static void main(String [] args) {
doIt(new Vector(5));
System.out.println();
doIt(new Vector());
}
static void doIt(Vector v) {
System.out.println("capacity: " + v.getCapacity());
System.out.println("size: " + v.size());
for (int i = 0; i < v.getCapacity(); i++) {
v.add(i * 10);
System.out.println(v + " (" + v.size() + ")");
}
for (int i = 0; i < v.size(); i++)
v.set(i, v.get(i)+1);
System.out.println("v: " + v);
}
- We've extracted the actual manipulation of the vector from
main
and placed it into a
separate method
- This allows us to use that logic on multiple
Vector
objects, since we can now create
vectors of different capacities.
And the output:
capacity: 5
size: 0
{0} (1)
{0, 10} (2)
{0, 10, 20} (3)
{0, 10, 20, 30} (4)
{0, 10, 20, 30, 40} (5)
v: {1, 11, 21, 31, 41}
capacity: 10
size: 0
{0} (1)
{0, 10} (2)
{0, 10, 20} (3)
{0, 10, 20, 30} (4)
{0, 10, 20, 30, 40} (5)
{0, 10, 20, 30, 40, 50} (6)
{0, 10, 20, 30, 40, 50, 60} (7)
{0, 10, 20, 30, 40, 50, 60, 70} (8)
{0, 10, 20, 30, 40, 50, 60, 70, 80} (9)
{0, 10, 20, 30, 40, 50, 60, 70, 80, 90} (10)
v: {1, 11, 21, 31, 41, 51, 61, 71, 81, 91}