CIS 3142 Pointers, Arrays, References

Pointers

Concepts

pointers
- & (address-of operator), * (dereferencing operator)
- pointer (scaled) arithmetic
- pointers and references
- arrays and pointers
new operator
dynamic memory allocation

Uses of Pointers

Pointers can be used to:

Modify a parameter (indirection): this is a carryover from how to have a writeback parameter in C; in C++ it can and should be handled by using a reference parameter
Avoid the (space and time) overhead of performing a copy: this can and should be handled by using a (const) reference parameter (rather than a by-value parameter).
Maintain a pointer to an arbitrary location in memory:
- there a C-derived idioms that use a pointer to move through an array (examples on the C-String page). While they're elegant and interesting, subscripting is the way to go for the most part.
- used in systems programming to move around aribtrary memory locations, beyond the scope of this course
Maintain the pointer returned — by new — to dynamically allocated storage: this is the only real reason to use pointers in C++
- and this corresponds to maintaining the reference returned by new to a newly created object in Java
  - except in C++ you have to deal with the * and -> syntactic 'vinegar'

Overview

If we view memory as an 'array' of bytes, the address of each byte may be thought of as that bytes index or subscript.
C++ provides a family of types, known a pointers that allow one to store and operate on these address
- A pointer is nothing more than a value corresponding to an address in memory.
- Each type has a pointer type associated with it; thus there are pointers to integers, pointers to doubles, pointers to strings, pointers to Employee objects, and even pointers to other pointers.
In line with the subscript analogy, a pointer is nothing more than an integer, except, as a separate type, the compiler enforces various restrictions on its use:
- Pointers of one type are isolated from pointers to other types
  - Thus one cannot assign a pointer to an integer to a pointer to a string
  - Allowing 'cross-type' pointer assignment would result in garbage being accessed:
```
int i = 12;
int *ip = &i;
float *fp = ip;	// illegal — fp would appear to be pointing at an 8-byte float, but is really pointing to a 4-byte integer 
cout << *fp;;		// and print it out produces garbage; perfoming computations with its values is even worse - GIGO
						
```
- While some arithmetic operations are permitted on pointers, they are limited in nature (see below)
As we will see, pointers provide a level of indirection — i.e., access to the actual data is not done directly but rather through the (intermediate) pointer.
- This is the essential value of working with pointers; all techniques that involve pointers are based on this concept.

Declaring Pointers

Pointers are declared using a *

int *ip;		// pointer to int
double *dp;	// pointer to double
string *sp;	// pointer to string

This follows the general C++/C rule that items are declared using syntax to the way they are used (see below). Other examples of this principle are:
- Declaring arrays with [] (since the primary manipulation of an array is subscripting, i.e., the [] operator
- Function declarations using () (e.g., void f(int);) since the primary use of functions are to call them (again using ()'s around the arguments).
Pointers are strictly typed; i.e., when declaring a pointer, one declares the type of data is being pointed at: pointer to int, pointer to string, pointer to Donor, pointer to pointer to int.
- Pointers can only be assigned to other pointers of the same type.

Fundamental Pointer-Related Operations in C++

The dereferencing (*) operator
- Given the pointer, int *ip, the expression ip represents the value contained within ip's storage location
  - The type of this expression is a pointer to int (i.e., *int.
- The expression *ip (in which the * is the operator, and ip is the operand) represents the value obtained by retrieving the data in the location pointed at by ip
  - The type of *ip is int
  - Applying the * operator, strips off a pointer from the resulting type
The address-of (&) Operator
- Typically, your pointer variables will be given their initial value as a result of dynamically allocating memory; however, there are occasions when you wish to assign an address of an existing variable to a pointer.
  - As discussed above, this should only really happen if you use a pointer to move through an array — and in that case, you still don't need the & operator; if the pointer is ip and the array a, you would write ip = a
- The & operator returns the address of a storage location; the operands is most typically either the name of a variable (e.g., &total) or an element of an array (e.g., &a[i]).
  - Given the variable int total (whose type is , of course int), the type of the expression &total is pointer to int (i.e., *int).
  - Applying the & operators adds a level of pointer to the resulting type.
The -> Operator
- given a pointer to a class object — say Employee *ep — to access a member function — say f — (or data member) of the object pointer at by ep, one would have to write
```
(*ep).f();
				
```
  the paren's are a consequence of the precedence of the * and . operators.
- to avoid this error-prone and ugly construct, C++ provides the -> operator
```
ep->f()
				
```
  that has the identical semantics (meaning)
Note that * and & are inverses of each other and cancel each other out: *&x is the same as x.

Uses of Pointers

In C++, the main use of a pointer is to provide a mechanism to support dynamic memory allocation.
- At run-time, we will request an allocation of memory and receive a pointer to said memory in response.
Another use of a pointer is to provide a level of indirection to value
- For example having ip point at (i.e., contain the address of) an integer variable will allow us to access that variable through ip. Changing the value of ip to point at another integer variable permits us to manipulate THAT variable using the same pointer. This provides great flexibility, as we will see when we discuss dynamic memory allocation.

Pointer Arithmetic

Overview

Given the pointer declaration

int *ip

What would the (sensible) meaning of ip + 1 be?
- incrementing by 1 byte?
- incrementing by 4 (the size of an integer) bytes?
When would ip + 1 make sense? ip -1?

The Details — Scaled Pointer Arithmetic

Given the pointer T *p, p is incremented by the specified increment multiplied (scaled) by the size of the type (T) being pointed to.

char *cp: cp + 1 -> add 1 byte to cp
int *ip: ip + 1 -> add 4 bytes to ip
double *dp:dp + 1 -> add 8 bytes to dp

Valid Arithmetic Operations on Pointers

The legal pointer arithmetic operations are those that 'make sense':

pointer + integer — moves pointer upwards by (scaled) value of integer
- ip + 1, ip += 1, ip++
pointer - integer — moves pointer downwards by (scaled) value of integer
- ip - 1, ip -= 1, ip--
(in)equality of pointers — two pointers (not) pointing to the same location
- ip1 == ip2
- ip1 != ip2
pointer - pointer — determines (scaled) distance between pointers
- ip1 - ip2
- When would this be meaningful?
pointer comparison — <, <=, etc
- ip1 < ip2 — ip1 has a lower address in memory than ip2
- Again, when is this meaningful?

Arrays

Declaration and Initialization

Declared with a syntax similar to Java, except they are array objects, not merely reference variables; i.e., space for the array is alocated
```
int arr[10];
		
```
- Declared as a local variable, they are allocated on the stack, like any other variable
Arrays must be declared with a size
```
int nums[10];
double averages[14];
string names[100]; 
		
```
- Note the placement of the subscript (after the variable name)
As with Java, they are 0-based
An array may be initializes with an initalization list:
```
			int a[5] = {1, 2, 3, 4, 5};
			int b[] = {1, 2, 3};
		
```
- If the size is omitted, the number of elements in the initialization list is used to determine the size of the array
- there are specific rules for what happens is the explicit size does not match the number of elements in the intialization list

The Subscript Operator — The Array as Pointer

Given the array declaration:

int a[10];

the identifier a (i.e., the name of the array) is viewed as a pointer to the first element of the array
the notation a[i] refers to the elements i elements from the beginning of the array.
a[i] ≡ *(a + i)
- i.e., subscripting in C++ is defined via the corresponding pointer expression; subscript is sometimes called a 'secondary notation'

Passing an Array to a Function

The contiguous area of the array is not what is passed, rather its address.
- This makes sense, the array is not the full extant of storage; but a pointer to the first element
- This has several major ramifications
  - while the address of the array is usually passed by-value (and thus won't change in the function, the content are being 'passed' by indirection and thus, changes made to the elements of the array persist upon leaving the function
  - In the function header, the array is declared without a size (the elements are not passed, merely the starting address)
```
bool contains(int arr[], int n, int val);
						
```
    - This mean the function can handle any size array (of the proper type, of course)
  - Array parameters can also be declared as pointers (which is what they are)
```
bool contains(int *arr, int n, int val);
						
```
Unless the array contains some sort of trailer/sentinel value, the number of elements in the array is passed as a separate argument
```
bool contains(int arr[], int n, int val);
		
```
- Compelling reason to introduce an vector/array class.

Within the function, the array is processed as usual (usually in conjunction with the associated length parameter):

bool contains(int arr[], int n, int val) {
	for (int i = 0; i < n; i++)
		if (arr[i] == val) return true;
	return false;
}

A Closer Look at Pointers and Arrays

Given the declarations
```
		int a[10], *ip;
		
```
what does ip = &a[0] mean?
- ip is now pointing to the first element of a
What about
```
ip = &a[0];
for (i = 0; i < n; i++) {
	cout << *ip;
	ip++;
}
		
```
- We position ip at the beginning of the array; the loop then prints the contents of the location pointed at by ip, and then moves ip to the next element. This is done n times (once for each element of the array)
Finally, how about
```
for (i = 0; i < n; i++)
	cout << *(a+i);
		
```
- Same result — each time through the loop the contents of the location i elements away from the beginning of the array is printed

Pointers vs. Arrays

Declaring an array, however, is not the same as declaring a pointer to the corresponding element type
```
int *ip;
		
```
- Only a pointer (typically 4 or 8 bytes) is allocated)
- Not pointing at anything yet
- Can assign to ip
```
int a[10]
		
```
- Space is allocated for 10 integers
  - No space is allocated for the pointer — it's a constant (almost a liter) and the compiler keeps track of the address of the beginning of the array and associates it with a
- a a points to the first element
- cannot assign to a (no Lvalue).
  - a is not a pointer variable but rather a fixed name for the location of the first element
  - When you code the following:
```
int a[] = {1, 2, 3, 4};
						
```
    you are providing initial values to the elements of the array; you are not assigning an array to a

A Final Look at Pointers and Arrays

Given:

int *ip, a[10]

all of the following are legal:

a[5]
a[20]
a[-1]
5[a]
ip[5]
ip+1
a+4
a-1
*(a+2)
*(ip+1)

Recursion and Arrays

Given

int a[10]

i.e., a is an array of 10 elements, what is a+1?

The recursive definition if an array (an array is either empty , or an element followed by an array) is exposed and accessible for use in C++
```
void print(int a[], int n) {
	if (n == 0) return;
	cout << a[0] << ' ';
	print(a+1, n-1);
		
```
- Note no need to copy the remaining elements to a new array, or use subarrays

The `sizeof` Operator

The sizeof operator is a compile-time value; i.e., the compiler replaces the sizeof operation with the number of bytes allocated to the operand.
```
cout << sizeof(int);	
		
```
The operand to sizeof can be a type, variable, or expression.
- There are rules as to when the operands needs to be enclosed in ()'s. In general, simply use the parens.
sizeof was more hevily used in C, where dynamic allocation of memory was a function whose argument was the number of bytes to allocate
- In C++, new takes care of that calculation, using the type that is specified
sizeof for an array returns the number of bytes allocated to the array
- This only works within the function/block the array was declared in
- If the array is passed as an argument, the corresponding parameter is not an array (it's a pointer), and using it as the operand to sizeof produces the number of bytes a pointer takes up on the machine.

Dynamic Memory

Overview

Variable names are static in the sense that they must be chosen during program editing
- New variable names cannot be introduced after editing, for example, at run-time
One of the uses of pointers is that they allow us to keep track of memory that we will request at runtime
- The result of our request will be the address of the new memory, which we can then store in a pointer.
With this ability, we can allocate an arbitrary amount of memory (up to the limit of the machine's memory size) during execution, allowing for much more flexibility than previously.
- We are also not restricted to the fixed (i.e., static) number of variable names specified in the program source. As long as we can find somewhere to save (and subsequently retrieve) all the pointers to the dynamically allocated locations, we can have as many as we want (where we store those pointers and how we organize and manipulate them is a major portion of the topics covered in a data structures course).

The `new` Operator

C++ provides the programmer with the ability to allocate additional memory on an as-need basis during runtime.
Such memory is known as dynamically allocated memory
- In some languages, such memory is called programmer-controlled memory as its allocation/ deallocation is completely under programmer control.
Additional memory can be obtained during execution using the new operator, which accepts a type (and the number of elements if an array is being allocated), and returns a pointer to the newly allocated storage:
```
int *ip;
ip = new int;		// Allocates space for a new integer and returns the address
double *dp = new double[100]; 	// Allocates space for 100 doubles and returns the address of the first
		
```
- Note how C++ allows scalars to be created via dynamic memory (unlike Java (where primitives are never allocated using new).

The `delete` Operator

Memory allocated using the new operator is completely under the programmer's control,; and thus it is up to the programmer to decide when the memory should be freed.
this is achieved using the delete operator which accepts a pointer to the memory to be freed:
```
delete ip;
delete [] dp;
		
```
- Note: if an array was allocated (using the [] notation as seen above for dp, a corresponding notation should be used when using the delete operator on that memory.

Dynamically-Allocated Arrays

Arrays whose size is unknown until runtime can be created using dynamic memory
Rather than using the familiar array notation, a pointer must be used
```
int *arr;
		
```
The pointer is then assigned the result returned from a call to the new operator:
```
int n;
cin >> n;

arr = new int[n];
		
```
once allocated, use of the array is exactly as if it were declared as usual
- This is one more consequence of arrays being treated as pointers

Aliases

The indirection provided by pointers introduces a new issue — that of the alias

Just as the term alias in everyday English means a alternate name for something, in C++, two pointers are aliases if they contain the same address (i.e., if they point to the same location).
Changes through either pointer affects the value of the location (regardles of which pointer is used to retrieve it).
Aliases can also occur with a varu=iable and a pointer assigned the addres of that variable:
```
			int x = 12;
			int *ip = &x;		// *ip is now an alias of x
		
```
Aliases can prove problematic in that changes through one affect the other (thus a change in one part of the program can affectt another part without its knowledge).

Memory Leaks (Garbage)

In the code that resized the array, had we not deleted the old storage, we could very well have run out of memory eventually. This is because we reassigned the arr pointer to point to a different area of storage; leaving us with no way to get to the old area. This was not a problem in itself, because we had copied the information out of it, but it meant that we now had memory that we could no longer use, nor get to. This situation is called a memory leak and the memory itself (which is now useless since it can't be reached) is called garbage.

Preventing the occurrence of garbage is one of the primary purposes of having a destructor in a class.
Assigning a pointer away from a storage location does not automatically create garbage as long as there is an alias (i.e., another pointer) to the location.

Dangling References/Pointers

The other side of the coin from garbage are dangling references; this is when storage IS freed up, but the programmer still has a pointer to the freed up storage and uses it as if the storage was still available.

int *ip = new int[10];
...
delete [] ip;
...
ip[3] = 1;	// uh, oh... ip's storage was freed; we have no business using ip

Following a dangling reference is exteremely dangerous, the memory may have been reallocated and reassigned new values. In the above example, assigning 1 to ip[3] may very well be corrupting data being used in another portion of the program (that obtained that storage in a subsequent call to new).
What we need is either:
- to be very careful not to access storage through a pointer whose storage has been deleted (and has not been assigned new storage after that), or
- have some value that can be assigned to the pointer AFTER the storage has been deleted and which indicates the pointer is currently unused (see next section).

`nullptr`, `NULL`

The predefined value NULL (obtained via a #include of cstdlib) is a value that can be assigned to any pointer and represents an invalid pointer.

This value is assigned to a pointer to represent that the pointer is current not pointing at anything
The program can test the value of a pointer for NULL to determine whether to 'follow' the pointer.
NULL is actually defined as being 0; there is a difference of opinion whether to use NULL or 0.
C++ 11 introduced the nullptr keyword to be used in place of NULL

References

A reference is an alias to an object.
References are declared using &, and are initialized by supplying the original name for the object.
```
int i = 12;
int &ir = i;
		
```
A reference is permanent — once assigned a reference cannot be reassigned.
Access to the value is performed transparently
```
ir++;		// Adds 1 to i
		
```
- In essence, reference is a pointer without all the notation and machinery
- This transparency means that the reference loses its individual identity (i.e., there's no way to manipulate the reference itself after it's created; all 'references' to the reference refer to what it is 'pointing' to.
Primary uses of a reference are to:
- permit call- and return-by-reference (see below)
- permit clean operator overloading

A Pattern for a Growable Array

The trick to implementing a growable array is to use a pointer rather than an array (which must be delcared with a size fixed at compile time).
- Using a pointer allows you to call new during runtime, when additional memory is needed.
- Furthemore, the level of indirection provided by the pointer allows access to the memeory though the pointer. I.e., even if the pointer is re-assigned to refer to a larger chunk of mememory, you are still going through the same pointer.
The basic idea is to maintain two separate size variables, and a pointer to the current allocation representing the array:
- A physical size, usually called the capacity. This corresponds to the number of elements allocated.
- a logical size, usually simply called the size representing the number of elements actually assigned values by the calling application.
  - The size is always less than or equal to the capacity. A size of 0 means the array is effectively empty; if the size equals the capacity, the array is full.
Inserting elements into the array incresaes the size; when the size equals the capacity, the array must be resized. This involves:
- Increasing the capacity by some value (say, 2 times the current capacity)
- Allocating new storage whose size is that of the new capacity
- Copying the elements from the old storage to the new storage
- Freeing the old storage
- Assigning the pointer to the new storage to the array's pointer

Here is a function that ensures there is room for at least one more element in the array.

// checkCapacity accepts an array pointer, and a size and capacity; if necessary, it resizes the array 

void checkCapacity(int *&arr, int size, int &capacity) {
	if (size == capacity) {		// full, need to grow/reallocate
		capacity *= 2;
		int *tmp = new int[capacity];
		for (int i = 0; i < size; i++)
			tmp[i] = arr[i];
		delete [] arr;
		arr = tmp;
	}
}

Notice the type of the arr parm — it's declared as a reference since the pointer may change (if the array needs to be resized).
- This is an example of a pointer being passed because it's a pointer (and not to simulate by-reference by passing 'by-pointer').

Here is the more typical scenario … the function appears as a member function of a class that performs automatic capacity mamnagement (typically a vector).

class vector {
public:
	vector(int capacity=100) : capacity(capacity), size(0), arr(new int[capacity]) {}
	…
private:
	void checkCapacity() {
		if (size == capacity) {		// full, need to grow/reallocate
			capacity *= 2;
			int *tmp = new int[capacity];
			for (int i = 0; i < size; i++)
				tmp[i] = arr[i];
			delete [] arr;
			arr = tmp;
		}
	}
	
	int *arr;
	int size, capacity;
};

Notice the paramaters of the previous version are no longer present; they are now encapsulated as data members of the class

CISC 3142
Programming Paradigms in C++
Lecture 5
Arrays, Pointers, Dynamic Memory Allocation, References

Pointers

Concepts

Uses of Pointers

Overview

Declaring Pointers

Fundamental Pointer-Related Operations in C++

Uses of Pointers

Pointer Arithmetic

Overview

The Details — Scaled Pointer Arithmetic

Valid Arithmetic Operations on Pointers

Arrays

Declaration and Initialization

The Subscript Operator — The Array as Pointer

Passing an Array to a Function

A Closer Look at Pointers and Arrays

Pointers vs. Arrays

A Final Look at Pointers and Arrays

Recursion and Arrays

The `sizeof` Operator

Dynamic Memory

Overview

The `new` Operator

The `delete` Operator

Dynamically-Allocated Arrays

Aliases

Memory Leaks (Garbage)

Dangling References/Pointers

`nullptr`, `NULL`

References

A Pattern for a Growable Array

Code Used in this Lecture

CISC 3142 Programming Paradigms in C++ Lecture 5 Arrays, Pointers, Dynamic Memory Allocation, References

Pointers

Concepts

Uses of Pointers

Overview

Declaring Pointers

Fundamental Pointer-Related Operations in C++

Uses of Pointers

Pointer Arithmetic

Overview

The Details — Scaled Pointer Arithmetic

Valid Arithmetic Operations on Pointers

Arrays

Declaration and Initialization

The Subscript Operator — The Array as Pointer

Passing an Array to a Function

A Closer Look at Pointers and Arrays

Pointers vs. Arrays

A Final Look at Pointers and Arrays

Recursion and Arrays

The sizeof Operator

Dynamic Memory

Overview

The new Operator

The delete Operator

Dynamically-Allocated Arrays

Aliases

Memory Leaks (Garbage)

Dangling References/Pointers

nullptr, NULL

References

A Pattern for a Growable Array

Code Used in this Lecture

CISC 3142
Programming Paradigms in C++
Lecture 5
Arrays, Pointers, Dynamic Memory Allocation, References

The `sizeof` Operator

The `new` Operator

The `delete` Operator

`nullptr`, `NULL`