Overview
This chapter presents pointers, arrays, and references. They appear in the same chapter because they are related:
- references are transparent abstractions of pointers — they represent a means of accessing an object via indirection
- arrays and pointers are closely related — the name of the array is treated as a pointer to the first element
Here is a write-up on pointers, arrays, references, and several related topics
Topics
Pointers
- Pointers are typed: one must specify the type of the object pointed to
- dereferencing / indirection
void *
- pointer to nothing/everything
- used to override pointer typing system; primarily for systems programming (accessing arbitrary addresses)
nullptr
- No address will be allocated at 0, so 0 is a good choice for a null pointer
- before nullptr, NULL was defined as a macro for 0
#define NULL 0
- nullptr is an abstraction of 0 / NULL
- part of the language (known to the compiler)
- a single nullptr serves all pointer types, rather than one null value for each pointer type
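One practical consequence of nullptr being known to the compiler shows up in overload resolution. The sketch below is only an illustration: the two describe overloads and the wrapper functions are hypothetical names, not part of the chapter. It suggests why the literal 0 and nullptr can behave differently.

```cpp
#include <string>

// Hypothetical overloads, introduced only for illustration.
std::string describe(int)          { return "int"; }
std::string describe(const char *) { return "pointer"; }

// 0 is first and foremost an integer literal, so the int overload is chosen.
std::string with_zero() { return describe(0); }

// nullptr converts to any pointer type but not to int,
// so the pointer overload is chosen.
std::string with_nullptr() { return describe(nullptr); }
```

With the old macro, describe(NULL) would either pick the int overload or be ambiguous, depending on how the implementation defines NULL.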
Arrays
- Must specify the type and number (capacity) of elements
T ta[capacity];
- Note the position of the []'s
- main operation is subscripting: []
- Can be static, automatic (stack) or dynamic (heap/free store)
- static
int arr1[10]; // at global (file) scope — outside any function
…
int main() {…}
- automatic (stack)
void f() {
int arr2[100];
…
}
- dynamic (free store / heap)
void g() { // some function
…
int *ip = new int[n];
…
}
- arrays are low-level and should be encapsulated in richer, more robust containers (e.g., vector)
- no array-level operations (e.g. can't assign one array to another)
- array name is treated as a pointer to first element
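- strictly speaking, the array name is not itself a pointer: it is a non-modifiable lvalue of array type that converts ('decays') to a pointer to the first element in most expressions (sizeof and & are exceptions); informally, it helps to think of it as a constant, almost literal-like pointer to the first element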
- this means arrays can't be passed by value, in the sense of making a copy of the entire array
- note that in the dynamic case above (the one allocated using new) the array is the anonymous storage allocated, and is distinct from the pointer (ip) to which its address is assigned (which is why the pointer is named ip and not arr3)
- C-style string - 0-terminated array of chars
- the '\0' acts as a trailer value for the elements of the array, since '\0' is not a valid textual character
Array Initializers
- Initialization within declaration similar to Java … list of values
- if size left out, compiler infers from number of elements in list
- if size supplied and too few elements, 0 is used for the rest
- if size supplied, too many elements is an error
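The three initializer rules above can be sketched as follows. The variable and function names are made up for the illustration.

```cpp
#include <cstddef>

int inferred[] = {1, 2, 3};      // size left out: the compiler infers capacity 3
int padded[5]  = {1, 2};         // too few elements: the remaining three are 0
// int overflow[2] = {1, 2, 3};  // too many elements: this line would not compile

// Element count of 'inferred'. The sizeof trick only works where the array
// is still an array, i.e. it has not decayed to a pointer.
std::size_t inferred_count() {
    return sizeof(inferred) / sizeof(inferred[0]);
}
```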
String Literals
- sequence of characters within double quotes
- C-style string, so the compiler adds '\0' to the end
char s[] = "Hello";
- The above is the same as
char s[] = {'H', 'e','l', 'l', 'o', '\0'};
- (Note the omission of the capacity in the []'s — the compiler determines the capacity from the initialization list)
- the type of a string literal is an array of const char whose capacity is the number of characters + 1, i.e. const char[capacity]
- thus the type of "Hello" is const char[6]
- Comparison of C-style strings is done via the strcmp function (from the C standard library header <cstring>)
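As a small illustration of the two points above (the extra '\0' and comparison via strcmp), here is a sketch. The helper name same_text is hypothetical.

```cpp
#include <cstring>

// "Hello" has five visible characters plus the terminating '\0'.
static_assert(sizeof("Hello") == 6, "5 characters + terminator");

// True when two C-style strings hold the same characters.
// Writing a == b instead would compare the two addresses, not the contents.
bool same_text(const char *a, const char *b) {
    return std::strcmp(a, b) == 0;
}
```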
Pointers into Arrays
In a nutshell, once declared, an array (or at least its variable name) is basically a pointer to the first element.
- The subscript operation is actually defined in terms of pointers: a[i] is defined as *(a+i)
- go back to the link at the top and look at pointer arithmetic
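The equivalence between subscripting and pointer arithmetic can be checked directly. The function below is a hypothetical helper written for this illustration only.

```cpp
// Reads element i in equivalent ways and reports whether they all agree.
bool subscript_forms_agree(const int *a, int i) {
    return a[i] == *(a + i)   // the definition of subscripting
        && a[i] == i[a]       // legal (if odd-looking): *(i + a) is the same sum
        && &a[i] == a + i;    // the addresses coincide as well
}
```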
Navigating Arrays
- Arrays can be processed using subscripting:
const int ARR_CAP = 10;
int a[ARR_CAP];
…
for (int i = 0; i < ARR_CAP; i++)
cout << a[i];
or pointers (several versions follow):
const int ARR_CAP = 10;
int a[ARR_CAP];
…
for (int *p = a; p < a + ARR_CAP; p++) // p != a + ARR_CAP also good
    cout << *p;
const int ARR_CAP = 10;
int a[ARR_CAP];
…
int *p = a;
while (p != a + ARR_CAP) { // basically the above for loop as a while
    cout << *p;
    p++;
}
const int ARR_CAP = 10;
int a[ARR_CAP];
…
for (int i = 0; i < ARR_CAP; i++)
cout << *(a+i); // just the subscript expressed as pointer arithmetic
- In general, subscripting is used
- That's because most arrays are processed by their size, which is basically a header value, and header-value sequences are processed using counting (i.e., 'for') loops
- The most common exception is the processing of C-strings, which use a trailing value ('\0') and thus a condition-based (i.e., 'while') loop.
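The trailer-value style of processing might look like the following sketch. The names my_strlen and my_strcpy are hypothetical; the standard library already provides std::strlen and std::strcpy, so these are for illustration rather than real use.

```cpp
#include <cstddef>

// Walks the string until the trailer value '\0' is found.
std::size_t my_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0')
        ++p;
    return static_cast<std::size_t>(p - s);
}

// The classic copy idiom: copy one character, advance both pointers,
// and stop right after the '\0' itself has been copied.
// The caller must guarantee that dest has room for src plus the '\0'.
char *my_strcpy(char *dest, const char *src) {
    char *d = dest;
    while ((*d++ = *src++) != '\0')
        ;  // all the work happens in the condition
    return dest;
}
```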
Passing Arrays
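Because an array name is treated as a pointer to the first element, a function that receives an array really receives a pointer, and the element count has to be passed separately. A minimal sketch, with hypothetical function names:

```cpp
#include <cstddef>

// The parameter 'const int a[]' means exactly 'const int *a';
// the number of elements travels as a separate argument.
int sum(const int a[], std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

// Only a pointer is passed, so no copy of the array is made:
// the function modifies the caller's elements.
void zero_fill(int a[], std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        a[i] = 0;
}
```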
C-Style Strings (char *)
C, which had neither classes nor the resource-handle style of memory management available in C++, represented strings as
arrays of characters. They are often referred to as char * because of the close relationship between arrays and pointers.
Here is a write-up on C-Style Strings. Their prevalence in C has been greatly reduced in C++ by the
higher-level string class, which handles its own memory management and provides a much more intuitive interface via operator
overloading. However, there are still several reasons for being at least acquainted with C-Style strings and their processing idioms:
- command-line arguments come into the program as parameters to main, and for the sake of backwards compatibility with C they have
retained their semantics in C++. As such, these arguments are C-Style strings. (One can avoid most of the char * semantics
by converting the arguments to string, as we did with string-literal exceptions, but one must still read others'
code, and not everyone will avoid working with char *; some of that code might also derive from C, making char *
unavoidable.)
// arg_lister.cpp
#include <iostream>
using namespace std;

int main(int argc, char *argv[]) {
    for (int i = 0; i < argc; i++)
        cout << "argv[" << i << "]: " << argv[i] << endl;
}
weiss> arg_lister Hello world !
argv[0]: arg_lister
argv[1]: Hello
argv[2]: world
argv[3]: !
- as mentioned above, and before, string literals are arrays of const char, usually handled through a const char *, again for C compatibility. Occasionally you may run into issues,
most typically compiler errors, related to that type, and understanding the error usually requires taking the array/pointer character
of the literal into account.
- For example, we've seen that the constructor (and open function) of fstream objects accepts a const char * argument rather than the expected string (overloads taking a string were only added in C++11).
- from the point of view of being a 'native', there are several programming idioms that crop up when using C-Style strings, all related
to working with pointers. These idioms, besides being somewhat unique to C/C++ (because of the low-level nature of pointers
in these languages), are also quite elegant and 'beautiful', and one cannot really call oneself a C++ programmer, and definitely
not a C programmer, without at least understanding these idioms. (The above write-up presents these idioms.)
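As a sketch of the conversion mentioned for command-line arguments, the hypothetical helper below copies the C-style argv entries into string objects, so the rest of a program need not touch char * at all. The name to_strings and the decision to skip argv[0] (the program name) are assumptions made for this illustration.

```cpp
#include <string>
#include <vector>

// Copies the command-line arguments, excluding the program name in argv[0],
// into a vector of std::string. Each push_back constructs a string from a
// C-style string, which copies the characters up to the '\0'.
std::vector<std::string> to_strings(int argc, const char *const argv[]) {
    std::vector<std::string> args;
    for (int i = 1; i < argc; i++)
        args.push_back(argv[i]);
    return args;
}
```

Inside main it could be called as to_strings(argc, argv).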
Pointers and Ownership
The crucial issue is … once an object has been (dynamically) allocated on the free store (heap), who is responsible for that memory,
i.e., who owns it?
- The naive answer is the pointer (or containing logic) that allocated it; but pointers are often 'passed around'
- Stroustrup refers to the object as a resource (there's a finite supply) which must be carefully maintained
- The suggestion is to always maintain a 'master' pointer in a class which can control the allocation (typically in the constructor) and
deallocation (in the destructor) of the resource
- A class that does this is called a resource handle class
- Other pointers to the object should not delete the object
- As an example, take vector:
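The real vector is too large to reproduce here. As a minimal sketch of the same idea, the hypothetical IntBuffer class below keeps the 'master' pointer private, allocates in the constructor, and deallocates in the destructor. It is a simplified illustration and does not reflect how std::vector is actually implemented.

```cpp
#include <cstddef>

// Hypothetical, heavily simplified resource handle class.
class IntBuffer {
public:
    // Acquire the resource: n ints on the free store, zero-initialized.
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Release the resource when the handle goes out of scope.
    ~IntBuffer() { delete[] data_; }

    // Copying is disabled so two handles can never delete the same storage.
    IntBuffer(const IntBuffer &) = delete;
    IntBuffer &operator=(const IntBuffer &) = delete;

    int &operator[](std::size_t i) { return data_[i]; }  // no bounds check
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int *data_;  // the 'master' pointer: only this class deletes it
};
```

Code using an IntBuffer never calls delete itself; the storage is released automatically when the object's lifetime ends.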
References
Unlike pointers, references do not maintain their own identity, nor do they require special machinery (such as * or ->)
to access what they refer to (their alias).
- the syntax for working with a reference (other than its declaration) is the same as for the object it refers to (is an alias of)
- references can't be 'reseated', i.e., after being bound to their initial object they can no longer be 'moved' to refer to another object
- there is no 'null' value for an alias
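The points above can be sketched in a few lines. Both function names are hypothetical and exist only for this illustration.

```cpp
// Call-by-reference: a and b are aliases for the caller's objects,
// so no * or & is needed inside the function or at the call site.
void swap_ints(int &a, int &b) {
    int tmp = a;
    a = b;
    b = tmp;
}

// Shows that assigning through a reference does not reseat it.
int rebind_attempt() {
    int x = 1, y = 2;
    int &r = x;  // r is an alias of x for its whole lifetime
    r = y;       // assigns y's value to x; r still refers to x
    return x;    // x now holds 2
}
```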
Advice
- [1] Keep use of pointers simple and straightforward; §7.4.1.
- pointer use should be restricted for the most part to holding addresses (handles) of dynamically allocated memory
- no need to use them to simulate call-by-reference (because we now have actual call-by-reference in C++)
- rare need to use them for systems-programming
- [2] Avoid nontrivial pointer arithmetic; §7.4.
- most processing involving such arithmetic can be done via subscripting
- [3] Take care not to write beyond the bounds of an array; §7.4.1.
- contrast with Java bounds-checking
- [4] Avoid multidimensional arrays; define suitable containers instead; §7.4.2.
- if we reduce use of arrays (advice [6]), this becomes a total non-issue
- even if we stick with arrays, most multi-dimensional arrays (like most nested loops)
can be decomposed into layers of responsibility
- A two-dimensional array consisting of the 5 exams for 20 students in a class is better
represented as an array of 20 student objects, each containing an array of 5 exams
- [5] Use nullptr rather than 0 or NULL; §7.2.2.
- nullptr is an entity known to the compiler, which can then take language semantics into consideration, while 0 and NULL are not
- [6] Use containers (e.g., vector, array, and valarray) rather than built-in (C-style) arrays; §7.4.1.
- They take care of their own memory management, provide a richer set of operations than arrays, and often
maintain the integrity of the structure through invariants
- [7] Use string rather than zero-terminated arrays of char; §7.4.
- string is the container equivalent of zero-terminated arrays (char *)
- [8] Use raw strings for string literals with complicated uses of backslash; §7.3.2.1.
- [9] Prefer const reference arguments to plain reference arguments; §7.7.3.
- If a parameter can be declared const, it means its value is not being changed in the function. Enforce that
with const.
- [10] Use rvalue references (only) for forwarding and move semantics; §7.7.2.
- [11] Keep pointers that represent ownership inside handle classes; §7.6.
- The constructor and destructor of the handle class then become responsible for the pointer's lifetime and management
- [12] Avoid void * except in low-level code; §7.2.1.
- void * is a 'typeless' pointer that was used in C to bypass the type restrictions on pointers.
- This was occasionally necessary in systems programming
- It also provided an 'escape hatch' for memory allocation functions
- C++ eliminates this need with the new operator (which is known to the compiler)
- It has little to no use in C++ — except for the referred-to low-level programming.
- [13] Use const pointers and const references to express immutability in interfaces; §7.5.
- Including const in a parameter type declaration or return value conveys important and useful information about the
function
- [14] Prefer references to pointers as arguments, except where 'no object' is a reasonable option; §7.7.4.
- References are simpler to work with, and involve less machinery on the part of the programmer
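The decomposition suggested under advice [4] (20 students, each with 5 exams) might be sketched as follows. The Student type, its average member, and make_class are hypothetical names chosen for the illustration, and the sketch also follows advice [6] by using containers rather than built-in arrays.

```cpp
#include <array>
#include <vector>

// Each student is responsible for its own exam scores.
struct Student {
    std::array<int, 5> exams{};  // five exam scores, initially 0

    double average() const {
        int total = 0;
        for (int score : exams)
            total += score;
        return static_cast<double>(total) / exams.size();
    }
};

// A class of twenty students replaces the 20 x 5 two-dimensional array.
std::vector<Student> make_class() {
    return std::vector<Student>(20);
}
```

The nested loop over rows and columns then becomes a loop over students that calls average() on each one.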