CISC 3142
Programming Paradigms in C++
The C++ Canonical Class Form
Prologue - Understanding the Rules About Default Constructors, Copy Constructors, Assignment Operators, and
Miranda Functions
To the beginner, C++'s rules on the above functions seem impossibly complex.
However, nothing is arbitrary (or at least not deliberately so), and understanding the rationale makes
it easier to remember and understand the need for what otherwise appears to be unnecessarily complex.
We will start with something that actually has nothing to do with the topic at hand, but is a good way of
introducing how the compiler provides certain functions for free.
The Miranda Rule in U.S. Law
You have the right to remain silent when questioned.
Anything you say or do may be used against you in a court of law. (Modern readings have can and will in place of may)
You have the right to consult an attorney before speaking to the police and to have an attorney present
during questioning now or in the future.
If you cannot afford an attorney, one will be appointed for you before any questioning, if you wish.
If you decide to answer any questions now, without an attorney present, you will still have the right to stop answering at
any time until you talk to an attorney.
Knowing and understanding your rights as I have explained them to you, are you willing to answer my questions without an attorney present?
The operative portion for us is that something (i.e., an attorney) will be supplied for us if we are unable to supply one for ourselves.
Default Constructors
If no constructor is supplied, a default constructor is supplied (by the compiler).
This constructor does essentially nothing; it contains no member initialization list and an empty body. For the class C, it is equivalent to:
C() {}
Declaring an object of the class in this context (i.e., C c) is essentially the class/object equivalent
of writing:
int i;
Just as the programmer elected not to provide an initial value for i above,
a programmer who decides not to provide initialization (in the form of a constructor)
for the objects of a class is still allowed by the compiler to create such objects.
The values of data members of built-in type are then the same as in the integer case above
(i.e., they depend on WHERE the declaration occurs).
Once a constructor is explicitly defined in the class (by the class designer), the compiler-supplied constructor disappears.
This makes sense — once the class designer has decided how an object of the class should be initialized, we no longer have
the right to initialize it in any other fashion.
As an example, for a Name class, the class designer codes a constructor that requires two string
objects (e.g., first and last names).
This is a statement on the part of the designer that names MUST be created/initialized with two strings.
Users of the class should not be able to create a Name in any other fashion (in particular, by specifying
nothing).
If the class designer wants a default constructor, it's now up to her to supply one.
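The disappearing default constructor can be checked at compile time. The following is a minimal sketch, assuming a hypothetical Name class with a full() helper and a behaviorless Point struct (neither is taken from the course code); std::is_default_constructible reports whether an object can be created with no arguments.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical Name class: the designer requires two strings.
class Name {
public:
    Name(const std::string &first, const std::string &last)
        : first(first), last(last) {}
    std::string full() const { return first + " " + last; }
private:
    std::string first, last;
};

// Hypothetical behaviorless struct with no constructors at all.
struct Point {
    int x;
    int y;
};

// Point still gets the compiler-supplied default constructor...
static_assert(std::is_default_constructible<Point>::value,
              "Point has a compiler-supplied default constructor");
// ...but Name does not, because its designer supplied a constructor.
static_assert(!std::is_default_constructible<Name>::value,
              "Name has no default constructor");

std::string demoName() {
    // Name n;                  // would not compile: no default constructor
    Name n("Grace", "Hopper");  // the only way the designer allows
    Point p;                    // legal: compiler-supplied default constructor
    p.x = 1;
    p.y = 2;
    assert(p.x + p.y == 3);
    return n.full();
}
```

Uncommenting the Name n; line should produce a compile-time error, which is exactly the designer's intent.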
The rationale for the compiler-supplied constructor is twofold:
It allows someone who understands structs (essentially classes without member functions) to define a struct and create objects
of that type.
Without the compiler-supplied constructor, one would get a compiler error that the object can't be created.
Most 1110's finish the semester with precisely this kind of type (behaviorless classes).
C++ was meant to be as compatible as possible with pre-existing C code, and since C only allowed structures and not classes,
all that C code (which used structs heavily) was in the same situation as the previous point (i.e., no functions in the
struct and in particular no constructor).
Of course we can only have a compiler-supplied constructor if it makes sense to define such a constructor for an arbitrary class.
Fortunately we can: give it an empty body, i.e., have the compiler do nothing. Not actually initializing anything is
no better or worse than what happens with the above int, for example.
And this point is, to some extent, the reason for talking about compiler-supplied default constructors: the same
reasoning applies to copy constructors and assignment operators, as we will see.
The moral of the story is that the compiler will supply functions to the class if it is possible to do so and if doing so proves fairly useful
to the programmer.
The Copy Constructor
Recall that you don't explicitly or directly call a constructor; the call is inserted by the compiler when an object of the class is
created.
The primary task of the constructor is thus the initialization of the object.
The copy constructor is so named because it is used when an object of a class is created and initialized using an object
of the same class,
i.e., a copy is made of the latter into the new object.
There are three situations in which an initialization occurs via a copy:
A declaration with an initialization using an object of the same class:
C c1;
...
C c2 = c1; // same as C c2(c1);
Call-by-value
void f(C c) {....} // NOT by reference
...
C c1;
f(c1);
Return-by-value
C f() {
C c1;
...
return c1;
}
In all three of the above cases, a new object (the variable being declared, the parameter, the return value)
is being created and then initialized by an object of the same class (the rhs of the initialization, the argument to the
function, the local variable — c1 in all of the above cases)
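One way to see the three situations is to count copy constructor calls. This is a sketch only, assuming a hypothetical class C with a static counter named copies (an illustration device, not part of the notes). For return-by-value the compiler is permitted to optimize the copy away, so the sketch only asserts a range for that case.

```cpp
#include <cassert>

// Hypothetical class that counts how often its copy constructor runs.
class C {
public:
    C() {}
    C(const C &) { ++copies; }
    static int copies;
};
int C::copies = 0;

void byValue(C c) { (void)c; }   // the parameter is initialized from the argument

C byReturn() {
    C local;
    return local;                // the return value is initialized from local
}

void demoCopies() {
    C c1;
    C::copies = 0;

    C c2 = c1;                   // 1. declaration with initialization
    assert(C::copies == 1);

    byValue(c1);                 // 2. call-by-value
    assert(C::copies == 2);

    C c3 = byReturn();           // 3. return-by-value (copies here may be elided)
    assert(C::copies >= 2 && C::copies <= 4);

    (void)c2;
    (void)c3;
}
```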
There are two reasons we might have the compiler supply a copy constructor for us (in the absence of us taking matters into our own hands):
All of the above situations are fundamental to even the simplest of programs; writing a copy constructor on the other hand
is a bit beyond simple, so if we can have the compiler do it, it would make introductory level coding simpler.
As with the compiler-supplied constructor the above only makes sense if it's possible to define a copy for
objects of arbitrary classes, which leads to the next reason we have compiler-supplied copy constructors.
The compiler can easily implement the notion of a copy from one object to another of the same class:
Simply copy the value of each data member from the source object to the destination object.
For example:
class C {
public:
C(int i, double d) : i(i), d(d) {}
private:
int i;
double d;
};
C c1(1, 2.5);
...
C c2 = c1; // copies c1.i to c2.i and c1.d to c2.d
The compiler-supplied copy constructor case is a bit different from the compiler-supplied default constructor case.
For the default constructor, once any other constructor has been defined, the user of the class should not have access to
a default constructor unless explicitly allowed (i.e., implemented) by the class designer.
For the copy constructor, however, it is such a basic operation, and so uniformly defined, that it remains available unless
explicitly prohibited by the class designer (how?) or replaced by the class designer (how?).
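As one possible answer to the two "how?" questions: in C++11 and later, copying can be prohibited by declaring the copy constructor as deleted (older code declared it private and left it undefined), and it can be replaced simply by defining one. The classes NoCopy and Counted below are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <type_traits>

// Copying prohibited: the copy constructor is declared as deleted (C++11 and later).
class NoCopy {
public:
    NoCopy() {}
    NoCopy(const NoCopy &) = delete;
};

// Copying replaced: the designer supplies a copy constructor of her own.
class Counted {
public:
    Counted() : generation(0) {}
    Counted(const Counted &other) : generation(other.generation + 1) {}
    int generation;   // how many copies removed from the original
};

static_assert(!std::is_copy_constructible<NoCopy>::value, "NoCopy cannot be copied");
static_assert(std::is_copy_constructible<Counted>::value, "Counted can be copied");

int generationOfCopy() {
    Counted original;
    assert(original.generation == 0);
    Counted copy = original;   // runs the programmer-defined copy constructor
    return copy.generation;
}
```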
The Assignment (=) Operator
Consider the following two ='s:
int x = 5;
and
x = 5;
The second occurrence is an assignment operator. The first is simply an = symbol introducing an initializer.
To drive home the point that initialization and assignment are not the same, consider the following:
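const double PI = 3.14; // legal: initialization
and
const double PI; // illegal: a const must be initialized
PI = 3.14; // illegal: a const cannot be assigned to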
Moving to objects, we thus have two totally different constructs, an initialization and an assignment:
C c1 = c2;
and
C c3, c4;
...
c4 = c3;
In the first case:
a new object is being created and thus a copy constructor is called
since this is a new object, no consideration need be given to the state of the object before the constructor was called
constructors are not called by the programmer, therefore they have no return type or return value
In the second case:
A new object is not being created (c4 has previously been created, and presumably initialized using a constructor)
However, we still need to copy the value of the rhs to the lhs object
(Thus we shouldn't be surprised if at least some of the copy constructor and assignment operator code
is the same)
We also may have to take into consideration the fact that c4 contained valid data beforehand
This may not seem obvious at this point; after all, we're going to overwrite it with the values from c3
Finally, the assignment operator is an operator; it appears in an expression and therefore has a return type and value.
The assignment of one object to another of the same class is as fundamental as initialization (if not more so), and thus our argument about
wanting a compiler-supplied copy constructor holds equally for a compiler-supplied assignment operator,
and indeed the same semantics of memberwise assignment apply.
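To make the distinction between the two ='s concrete, here is a small sketch assuming a hypothetical class C that records which of its functions ran last in a member named last (purely an illustration device).

```cpp
#include <cassert>
#include <string>

// Hypothetical class that records which special member function ran last.
class C {
public:
    C() : last("default constructor") {}
    C(const C &) : last("copy constructor") {}
    C &operator=(const C &) {
        last = "assignment operator";
        return *this;
    }
    std::string last;
};

void demoInitVsAssign() {
    C c2;
    C c1 = c2;   // initialization: a new object, so the copy constructor runs
    assert(c1.last == "copy constructor");

    C c3, c4;
    assert(c4.last == "default constructor");
    c4 = c3;     // assignment: c4 already exists, so operator= runs
    assert(c4.last == "assignment operator");
}
```

Even though both statements use the = symbol, only the second one calls operator=.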
The Dynamically Allocated Array
Here is an Array class whose underlying ('primitive') array is dynamically allocated:
class Array {
public:
Array(int capacity) : arr(new int[capacity]), capacity(capacity), size(0) {}
private:
int *arr;
int capacity, size;
};
Dynamic (i.e., heap) allocation of the underlying array container allows us to:
defer the capacity of the array until runtime (which is typically when we can determine the capacity)
resize the underlying container as needed
this is the basis for a vector (a resizeable array container), which in turn is the basis for the majority of
modern containers.
Applying what we just presented, what happens in the following cases?
Array a1(20);
…
Array a2 = a1;
void f(Array a); // note the call by value
…
…
Array a;
…
f(a);
Array f(); // note the return by value
…
…
Array a;
…
a = f();
Array a1(10), a2(20);
…
a1 = a2;
Why a Canonical Form?
Canonical: "conforming to well-established rules or patterns"
Having a canonical form provides a concrete set of guidelines for creating a
robust, fully-featured, properly functioning class
It's not that there won't be any errors, but many of the well-known pitfalls
are addressed by the canonical form.
If one or more of the class' data members are pointers (i.e., are responsible for the allocation/deallocation of dynamic
memory), then the canonical form is essentially mandatory to maintain proper copy semantics
The C++ Canonical Class Form
Define a default constructor for the class (recommended)
Arrays of objects of the class cannot be created easily if there is no default constructor
Define a destructor (required if pointer data members are present)
Define a copy constructor (required if pointer data members are present)
Define an assignment (=) operator (required if pointer data members are present)
Define an equality (==) operator (strongly recommended)
Define a stream insertion (<<) operator (strongly recommended)
Define a stream extraction (>>) operator (optional)
the need for copy constructor, destructor, and assignment operator
an expandable array
A Word About Shallow vs Deep Copy
Given a pair of pointers, copying the pointer value from one to the other is known as a shallow copy; copying the
value being pointed at is called a deep copy.
shallow copies are also sometimes called address copies since it is the address of the data (rather than the
data itself) which is being copied.
shallow copies are typically faster than deep copies, since the address alone is copied, as opposed to the typically larger data item
(for example a video stream)
however, shallow copies create aliases (multiple pointers pointing at the same object — modifications made to the
data through any of them affects all) together with all the associated issues.
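The difference can be seen with nothing more than a few raw pointers. The following is a minimal sketch (the variable names are made up for the illustration):

```cpp
#include <cassert>

void demoShallowVsDeep() {
    int *original = new int(10);

    int *shallow = original;          // shallow copy: only the address is copied
    int *deep = new int(*original);   // deep copy: the pointed-to value is copied

    *original = 99;                   // modify the data through the original pointer

    assert(shallow == original);      // alias: same address
    assert(*shallow == 99);           // so the change is visible through shallow
    assert(deep != original);         // independent storage
    assert(*deep == 10);              // so the deep copy is unaffected

    delete deep;
    delete original;                  // shallow is an alias; it must not be deleted again
}
```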
When a class contains a data member that is a pointer, there is the issue of what happens when a copy of the object needs to be made.
More often than not, a deep copy needs to be performed.
The default (Miranda) copy constructor and assignment operator perform memberwise copying, which means the pointer (not the data) is
copied, i.e., a shallow copy, just the opposite of what is usually needed.
We must therefore supply our own copy constructor, which performs a (semantic) deep copy.
This will usually involve our dynamically allocating space for the copy.
Once the deep copy is made, a new issue arises: part of the object's storage has been allocated by the compiler (for example, when you declared the object)
and part of it by us (dynamically allocating the space for the data being pointed to). The compiler will take care of its allocation, but we are responsible for ours.
This leads to the need for a destructor.
The (Orthodox) Canonical Class Form
From the above discussion, we see that a class with pointer members must typically include a copy constructor, overloaded assignment operator, and a destructor.
Such a class is said to conform to the orthodox canonical class form.
The above three are required for the class to function correctly.
In addition, there are other functions that are useful and desirable from the perspective of a user of the class:
An insertion (<<) operator for outputting the object
Less frequently, an extraction (>>) operator for inputting objects of the class
An equality (==) operator
The Need for a Programmer-Defined Copy Constructor
Provides logic for managing the logical portion of the object that is not part of the physical portion
Typically arises in the context of pointer data members and dynamically allocated memory
Here are two typical scenarios in which a copy constructor is invoked, and which require a programmer-defined copy constructor for proper semantics:
Without a programmer-defined copy constructor, the default (compiler-supplied) copy constructor is used, and a member-wise
copy is performed, resulting in the creation of an alias to the underlying array
Note that the other data members are separate copies of the original; this is not a full alias of the vector
object (which is what you would have in Java)
Adding an element through v2 would affect the underlying array as well as v2's
size data member, but v1's size would remain untouched, effectively resulting
in a corrupt object through v1's perspective
A proper (programmer-defined) copy constructor would perform a deep copy of the object; i.e., allocate an independent copy of the underlying array for the newly
created vector object.
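The corruption described above can be reproduced in a few lines. This is only a sketch: MiniVec is a hypothetical stand-in for the vector class under discussion, its members are public so they can be inspected, and it deliberately has no destructor so that the shared array is not deleted twice.

```cpp
#include <cassert>

// Hypothetical stripped-down vector with NO programmer-defined copy constructor
// and, deliberately, no destructor (so the aliased array is not deleted twice).
class MiniVec {
public:
    explicit MiniVec(int capacity)
        : arr(new int[capacity]), capacity(capacity), size(0) {}
    void add(int value) {
        if (size < capacity) arr[size++] = value;
    }
    int *arr;
    int capacity, size;
};

void demoAlias() {
    MiniVec v1(4);
    v1.add(1);

    MiniVec v2 = v1;            // compiler-supplied copy: member-wise
    assert(v2.arr == v1.arr);   // the pointer was copied, so the array is shared

    v2.add(2);                  // writes into the shared array, bumps only v2.size
    assert(v2.size == 2);
    assert(v1.size == 1);       // v1's size is untouched...
    assert(v1.arr[1] == 2);     // ...yet its array holds an element it does not count

    delete[] v1.arr;            // release the single shared array exactly once
}
```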
The Typical Structure of a Programmer-Defined Copy Constructor
C(const C &c) {
// Code to copy the value of the parameter (right hand operand) to the receiver (left hand operand)
…
}
Note the reference parameter; call-by-value would itself invoke the copy constructor, generating an 'infinite' runaway recursion
The copy code consists of whatever it takes to make an independent copy of the parameter (the right-hand operand)
This might include memory allocations and copying, logging, or whatever could not be done by a default
copy constructor; if it were nothing more than a mere member-wise copy, it should be left to the default copy constructor
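Filling in the skeleton for a concrete case might look like the following. It is a sketch under stated assumptions: IntBuffer, at(), and the member names are hypothetical, and assignment is simply disabled here to keep the focus on the copy constructor.

```cpp
#include <cassert>

// Hypothetical class owning a dynamically allocated array.
class IntBuffer {
public:
    explicit IntBuffer(int n) : data(new int[n]), n(n) {
        for (int i = 0; i < n; i++) data[i] = 0;
    }
    // Copy constructor: note the reference parameter.
    IntBuffer(const IntBuffer &other) : data(new int[other.n]), n(other.n) {
        // Deep copy: duplicate the elements, not the pointer.
        for (int i = 0; i < n; i++) data[i] = other.data[i];
    }
    IntBuffer &operator=(const IntBuffer &) = delete;  // left out of this sketch
    ~IntBuffer() { delete[] data; }

    int &at(int i) { return data[i]; }

private:
    int *data;
    int n;
};

void demoDeepCopy() {
    IntBuffer a(3);
    a.at(0) = 7;

    IntBuffer b = a;   // programmer-defined copy constructor: deep copy
    b.at(0) = 42;      // changes b's array only

    assert(a.at(0) == 7);
    assert(b.at(0) == 42);
}
```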
The Need for a Programmer-Defined Assignment Operator
Again, provides logic for managing the logical portion of the object that is not part of the physical portion
And again, typically arises in the context of pointer data members and dynamically allocated memory
Two major differences from the copy constructor
In an assignment, the object already exists, and will typically have its own allocation of dynamic memory; this
needs to be deleted (recycled) to prevent a memory leak. (In contrast, the copy constructor is operating
on a freshly created object that has no such memory allocated to it yet.)
Unlike the copy constructor, the assignment operator has a return value — typically a reference to the assigned
object (the left-hand operand).
Here is a typical scenario in which a programmer-defined assignment operator is needed:
As before, without a programmer-defined assignment operator, the default (compiler-supplied) assignment operator is used, and a member-wise
copy is again performed, this time resulting not only in the creation of an alias to the underlying array, but also resulting in a memory
leak from the original underlying array of the left hand operand not being deleted
A proper (programmer-defined) assignment operator would perform a deep copy of the object; i.e., allocate an independent copy of the underlying array for the
assigned (left-hand) vector object, as well as delete the old underlying array of the left-hand operand
The Typical Structure of a Programmer-Defined Assignment Operator
C &operator=(const C &c) {
if (&c == this) return *this; // avoid self-assignment
// Code to perform a semantically-proper copy of the parameter's value to the 'receiver'
// as well as cleanup of the old value of the receiver (left-hand operand)
…
return *this;
}
The copy code will typically be quite similar to that of the copy constructor
The logic is often leveraged by placing it in a workhorse function which is then
called by the constructor and assignment operator
the cleanup is unique to the assignment operator as it is working with an already-existing
object that may have prior memory allocation
similarly, unlike the copy constructor, the assignment operator has a return value —
conventionally a reference to the (newly assigned) left hand operand
finally, the first line of code avoids the overhead (and possibly erroneous semantics) of a self-copy;
note the comparison of pointers (this and &c) — this corresponds to the
semantics of Java's == on objects.
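A concrete version of this structure might look as follows. This is a sketch, not the course's implementation: IntBuffer, at(), and length() are hypothetical names, and allocating the new array before deleting the old one is a design choice made here rather than something the notes prescribe.

```cpp
#include <cassert>

// Hypothetical class owning a dynamically allocated array.
class IntBuffer {
public:
    explicit IntBuffer(int n) : data(new int[n]()), n(n) {}   // elements start at 0
    IntBuffer(const IntBuffer &other) : data(new int[other.n]), n(other.n) {
        for (int i = 0; i < n; i++) data[i] = other.data[i];
    }
    IntBuffer &operator=(const IntBuffer &rhs) {
        if (&rhs == this) return *this;    // self-assignment: nothing to do
        int *fresh = new int[rhs.n];       // allocate the new space first
        for (int i = 0; i < rhs.n; i++) fresh[i] = rhs.data[i];
        delete[] data;                     // cleanup of the receiver's old value
        data = fresh;
        n = rhs.n;
        return *this;                      // reference to the left-hand operand
    }
    ~IntBuffer() { delete[] data; }

    int &at(int i) { return data[i]; }
    int length() const { return n; }

private:
    int *data;
    int n;
};

void demoAssign() {
    IntBuffer a(2), b(5);
    b.at(4) = 9;

    a = b;                                 // deep copy into an existing object
    assert(a.length() == 5 && a.at(4) == 9);
    b.at(4) = 1;
    assert(a.at(4) == 9);                  // a has its own array

    IntBuffer &alias = a;
    a = alias;                             // self-assignment is harmless
    assert(a.at(4) == 9);

    IntBuffer c(1);
    c = a = b;                             // the returned reference allows chaining
    assert(c.length() == 5 && c.at(4) == 1);
}
```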
The Need for a Destructor
The destructor is the other side of the coin, typically managing the deallocation of the (dynamically allocated) portion of the logical object
that is not part of the physical object proper
And again, arises in the context of pointer data members and dynamically allocated memory
Here are the scenarios in which a destructor is needed:
void f() {
vector v1;
...
} // exiting function; v1 (automatically) going out of scope
vector *vp = new vector; // dynamically allocated vector object
...
delete vp; // vector object destroyed (deallocated) by programmer
In both scenarios the object is at the end of its lifetime: for the local variable (v1), the function is being exited and all
locals are popped off the call stack; for the second case, the (anonymous) dynamically-allocated vector object pointed to
by vp is being deallocated (using delete).
The compiler only knows about the physical portion of the vector object, i.e., the declared data members, and that is
therefore the only thing freed up (i.e., popped off the stack for the local case, and returned to the heap for the pointer case).
Without a destructor, the remaining portion of the (logical) vector object, i.e., the underlying array, remains
allocated, resulting in a memory leak
Providing a destructor allows the programmer (i.e., the class designer) the opportunity to deallocate any remaining memory belonging to the logical portion
that is not part of the physical portion of the object.
The Typical Structure of a Programmer-Defined Destructor
~C() {
// Code to perform any logical cleanup of the object that couldn't be done by the compiler
}
The typical code for a destructor is the deallocation of the memory that is part of the logical object, but not part of the physical object (i.e., the data members)
and impossible for the compiler to clean up.
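Both end-of-lifetime scenarios can be observed with a destructor that keeps count. The sketch below assumes a hypothetical class Holder whose static counter live tracks how many underlying arrays are currently allocated; copying is disabled to keep the example focused on destruction.

```cpp
#include <cassert>

// Hypothetical class; `live` counts underlying arrays currently allocated.
class Holder {
public:
    Holder() : arr(new int[8]) { ++live; }
    ~Holder() {
        delete[] arr;   // the logical cleanup the compiler cannot do for us
        --live;
    }
    Holder(const Holder &) = delete;              // copying left out of this sketch
    Holder &operator=(const Holder &) = delete;
    static int live;
private:
    int *arr;
};
int Holder::live = 0;

void scopeExample() {
    Holder h;
    assert(Holder::live == 1);
}   // h goes out of scope here; its destructor runs automatically

void demoDestructor() {
    scopeExample();
    assert(Holder::live == 0);   // the local's array was released

    Holder *hp = new Holder;     // dynamically allocated object
    assert(Holder::live == 1);
    delete hp;                   // destructor runs, then the object itself is freed
    assert(Holder::live == 0);
}
```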
An Array Class in Canonical Form
The Array class is highly instructive:
It's a fairly minimal class illustrating the need for the canonical form
It's the baby brother of the STL vector class (which is essentially the same as Java's ArrayList class).
An expandable (growable) array object is one of the fundamental classes used to implement all the classic data structures
array.h
Notice the signature of the copy constructor
Array(const Array &other)
It's vital that the parameter (the Array object being copied from) be passed by reference, otherwise
it would be passed by value — which involves calling the copy constructor — and we would be trapped in
an infinite recursion.
The destructor's signature symbol is the ~, which is one of the 'not' operators in C++ (bitwise complement; the other, !, is logical negation)
The previous integer class had no call for a subscript ([]) operator; on the other hand it's exactly what we want in
an array class
[] is nothing more than the operator version of our earlier combined get function of Simple2;
i.e., the semantics are the same, it's the syntax that's different
Note the overloaded const and non-const versions
I wanted to demonstrate the growability of the Array object, so I've introduced a += operator that allows a user to
append values to the end of the array, the underlying (built-in C++) array being expanded as necessary. (In languages like Java, which do not
have operator overloading, this function is typically called add or append, or (as we'll see in STL) push_back).
The function checkCapacity will ensure that there is at least one empty element (at the end of the array), allowing for the valid appending
of an element. This function will be called by the += operator.
Since no one else is calling it (and in particular, no one from outside the class), we make it private.
Note the underlying, built-in array (arr) is declared as a pointer in order to allow us to dynamically allocate the elements.
We often say that the underlying, built-in, C++ array (arr) is the backing or underlying store for
the Array object.
There are now two values associated with the size of the Array:
a capacity representing the physical size, or maximum number of elements, of the allocated underlying array
a size representing the logical size, or number of elements actually appended to the Array object.
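The interplay of capacity, size, the += operator, and checkCapacity might be sketched as follows. This is not the array.h from the course: GrowArray and its accessors are hypothetical, and doubling the capacity on reallocation is an assumption made for the illustration.

```cpp
#include <cassert>

// Hypothetical growable array; NOT the course's array.h.
class GrowArray {
public:
    explicit GrowArray(int capacity = 2)
        : arr(new int[capacity]), capacity(capacity), size(0) {}
    GrowArray(const GrowArray &) = delete;             // copying left out of this sketch
    GrowArray &operator=(const GrowArray &) = delete;
    ~GrowArray() { delete[] arr; }

    GrowArray &operator+=(int value) {
        checkCapacity();         // guarantees at least one empty element
        arr[size++] = value;
        return *this;
    }
    int get(int i) const { return arr[i]; }
    int getSize() const { return size; }
    int getCapacity() const { return capacity; }

private:
    // Private: only operator+= needs it.
    void checkCapacity() {
        if (size < capacity) return;
        int newCapacity = capacity > 0 ? capacity * 2 : 1;   // doubling is an assumption
        int *bigger = new int[newCapacity];
        for (int i = 0; i < size; i++) bigger[i] = arr[i];
        delete[] arr;
        arr = bigger;
        capacity = newCapacity;
    }
    int *arr;             // backing store
    int capacity, size;   // physical size vs. logical size
};

void demoGrow() {
    GrowArray a;                          // capacity 2, size 0
    for (int i = 0; i < 5; i++) a += i * 10;
    assert(a.getSize() == 5);             // logical size: elements appended
    assert(a.getCapacity() == 8);         // physical size: 2 -> 4 -> 8
    assert(a.get(4) == 40);
}
```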
array.cpp
The copy constructor and assignment operator.
These two functions have much in common. They both:
copy the scalar members, capacity and size, from the source object (other) to the
destination object (this)
allocate space for the copy of the source object's arr
copy elements from the source arr to the new space
The number of elements to be copied is the size value (which is now the same in both source and destination)
assign the pointer to the new space to the arr pointer of the destination (this) object.
The copy constructor and assignment operator differ in that:
The copy constructor returns no value while the assignment operator should return a reference to the newly assigned (i.e., lhs) object.
The destination object of the copy constructor is newly created and thus its arr member is not pointing at anything; the assignment operator's
destination object (the lhs operand) has space allocated to it and pointed to by its arr member.
The assignment operator must therefore also:
delete the existing space pointed to by the destination (lhs) arr member
return a reference to the receiver object; i.e., *this
The assignment operator must also take into consideration the possibility of self-assignment in order to make sure
the object doesn't delete its own arr allocation before copying from it. It does this with the initial check if (this == &rhs) return *this;.
Because they share much in common, a third workhorse function is written that performs the common tasks and is called by both the
copy constructor as well as the assignment operator. This function is usually made private.
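The shared workhorse might be sketched like this. MiniArray, copyFrom, and append are hypothetical names, not those of the course's array.cpp; the point is only that the copy constructor and the assignment operator delegate their common steps to one private function.

```cpp
#include <cassert>

// Hypothetical class; NOT the course's array.cpp.
class MiniArray {
public:
    explicit MiniArray(int capacity)
        : arr(new int[capacity]), capacity(capacity), size(0) {}
    MiniArray(const MiniArray &other) : arr(nullptr), capacity(0), size(0) {
        copyFrom(other);            // nothing to clean up: the object is brand new
    }
    MiniArray &operator=(const MiniArray &rhs) {
        if (this == &rhs) return *this;   // self-assignment check
        delete[] arr;                     // cleanup unique to assignment
        arr = nullptr;
        copyFrom(rhs);
        return *this;                     // reference to the receiver
    }
    ~MiniArray() { delete[] arr; }

    void append(int v) {
        if (size < capacity) arr[size++] = v;
    }
    int get(int i) const { return arr[i]; }
    int getSize() const { return size; }

private:
    // Workhorse: the steps common to copy construction and assignment.
    void copyFrom(const MiniArray &other) {
        int *fresh = new int[other.capacity];   // allocate space for the copy
        for (int i = 0; i < other.size; i++) fresh[i] = other.arr[i];
        arr = fresh;                            // point the destination at the new space
        capacity = other.capacity;              // copy the scalar members
        size = other.size;
    }
    int *arr;
    int capacity, size;
};

void demoWorkhorse() {
    MiniArray a(4);
    a.append(1);
    a.append(2);

    MiniArray b = a;        // copy constructor -> copyFrom
    MiniArray c(1);
    c = a;                  // assignment operator -> cleanup, then copyFrom

    a.append(3);            // a changes; the copies must not
    assert(a.getSize() == 3);
    assert(b.getSize() == 2 && b.get(1) == 2);
    assert(c.getSize() == 2 && c.get(0) == 1);
}
```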
The subscript ([]) operator
Since we have control over this operation, we now decide to perform bounds checking.
The vector class of the STL does not do bounds checking in this context for performance reasons.
We handle bounds errors by throwing an ArrayException. Exception classes are often used, rather than throwing
strings, ints, or the like.
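A bounds-checked subscript operator, in both const and non-const versions, could look like the sketch below. The ArrayException shown here is a hypothetical minimal definition (the course's own class may differ), and CheckedArray is a fixed-size stand-in in which every one of its n elements is considered valid.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal exception class; the course's ArrayException may differ.
class ArrayException {
public:
    explicit ArrayException(const std::string &what) : what(what) {}
    std::string what;
};

// Hypothetical fixed-size array used only to illustrate operator[].
class CheckedArray {
public:
    explicit CheckedArray(int n) : arr(new int[n]()), size(n) {}
    CheckedArray(const CheckedArray &) = delete;             // left out of this sketch
    CheckedArray &operator=(const CheckedArray &) = delete;
    ~CheckedArray() { delete[] arr; }

    // Non-const version: returns a reference so elements can be assigned to.
    int &operator[](int index) {
        if (index < 0 || index >= size) throw ArrayException("index out of range");
        return arr[index];
    }
    // Const version: used on const objects and const references.
    const int &operator[](int index) const {
        if (index < 0 || index >= size) throw ArrayException("index out of range");
        return arr[index];
    }

private:
    int *arr;
    int size;
};

bool throwsOn(const CheckedArray &a, int index) {
    try {
        (void)a[index];
    } catch (const ArrayException &) {
        return true;
    }
    return false;
}

void demoSubscript() {
    CheckedArray a(3);
    a[0] = 5;                       // non-const operator[]
    const CheckedArray &view = a;
    assert(view[0] == 5);           // const operator[]
    assert(!throwsOn(a, 2));
    assert(throwsOn(a, 3));         // one past the end
    assert(throwsOn(a, -1));
}
```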
The checkCapacity function ensures there's at least one empty element at the end of the array, and if not, performs a reallocation.
Notice that checkCapacity also shares some code with the copy constructor and assignment operator, and thus could theoretically benefit
from a workhorse function.
I've also illustrated a standard development debugging technique of using a bool constant (I called it DEBUG)
whose value can be easily turned on or off, and which controls the printing of various diagnostics useful during development and debugging.
This is a very simple implementation of such a technique, but it's more than I see most students coming into the upper courses with.
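A minimal sketch of the DEBUG technique follows. The function sumTo and the choice to pass the output stream as a parameter are assumptions made for this illustration; in practice the diagnostics would more likely go straight to cout or cerr.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

const bool DEBUG = true;   // set to false to silence the diagnostics

// Hypothetical function instrumented with DEBUG-controlled diagnostics.
int sumTo(int n, std::ostream &log) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        if (DEBUG) log << "i=" << i << " total=" << total << '\n';
    }
    return total;
}

void demoDebug() {
    std::ostringstream log;          // captures the diagnostics for inspection
    assert(sumTo(4, log) == 10);
    assert(log.str().find("i=4 total=10") != std::string::npos);
}
```

Calling sumTo(4, std::cerr) instead would send the same diagnostics to the console.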