CISC 3142: The Canonical Class Form

CISC 3142
Programming Paradigms in C++
The C++ Canonical Class Form

Prologue - Understanding the Rules About Default Constructors, Copy Constructors, Assignment Operators, and Miranda Functions

To the beginner, C++'s rules on the above functions seem impossibly complex.
However, nothing is arbitrary (or at least not deliberately so), and understanding the rationale makes it easier to remember and understand the need for what otherwise appears to be unnecessarily complex.
We will start with something that actually has nothing to do with the topic at hand, but is a good way of introducing how the compiler provides certain functions for free.

The Miranda Rule in U.S. Law

You have the right to remain silent when questioned.
Anything you say or do may be used against you in a court of law. (Modern readings have can and will in place of may)
You have the right to consult an attorney before speaking to the police and to have an attorney present during questioning now or in the future.
If you cannot afford an attorney, one will be appointed for you before any questioning, if you wish.
If you decide to answer any questions now, without an attorney present, you will still have the right to stop answering at any time until you talk to an attorney.
Knowing and understanding your rights as I have explained them to you, are you willing to answer my questions without an attorney present?

The operative portion for us is that something (i.e., an attorney) will be supplied for us if we are unable to supply one for ourselves.

Default Constructors

If no constructor is supplied, a default constructor is supplied (by the compiler).
- This constructor does essentially nothing; it contains no member initialization list and an empty body. For the class C:
```
C() {}
				
```
  - Declaring an object of the class in this context (i.e., C c) is essentially the class/object equivalent of writing:
```
int i;
						
```
  - In the above case, the programmer elected not to provide initialization to i. Similarly, if the class designer decides not to provide initialization (in the form of a constructor) for the objects of the class, the compiler will still allow such objects to be created.
    - What the values of the data members are is the same as for the above integer initialization case (i.e., it depends on WHERE the declaration occurs).
Once a constructor is explicitly defined in the class (by the class designer), the compiler-supplied constructor disappears.
- This makes sense — once the class designer has decided how an object of the class should be initialized, we no longer have the right to initialize it in any other fashion.
  - As an example, for a Name class, the class designer codes a constructor that requires 2 string objects (e.g., first and last names).
  - This is a statement on the part of the designer that names MUST be created/initialized with two strings.
  - Users of the class should not be able to create a Name in any other fashion (in particular, by specifying nothing).
- If the class designer wants a default constructor, it's up to her to now supply one.
The rationale for the compiler-supplied constructor is twofold:
- It allows someone who understands structs (essentially classes without member functions) to define a struct and create objects of that type.
  - Without the compiler-supplied constructor, one would get a compiler error that the object can't be created.
  - Most 1110 students finish the semester with precisely this kind of type (behaviorless classes).
- C++ was meant to be as compatible as possible with pre-existing C code, and since C only allowed structures and not classes, all that C code (which used structs heavily) was in the same situation as the previous point (i.e., no functions in the struct and in particular no constructor).
Of course we can only have a compiler-supplied constructor if it makes sense to define such a constructor for an arbitrary class.
- Fortunately we can: an empty body, i.e., the compiler-supplied constructor does nothing. Not actually initializing anything is no better or worse than what happens with the above int, for example.
- And this point is, to some extent, the reason for talking about compiler-supplied default constructors: the same reasoning applies to copy constructors and assignment operators, as we will see.

The moral of the story is that the compiler will supply functions to the class if it is possible to do so and if doing so proves fairly useful to the programmer.
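
The appearance and disappearance of the compiler-supplied default constructor can be checked at compile time. The following is a minimal sketch, assuming a C++11 (or later) compiler; Point and Name are hypothetical types made up for illustration (the notes describe a Name class only in words).

```cpp
#include <string>
#include <type_traits>

// Hypothetical behaviorless struct: no constructor is written,
// so the compiler supplies a default constructor.
struct Point {
    int x;
    int y;
};

// Hypothetical Name class: the designer requires two strings,
// so the compiler-supplied default constructor disappears.
class Name {
public:
    Name(const std::string &first, const std::string &last) : first(first), last(last) {}
    const std::string &firstName() const { return first; }
    const std::string &lastName() const { return last; }
private:
    std::string first, last;
};

// Point p;  compiles: the compiler-supplied constructor is used
// Name n;   would NOT compile: no default constructor exists
static_assert(std::is_default_constructible<Point>::value,
              "Point gets a compiler-supplied default constructor");
static_assert(!std::is_default_constructible<Name>::value,
              "Name has no default constructor once Name(string, string) is defined");
```

If the designer of Name did want objects created with no arguments, she would have to add a Name() constructor herself.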

The Copy Constructor

Recall that you don't explicitly or directly call a constructor; the call is inserted by the compiler when an object of the class is created.
- The primary task of the constructor is thus the initialization of the object.
The copy constructor is so named because it is used when an object of a class is created and initialized using an object of the same class
- i.e., a copy is made of the latter into the new object.
There are three situations in which an initialization occurs via a copy:
- A declaration with an initialization using an object of the same class:
```
C c1;
...
C c2 = c1; 		// same as C c2(c1);
				
```
- Call-by-value
```
void f(C c) {....}		// NOT by reference
...
C c1;
f(c1);
				
```
- Return-by-value: `C f() { C c1; ... return c1; }`
In all three of the above cases, a new object (the variable being declared, the parameter, the return value) is being created and then initialized by an object of the same class (the rhs of the initialization, the argument to the function, the local variable; c1 in all of the above cases).
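
The first two situations can be observed directly by counting copy constructor calls. This is a small sketch, not part of the course code: Counted, byValue, and countCopies are hypothetical names. Return-by-value is deliberately left out of the count, because compilers are permitted to elide that copy, so the number observed there may vary.

```cpp
// Hypothetical class that counts how many times its copy constructor runs.
class Counted {
public:
    Counted() {}
    Counted(const Counted &) { ++copies; }
    static int copies;   // total copy constructor calls so far
};
int Counted::copies = 0;

// Call-by-value: the parameter is a new object initialized from the argument.
void byValue(Counted) {}

int countCopies() {
    Counted::copies = 0;
    Counted c1;
    Counted c2 = c1;   // declaration with initialization: one copy
    byValue(c1);       // call-by-value: one more copy
    (void)c2;          // silence "unused variable" warnings
    return Counted::copies;
}
```

Under these assumptions countCopies() returns 2: one copy for the declaration and one for the call.
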
There are two reasons we might have the compiler supply a copy constructor for us (in the absence of us taking matters into our own hands):
- All of the above situations are fundamental to even the simplest of programs; writing a copy constructor on the other hand is a bit beyond simple, so if we can have the compiler do it, it would make introductory level coding simpler.
  - As with the compiler-supplied constructor the above only makes sense if it's possible to define a copy for objects of arbitrary classes, which leads to the next reason we have compiler-supplied copy constructors.
- The compiler can easily implement the notion of a copy from one object to another of the same class:
  - Simply copy the value of each data member from the source object to the destination object.
    - For example:
```
class C {
public:
	C(int i, double d) : i(i), d(d) {}
private:
	int i;
	double d;
};

C c1(1, 2.5);
...
C c2 = c1;	// copies c1.i to c2.i and c1.d to c2.d
								
```
The compiler-supplied copy constructor case is a bit different than the compiler-supplied default constructor:
- For the default constructor, once any other constructor has been defined, the user of the class should not have access to a default constructor unless explicitly allowed (i.e., implemented) by the class designer.
- For the copy constructor however, it is such a basic operation, and so uniformly defined, that it remains available unless explicitly prohibited by the class designer (how?) or replaced by the class designer (how?).
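
One possible answer to the first "(how?)" is sketched here, assuming a C++11 (or later) compiler: the copy constructor is prohibited by marking it `= delete`. Copyable and NonCopyable are hypothetical classes. (Before C++11, the usual idiom was to declare the copy constructor private and never define it.) Replacing the copy constructor simply means writing your own C(const C &).

```cpp
#include <type_traits>

// Hypothetical class with a programmer-defined constructor but no mention
// of copying: the compiler-supplied copy constructor remains available.
class Copyable {
public:
    Copyable(int i) : i(i) {}
    int value() const { return i; }
private:
    int i;
};

// Hypothetical class whose designer explicitly prohibits copying.
class NonCopyable {
public:
    NonCopyable(int i) : i(i) {}
    NonCopyable(const NonCopyable &) = delete;   // no copy constructor
    int value() const { return i; }
private:
    int i;
};

static_assert(std::is_copy_constructible<Copyable>::value,
              "defining another constructor does not remove the copy constructor");
static_assert(!std::is_copy_constructible<NonCopyable>::value,
              "a deleted copy constructor cannot be used");
```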

The Assignment (`=`) Operator

Consider the following two ='s:
```
int x = 5;
		
```
and
```
x = 5;
		
```
- The second occurrence is an assignment operator. The first is simply an = symbol introducing an initializer.
- To drive home the difference that initialization and assignment are not the same, consider the following:
```
const double PI = 3.14;
				
```
  and
```
const double PI;	// error: a const object must be initialized
PI = 3.14;		// error: cannot assign to a const object
				
```
Moving to objects, we thus have two totally different constructs, an initialization and an assignment:
```
C c1 = c2;
		
```
and
```
C c3, c4;
...
c4 = c3;
		
```
- In the first case
  - a new object is being created and thus a copy constructor is called
  - since this is a new object, no consideration need be given to the object before the constructor was called
  - constructors are not called by the programmer, therefore they have no return type or return value
- In the second case:
  - A new object is not being created (c4 has been previously created, and presumably initialized using a constructor)
  - However, we still need to copy the value of the rhs to the lhs object
    - Thus we shouldn't be surprised if at least some of the copy constructor and assignment operator code is the same
  - We also may have to take into consideration the fact that c4 contained valid data beforehand
    - This may not seem obvious at this point; after all, we're going to overwrite it with the values from c3
  - Finally, the assignment operator is an operator; it appears in an expression and therefore has a return type and value.
The assignment of one object to another of the same class is as fundamental (if not more so) as initialization, and thus our argument about wanting a compiler-supplied copy constructor holds equally for a compiler-supplied assignment operator.
- and indeed the same semantics of memberwise assignment apply.
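
To see that the two ='s really do reach different functions, here is a small sketch. Traced is a hypothetical class that records which of its special member functions ran last; it is for illustration only.

```cpp
#include <string>

// Hypothetical class that remembers which special member function
// most recently ran on the object.
class Traced {
public:
    Traced() : last("default constructor") {}
    Traced(const Traced &) : last("copy constructor") {}
    Traced &operator =(const Traced &) {
        last = "assignment operator";
        return *this;            // an operator: it returns a value
    }
    std::string last;
};

std::string afterInitialization() {
    Traced c2;
    Traced c1 = c2;   // = introduces an initializer: the copy constructor runs
    return c1.last;
}

std::string afterAssignment() {
    Traced c3, c4;
    c4 = c3;          // = is the assignment operator: c4 already exists
    return c4.last;
}
```

afterInitialization() returns "copy constructor" and afterAssignment() returns "assignment operator".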

The Dynamically Allocated `Array`

Here is an Array class whose underlying ('primitive') array is dynamically allocated:

class Array {
public:
	Array(int capacity) : arr(new int[capacity]), capacity(capacity), size(0) {}
private:
	int *arr;
	int capacity, size;
};

Dynamic (i.e., heap) allocation of the underlying array container allows us to:

defer the capacity of the array until runtime (which is typically when we can determine the capacity)
resize the underlying container as needed
- this is the basis for a vector (a resizeable array container), which in turn is the basis for the majority of modern containers.

Applying what we just presented, what happens in the following cases?

Array a1(20);
…
Array a2 = a1;

void f(Array a);	// note the call by value
…
…
Array a(10);	// the Array class above has no default constructor
…
f(a);

Array f();	// note the return by value
…
…
Array a(10);	// the Array class above has no default constructor
…
a = f();

Array a1(10), a2(20);
…
a1 = a2;

Why a Canonical Form?

Canonical: "conforming to well-established rules or patterns"
Having a canonical form provides a concrete set of guidelines for creating a robust, fully-featured, properly functioning class
- It's not that there won't be any errors, but many of the well-known pitfalls are addressed by the canonical form.
If one or more of the class' data members are pointers (i.e., are responsible for the allocation/deallocation of dynamic memory), then the canonical form is essentially mandatory to maintain proper copy semantics

The C++ Canonical Class Form

Define a default constructor for the class (recommended)
- Arrays of objects of the class cannot be created easily if there is no default constructor
Define a destructor (required if pointer data members are present)
Define a copy constructor (required if pointer data members are present)
Define an assignment (=) operator (required if pointer data members are present)
Define an equality (==) operator (strongly recommended)
Define a stream insertion (<<) operator (strongly recommended)
Define a stream extraction (>>) operator (optional)

Canonical Form for a Class

file adt.h

#ifndef ADT_H
#define ADT_H

#include <iostream>

class ADT {
	friend std::ostream &operator <<(std::ostream &, const ADT &);
	friend std::istream &operator >>(std::istream &, ADT &);		
public:
	ADT();
	ADT(const ADT &);
	~ADT();
	ADT &operator =(const ADT &);
	bool operator ==(const ADT &);
	...
private:
	...	
};
#endif

file adt.cpp

#include <iostream>

#include "adt.h"

using namespace std;

ostream &operator <<(ostream &os, const ADT &adt) {
	...
	return os;
}

istream &operator >>(istream &is, ADT &adt) {
	...
	return is;
}
		
ADT::ADT() {...}

ADT::ADT(const ADT &source) {...}

ADT::~ADT() {...}

ADT &ADT::operator =(const ADT &rhs) {
	if (this == &rhs) return *this;
	...
	return *this;
}

bool ADT::operator ==(const ADT &rhs) {...}


Here is some more guidance on implementing the canonical form for a class as well as a class template

An Array Class

Illustrates:

dynamic memory allocation and pointers
the need for copy constructor, destructor, and assignment operator
an expandable array

A Word About Shallow vs Deep Copy

Given a pair of pointers, copying the pointer value from one to the other is known as a shallow copy; copying the value being pointed at is called a deep copy.

int *p1, *p2;
...
p2 = p1;     // shallow
*p2 = *p1;   // deep

VideoStream *vs1, *vs2;
vs1 = loadVideo("Lincoln");
vs2 = loadVideo("Avatar");
vs2 = vs1;    // shallow
*vs2 = *vs1;  // deep

Shallow copies are also sometimes called address copies, since it is the address of the data (rather than the data itself) that is being copied.
Shallow copies are typically faster than deep copies, since the address alone is copied, as opposed to the typically larger data item (for example, a video stream).
However, shallow copies create aliases (multiple pointers pointing at the same object, so that modifications made to the data through any of them affect all), together with all the associated issues.
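
The aliasing effect can be made concrete with plain int pointers. In this sketch the function names are hypothetical, and the pointers are aimed at local variables so that nothing needs to be allocated or freed.

```cpp
// After a shallow copy both pointers refer to the same int,
// so a write through one is visible through the other.
bool shallowCopyAliases() {
    int a = 1, b = 2;
    int *p1 = &a, *p2 = &b;
    p2 = p1;        // shallow: p2 now holds the address of a
    *p2 = 99;       // changes a, not b
    return *p1 == 99 && a == 99 && b == 2;
}

// After a deep copy the pointers still refer to different ints,
// so a later write through one leaves the other untouched.
bool deepCopyStaysSeparate() {
    int a = 1, b = 2;
    int *p1 = &a, *p2 = &b;
    *p2 = *p1;      // deep: b receives a copy of a's value
    *p2 = 99;       // changes b only
    return *p1 == 1 && a == 1 && b == 99;
}
```

Both functions return true.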

Here is a more in-depth look at the issues of aliasing and copying in Java and C++

Classes with Pointer Members

When a class contains a data member that is a pointer, there is the issue of what happens when a copy of the object needs to be made.

More often than not, a deep copy needs to be performed.
The default (Miranda) copy constructor and assignment operator perform memberwise copying, which means the pointer (not the data) is copied, i.e., a shallow copy, which is just the opposite of what is usually needed.
We must therefore supply our own copy constructor which performs a (semantic) deep copy.
- This will usually involve our dynamically allocating space for the copy
Once the deep copy is made, a new issue arises: part of the object's storage has been allocated by the compiler (for example, when you declared the object) and part of it by us (dynamically allocating the space for the data being pointed to). The compiler will take care of its allocation, but we are responsible for ours.
- This leads to the need for a destructor

The (Orthodox) Canonical Class Form

From the above discussion, we see that a class with pointer members must typically include a copy constructor, overloaded assignment operator, and a destructor. Such a class is said to conform to the orthodox canonical class form.

The above three are required for the class to function correctly.
In addition, there are other functions that are useful and desirable from the perspective of a user of the class:
- An insertion (<<) operator for outputting the object
- Less frequently, an extraction (>>) operator for inputting objects of the class
- An equality (==) operator

The Need for a Programmer-Defined Copy Constructor

Provides logic for managing the logical portion of the object that is not part of the physical portion
- Typically arises in the context of pointer data members and dynamically allocated memory

Here are two typical scenarios in which a copy constructor is invoked, and which require a programmer-defined copy constructor for proper semantics:

vector v1;
…
vector v2 = v1;   // Copy constructor

void f(vector v2);  // Note call-by-value 

vector v1;
...
f(v1);

Without a programmer-defined copy constructor, the default (compiler-supplied) copy constructor is used, and a member-wise copy is performed, resulting in the creation of an alias to the underlying array
- Note that the other data members are separate copies of the original; this is not a full alias of the vector object (which is what you would have in Java)
- Adding an element through v2 would affect the underlying array as well as v2's size data member, but v1's size would remain untouched, effectively resulting in a corrupt object through v1's perspective
A proper (programmer-defined) copy constructor would perform a deep copy of the object; i.e., allocate an independent copy of the underlying array for the newly created vector object.
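
The corruption described above can be reproduced with a stripped-down stand-in. NaiveVector is a hypothetical struct, not the course's vector class; it has no copy constructor and no destructor, so the compiler-supplied memberwise copy is used and the array is freed by hand exactly once.

```cpp
// Hypothetical stand-in for a vector with no programmer-defined copy constructor.
struct NaiveVector {
    int *arr;             // underlying array (dynamically allocated)
    int capacity, size;
};

bool memberwiseCopyCorrupts() {
    NaiveVector v1 = { new int[4], 4, 0 };
    NaiveVector v2 = v1;                 // memberwise copy: v2.arr aliases v1.arr

    v2.arr[v2.size++] = 42;              // "add an element" through v2

    bool sharedArray = (v1.arr == v2.arr);    // one array, two owners
    bool staleSize = (v1.size == 0);          // v1 never heard about the new element
    bool dataChanged = (v1.arr[0] == 42);     // yet v1's array was modified

    delete [] v1.arr;                    // free the single shared array once
    return sharedArray && staleSize && dataChanged;
}
```

The function returns true: v1's array now holds an element that v1's size knows nothing about.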

The Typical Structure of a Programmer-Defined Copy Constructor

C(const C &c) {
	// Code to copy the value of the parameter (right hand operand) to the receiver (left hand operand)
	…
}

Note the reference parameter; call-by-value would generate an 'infinite' recursive runaway, since passing the argument by value would itself invoke the copy constructor.
The copy code consists of whatever it takes to make an independent copy of the parameter (the right-hand operand)
- This might include memory allocations and copying, logging, whatever could not be done by a default copy constructor; if it were nothing more than a mere member-wise copy, it should be left to a default copy constructor
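
As an illustration of that structure, here is a minimal sketch of a deep-copying copy constructor. IntBuffer and copyIsIndependent are hypothetical names; a destructor is included only so that the sketch does not leak, and assignment is declared private and left undefined because it is outside the scope of this example.

```cpp
#include <algorithm>   // std::copy

// Hypothetical class that owns a dynamically allocated array of ints.
class IntBuffer {
public:
    explicit IntBuffer(int count) : data(new int[count]()), length(count) {}

    // Copy constructor: reference parameter, fresh allocation, element copy.
    IntBuffer(const IntBuffer &other)
        : data(new int[other.length]), length(other.length) {
        std::copy(other.data, other.data + other.length, data);
    }

    ~IntBuffer() { delete [] data; }

    int &at(int i) { return data[i]; }
    int at(int i) const { return data[i]; }
private:
    IntBuffer &operator =(const IntBuffer &);   // not part of this sketch
    int *data;
    int length;
};

bool copyIsIndependent() {
    IntBuffer b1(3);
    b1.at(0) = 7;
    IntBuffer b2 = b1;    // copy constructor: b2 gets its own array
    b2.at(0) = 99;        // does not touch b1's array
    return b1.at(0) == 7 && b2.at(0) == 99;
}
```

Because each object owns a separate array, copyIsIndependent() returns true.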

The Need for a Programmer-Defined Assignment Operator

Again, provides logic for managing the logical portion of the object that is not part of the physical portion
- And again, typically arises in the context of pointer data members and dynamically allocated memory
Two major differences from the copy constructor
- In an assignment, the object already exists, and will typically have its own allocation of dynamic memory; this needs to be deleted (recycled) to prevent a memory leak. (In contrast, the copy constructor is operating on a freshly created object that has no such memory allocated to it yet.)
- Unlike the copy constructor, the assignment operator has a return value — typically a reference to the assigned object (the left-hand operand).

Here is a typical scenario in which a programmer-defined assignment operator is needed:

vector v1;
...
vector v2;
...
v2 = v1;   // assignment

As before, without a programmer-defined assignment operator, the default (compiler-supplied) assignment operator is used, and a member-wise copy is again performed, this time resulting not only in the creation of an alias to the underlying array, but also in a memory leak, since the original underlying array of the left hand operand is never deleted.
A proper (programmer-defined) assignment operator would perform a deep copy of the object; i.e., allocate an independent copy of the underlying array for the left hand operand, as well as delete the old underlying array of the left hand operand.

The Typical Structure of a Programmer-Defined Assignment Operator

C &operator =(const C &c) {
	if (&c == this) return *this;		// avoid self-assignment
	// Code to perform a semantically-proper copy of the parameter's value to the 'receiver'
	// 	as well as cleanup of the old value of the receiver (left-hand operand)
	…
	return *this;
}

The copy code will typically be quite similar to that of the copy constructor
- The logic is often leveraged by placing it in a workhorse function which is then called by the constructor and assignment operator
- the cleanup is unique to the assignment operator as it is working with an already-existing object that may have prior memory allocation
- similarly, unlike the copy constructor, the assignment operator has a return value — conventionally a reference to the (newly assigned) left hand operand
- finally, the first line of code avoids the overhead (and possibly erroneous semantics) of a self-copy; note the comparison of pointers (this and &c) — this corresponds to the semantics of Java's == on objects.
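
The points above can be put together in one sketch. IntBuffer, copyFrom, and assignmentWorks are hypothetical names; copyFrom plays the role of the workhorse function shared by the copy constructor and the assignment operator.

```cpp
#include <algorithm>   // std::copy

// Hypothetical class that owns a dynamically allocated array of ints.
class IntBuffer {
public:
    explicit IntBuffer(int count) : data(new int[count]()), length(count) {}
    IntBuffer(const IntBuffer &other) { copyFrom(other); }
    ~IntBuffer() { delete [] data; }

    IntBuffer &operator =(const IntBuffer &rhs) {
        if (this == &rhs) return *this;   // self-assignment: nothing to do
        delete [] data;                   // cleanup of the old value
        copyFrom(rhs);                    // same copy logic as the copy constructor
        return *this;                     // reference to the left hand operand
    }

    int &at(int i) { return data[i]; }
    int size() const { return length; }
private:
    // Workhorse: allocate an independent array and copy the elements.
    void copyFrom(const IntBuffer &other) {
        length = other.length;
        data = new int[length];
        std::copy(other.data, other.data + length, data);
    }
    int *data;
    int length;
};

bool assignmentWorks() {
    IntBuffer a(2), b(5), c(5);
    c.at(0) = 42;
    a = b = c;              // chaining relies on the returned reference
    IntBuffer &alias = a;
    a = alias;              // self-assignment must leave a intact
    c.at(0) = 0;            // a has its own array, so this does not affect it
    return a.size() == 5 && a.at(0) == 42;
}
```

Without the self-assignment check, `a = alias` would delete a's array and then try to copy from the array it had just deleted.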

The Need for a Destructor

The destructor is the other side of the coin, typically managing the deallocation of the dynamically allocated portion of the logical object that is not part of the physical object proper
- And again, arises in the context of pointer data members and dynamically allocated memory

Here are the scenarios in which a destructor is needed:

void f() {
	vector v1;
	...
}     // exiting function; v1 (automatically) going out of scope

vector *vp = new vector;		// dynamically allocated vector object
...
delete vp;				// vector object destroyed (deallocated) by programmer

In both scenarios the object is at the end of its lifetime: for the local variable (v1), the function is being exited and all locals are popped off the call stack; for the second case, the (anonymous) dynamically-allocated vector object pointed to by vp is being deallocated (using delete).
- The compiler only knows about the physical portion of the vector object, i.e., the declared data members, and that portion is therefore the only thing freed up (i.e., popped off the stack for the local case, and returned to the heap for the pointer case).
- Without a destructor, the remaining portion of the (logical) vector object, i.e., the underlying array, remains allocated, resulting in a memory leak
Providing a destructor allows the programmer (i.e., the class designer) the opportunity to deallocate any remaining memory belonging to the logical portion that is not part of the physical portion of the object.

The Typical Structure of a Programmer-Defined Destructor

~C() {
	// Code to perform any logical cleanup of the object that couldn't be done by the compiler
}

The typical code for a destructor is the deallocation of the memory that is part of the logical object, but not part of the physical object (i.e., the data members) and impossible for the compiler to clean up.
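
Both end-of-lifetime scenarios can be observed with a counter. In this sketch Tracked is a hypothetical class that counts how many underlying arrays are currently allocated; copying is disabled (declared private, never defined) to keep the example focused on the destructor.

```cpp
// Hypothetical class that tracks how many underlying arrays are alive.
class Tracked {
public:
    Tracked() : arr(new int[8]) { ++liveArrays; }
    ~Tracked() { delete [] arr; --liveArrays; }   // frees the logical portion
    static int liveArrays;
private:
    Tracked(const Tracked &);               // copying disabled in this sketch
    Tracked &operator =(const Tracked &);
    int *arr;
};
int Tracked::liveArrays = 0;

int liveInsideScope() {
    Tracked t;
    return Tracked::liveArrays;    // t's array is still allocated here
}

int liveAfterScopeExit() {
    {
        Tracked t;
    }                              // t goes out of scope: destructor runs
    return Tracked::liveArrays;
}

int liveAfterDelete() {
    Tracked *tp = new Tracked;     // dynamically allocated object
    delete tp;                     // destructor runs, then the object is freed
    return Tracked::liveArrays;
}
```

liveInsideScope() returns 1, while the other two functions return 0, showing that the destructor released the array in both scenarios.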

An Array Class in Canonical Form

The Array class is highly instructive:

It's a fairly minimal class illustrating the need for the canonical form
It's the baby brother of the STL vector class (which is essentially the same as Java's ArrayList class).
An expandable (growable) array object is one of the fundamental classes used to implement all the classic data structures

`array.h`

Notice the signature of the copy constructor
```
Array(const Array &other)
		
```
It's vital that the parameter (the Array object being copied from) be passed by reference, otherwise it would be passed by value — which involves calling the copy constructor — and we would be trapped in an infinite recursion.
The destructor's signature symbol is the ~ which is one of the not operators in C++ (the other being !)
The previous integer class had no call for a subscript ([]) operator; on the other hand it's exactly what we want in an array class
- [] is nothing more than the operator version of our earlier combined get function of Simple2; i.e., the semantics are the same, it's the syntax that's different
- Note the overloaded const and non-const versions
I wanted to demonstrate the growability of the Array object, so I've introduced a += operator that allows a user to append values to the end of the array, with the underlying (built-in C++) array being expanded as necessary. (In languages like Java, which do not have operator overloading, this function is typically called add or append, or (as we'll see in STL) push_back).
The function checkCapacity will ensure that there is at least one empty element (at the end of the array), allowing for the valid appending of an element. This function will be called by the += operator.
- Since no one else is calling it (in particular, no one from outside the class), we make it private.
Note the underlying, built-in array (arr) is declared as a pointer in order to allow us to dynamically allocate the elements.
We often say that the underlying, built-in C++ array (arr) is the backing or underlying store for the Array object.
There are two values now associated with the size of the Array
- a capacity representing the physical size, or maximum number of elements, in the allocated underlying array
- a size representing the logical size or number of elements actually appended to the Array object.

`array.cpp`

The copy constructor and assignment operator.
- These two functions have much in common. They both:
  - copy the scalar members, capacity and size, from the source object (other) to the destination object (this)
  - allocate space for the copy of the source object's arr
  - copy elements from the source arr to the new space
    - The number of elements to be copied is the size value (which is now the same in both source and destination)
  - assign the pointer to the new space to the arr pointer of the destination (this) object.
- The copy constructor and assignment operator differ in that:
  - The copy constructor returns no value while the assignment operator should return a reference to the newly assigned (i.e., lhs) object.
  - The destination object of the copy constructor is newly created and thus its arr member is not pointing at anything; the assignment operator's destination object (the lhs operand) has space allocated to it and pointed to by its arr member.
  - The assignment operator must therefore also:
    - delete the existing space pointed to by the destination (lhs) arr member
    - return a reference to the receiver object; i.e., *this
  - The assignment operator must also take into consideration the possibility of self-assignment in order to make sure the object doesn't delete its arr allocation. It does this with the initial check if (this == &rhs) return *this;.
- Because they share much in common, a third workhorse function is written that performs the common tasks and is called by both the copy constructor as well as the assignment operator. This function is usually made private.
The subscript ([]) operator
- Since we have control over this operation, we now decide to perform bounds checking.
  - The vector class of the STL does not do bounds checking in this context for performance reasons.
- We handle bound errors by throwing an ArrayException. Exception classes are often used, rather than throwing string's, int's or the like.
The checkCapacity function ensures there's at least one element empty at the end of the array, and if not performs a reallocation.
- Notice that checkCapacity also shares some code with the copy constructor and assignment operator, and thus could theoretically benefit from a workhorse function.
I've also illustrated a standard development debugging technique of using a bool constant (I called it DEBUG) whose value can be easily turned on or off, and which controls the printing of various diagnostics useful during development and debugging.
- This is a very simple implementation of such a technique, but it's more than I see most students coming into the upper courses with.
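
Since the course's array.h and array.cpp are not reproduced here, the following is a condensed single-file sketch of an Array along the lines described above; it is not the course's actual code. Assumptions: the default capacity of 4, the growth rule, the length accessor, and the helper names copyFrom and checkIndex are made up for illustration; std::out_of_range stands in for the ArrayException class; and the DEBUG diagnostics are omitted.

```cpp
#include <algorithm>   // std::copy, std::equal
#include <iostream>
#include <stdexcept>   // std::out_of_range (stand-in for ArrayException)

class Array {
    friend std::ostream &operator <<(std::ostream &os, const Array &a) {
        os << "[";
        for (int i = 0; i < a.size; i++) os << (i ? ", " : "") << a.arr[i];
        return os << "]";
    }
public:
    explicit Array(int capacity = 4)
        : arr(new int[capacity]), capacity(capacity), size(0) {}
    Array(const Array &other) { copyFrom(other); }
    ~Array() { delete [] arr; }

    Array &operator =(const Array &rhs) {
        if (this == &rhs) return *this;   // self-assignment check
        delete [] arr;                    // release the old underlying array
        copyFrom(rhs);
        return *this;
    }
    bool operator ==(const Array &rhs) const {
        return size == rhs.size && std::equal(arr, arr + size, rhs.arr);
    }
    // const and non-const subscript operators, both bounds-checked
    int &operator [](int index) { checkIndex(index); return arr[index]; }
    int operator [](int index) const { checkIndex(index); return arr[index]; }

    Array &operator +=(int value) {       // append, growing if necessary
        checkCapacity();
        arr[size++] = value;
        return *this;
    }
    int length() const { return size; }
private:
    // Workhorse shared by the copy constructor and the assignment operator.
    void copyFrom(const Array &other) {
        capacity = other.capacity;
        size = other.size;
        arr = new int[capacity];
        std::copy(other.arr, other.arr + size, arr);
    }
    void checkIndex(int index) const {
        if (index < 0 || index >= size)
            throw std::out_of_range("Array index out of range");
    }
    // Ensure at least one empty element at the end of the array.
    void checkCapacity() {
        if (size < capacity) return;
        int newCapacity = capacity * 2 + 1;   // +1 so a capacity of 0 can grow
        int *bigger = new int[newCapacity];
        std::copy(arr, arr + size, bigger);
        delete [] arr;
        arr = bigger;
        capacity = newCapacity;
    }
    int *arr;              // backing store
    int capacity, size;    // physical size and logical size
};
```

With this sketch, appending three values to an Array of capacity 1 triggers a reallocation inside checkCapacity, and a copy made afterwards can be modified without affecting the original.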
