CIS 3130
Data Structures
Sorting
Overview
- Why sort?
- Three basic families of sorts
- Selection
- Insertion
- Exchange
Some Concepts
- Array is sorted if a[i] <= a[i+1] for all i
- Sorted vs unsorted portion of array
- Arrays of size 0 or 1 are by definition sorted
- Reducing the problem size by 1 each step produces the sequence 1 + 2 + ... + n, which is quadratic cost
- Dividing a problem in half produces a log n cost
- Best and worst cases are fairly straightforward, average case is usually harder
(involves probability analysis)
- Usually talk about cost in terms of comparisons; can also address number of
swaps and amount of additional memory required
- We'll assume the availability of the following swap method:
static <E> void swap(E[] arr, int i, int j) {
E t = arr[i];
arr[i] = arr[j];
arr[j] = t;
}
Bubble Sort
- Start with unsorted array
- Go through array exchanging adjacent neighbors that are out of sequence,
i.e., where a[i] > a[i+1]
- After first pass, largest element has bubbled to end of array
- Repeat process, sorted array (at end) grows by 1, unsorted array shrinks by 1
- Done when unsorted portion is of size 1 (no longer unsorted)
Cost?
- First time through, n-1 comparisons
- Each successive time requires 1 less comparison
- the latest 'largest' element has bubbled its way to its final position
at the beginning of the sorted portion
- n-1, n-2, ..., 2, 1 comparisons sum to n(n-1)/2, which is quadratic (n²)
- No distinct best or worst cases
- # swaps?
sort(a[0 ... n-1]) {
for (last = n-1 ... 1)
for (i = 0 ... last-1)
if (a[i] > a[i+1]) swap them
}
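The pseudocode above can be rendered directly in Java; as a sketch, this uses a plain int array rather than the generic swap helper, and the method name is my own:

```java
static void bubbleSort(int[] a) {
    for (int last = a.length - 1; last >= 1; last--) {
        // After this pass, the largest of a[0..last] has bubbled to a[last]
        for (int i = 0; i < last; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t; // swap neighbors
            }
        }
    }
}
```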
An optimization
- Each time through check if any elements were exchanged, if not, done!
Cost?
- Best case-- already sorted, no exchanges on first pass, results in n-1 comparisons
- Why is this not so unlikely?
- Worst case, must continue to last pass -- reverse sorted -- n²
sort(a[0 ... n-1]) {
for (last = n-1 ... 1 && swapped an element last time through)
for (i = 0 ... last-1)
if (a[i] > a[i+1]) swap them
}
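The early-exit optimization amounts to one boolean flag; a minimal Java sketch (again with an int array, method name my own):

```java
static void bubbleSortEarlyExit(int[] a) {
    boolean swapped = true;
    // Stop as soon as a full pass makes no exchanges
    for (int last = a.length - 1; last >= 1 && swapped; last--) {
        swapped = false;
        for (int i = 0; i < last; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = true;
            }
        }
    }
}
```

On an already-sorted array the first pass sets no flag, so the loop exits after n-1 comparisons.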
Insertion sort
- Start with sorted array of size 1 at beginning of array
- Insert second element of array into proper position in sorted array
- Hold onto the second element, a hole is now formed
- Slide larger elements toward end of array.
- Drop second element into hole at the proper location
- Continue inserting elements until unsorted portion is of size 1
Cost?
- Best case -- already sorted, each insertion requires 1 comparison, results in n-1
- Worst case -- reverse sorted, first insertion requires 1 comparison, then 2, etc,
results in 1 + 2 + ... + n-1, or quadratic
- What about swaps?
sort(a[0 ... n-1]) {
for (next = 1 ... n-1) {
hold onto a[next]
for (i = next-1 ... 0 && a[i] > held item)
slide a[i] one position to right
place held item into a[i+1]
}
}
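The hold/slide/drop steps above look like this in Java (a sketch with an int array; names are my own):

```java
static void insertionSort(int[] a) {
    for (int next = 1; next < a.length; next++) {
        int held = a[next];        // hold onto a[next]; a hole forms there
        int i = next - 1;
        while (i >= 0 && a[i] > held) {
            a[i + 1] = a[i];       // slide larger elements toward the end
            i--;
        }
        a[i + 1] = held;           // drop held item into the hole
    }
}
```

Note there are no swaps at all, only slides, which is why insertion sort moves less data than bubble sort.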
Selection Sort
- Go through array picking up largest elements, swap with last element
- Optimization — only swap if last element is not already largest
- good if array is already sorted
- Repeat until only one element in unsorted array-- sorted array grows from rear
Cost?
- Best case is same as worst case -- quadratic
- # swaps?
sort(a[0 ... n-1])
{
for (last = n-1 ... 1)
find largest element in a[0 ... last]
swap with a[last]
}
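A Java sketch of the above, including the skip-the-no-op-swap optimization (int array, names my own):

```java
static void selectionSort(int[] a) {
    for (int last = a.length - 1; last >= 1; last--) {
        int largest = 0;                      // index of largest in a[0..last]
        for (int i = 1; i <= last; i++)
            if (a[i] > a[largest]) largest = i;
        if (largest != last) {                // optimization: skip no-op swaps
            int t = a[largest]; a[largest] = a[last]; a[last] = t;
        }
    }
}
```

Comparisons are always n(n-1)/2, but at most n-1 swaps ever occur, which answers the "# swaps?" question above.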
Quicksort
- Choose arbitrary element from array (we'll take the first)
- Partition array around this chosen element so that when done
- Chosen element is in its 'final sorted' position
- All elements less than the chosen element are to its left
- All elements greater than the chosen element are to its right
Note the result is NOT sorted; rather, we say it's partitioned
- Do this recursively for both sides-- done when subarray size is 0 or 1
sort(a[lb ... ub]) {
if (lb >= ub) done
pos = position where placed chosen element in a[lb ... ub]
sort(a[lb ... pos-1]);
sort(a[pos+1 ... ub]);
}
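The notes leave the partition step open, so here is one possible sketch in Java: it takes the first element as the pivot, as suggested above, and grows a "less than pivot" region from the left (a Lomuto-style scheme; the notes don't prescribe this particular one):

```java
static void quickSort(int[] a, int lb, int ub) {
    if (lb >= ub) return;                 // size 0 or 1: already sorted
    int pos = partition(a, lb, ub);       // pivot lands in final position pos
    quickSort(a, lb, pos - 1);
    quickSort(a, pos + 1, ub);
}

static int partition(int[] a, int lb, int ub) {
    int pivot = a[lb];                    // chosen element: first in subarray
    int pos = lb;                         // boundary of the "< pivot" region
    for (int i = lb + 1; i <= ub; i++) {
        if (a[i] < pivot) {
            pos++;
            int t = a[i]; a[i] = a[pos]; a[pos] = t;
        }
    }
    // Move pivot into its final sorted position
    int t = a[lb]; a[lb] = a[pos]; a[pos] = t;
    return pos;
}
```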
Cost?
- Best case -- partition element always lands in middle
- Array size is divided in half -- gives us log
- All the subarrays add up to about n comparisons
- Results in n log n
- Worst case -- partition element lands at either end
- Array size is reduced by 1 element
- end up with n-1, n-2, ...1 comparisons-- quadratic!
- Average turns out to be n log n
- Clever techniques are used to prevent or minimize worst case
- e.g., use median of first, middle, last element as partition element
Heapsort
Preliminaries
- 'heap' - tree with property that the element at every node is greater than or equal to either child
- largest element in heap is at root
The basic idea:
- Create heap
- Remove root
- 'reheapify'
- Repeat
Making it efficient
- Complete (or nearly complete) tree can be nicely represented using an array
- node at index i has left child at 2i+1 and right child at 2i+2
- 0 → 1, 2
- 1 → 3, 4
- 2 → 5, 6
- node at index i has (unique) parent at (i-1)/2
- 1 → 0, 2 → 0
- 3 → 1, 4 → 1
- 5 → 2, 6 → 2
- since i uniquely maps to 2i+1 and 2i+2, we have a representation of a left and right child
- Since 2i+1 and 2i+2 uniquely map back to i, we have a parent representation as well
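The index arithmetic above fits in three one-line helpers (method names are my own; note integer division makes both children map back to the same parent):

```java
static int left(int i)   { return 2 * i + 1; }
static int right(int i)  { return 2 * i + 2; }
static int parent(int i) { return (i - 1) / 2; }
```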
The actual algorithm:
- Two phases
- Initialization: creating the initial heap
- Selecting largest element and reheapifying (restoring the heap property)
sort(a[0 ... n-1]) {
for (i = n/2-1 ... 0)
adjust a[i ... n-1] so it becomes a heap
for (last = n-1 ... 1)
swap a[0] and a[last]
adjust a[0 ... last-1] so it becomes a heap
}
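Both phases of the pseudocode hinge on one "adjust so it becomes a heap" step, usually called sift-down; a Java sketch of the whole algorithm (int array, names my own):

```java
static void heapSort(int[] a) {
    int n = a.length;
    for (int i = n / 2 - 1; i >= 0; i--)   // phase 1: build the initial heap
        siftDown(a, i, n);
    for (int last = n - 1; last >= 1; last--) {
        int t = a[0]; a[0] = a[last]; a[last] = t;  // root (largest) to final spot
        siftDown(a, 0, last);              // phase 2: reheapify a[0..last-1]
    }
}

// Restore the heap property for the subtree rooted at i, within a[0..size-1],
// assuming both of i's subtrees are already heaps.
static void siftDown(int[] a, int i, int size) {
    while (2 * i + 1 < size) {
        int child = 2 * i + 1;                            // left child
        if (child + 1 < size && a[child + 1] > a[child])
            child++;                                      // pick larger child
        if (a[i] >= a[child]) return;                     // heap property holds
        int t = a[i]; a[i] = a[child]; a[child] = t;
        i = child;                                        // continue down
    }
}
```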
Cost?
- Turns out to always be n log n
- Overview
- Tree represented by the array is (nearly) complete
- Longest path in complete tree of n nodes is log n
- Adding a level doubles the nodes in a tree
- Going up a level halves the number of nodes
- Given n nodes can halve log n times
- Given a tree that is a heap with the possible exception of the
root, heapifying requires log n comparisons
- Start at root, compare, possibly swap and move down
- Initializing
- Start at end move towards beginning and reheapify each time
- Need to heapify about n/2 times (once per internal node, as in the first loop)
- heap is never larger than n, thus log n to heapify
- Result is n log n
- Removing largest and reheapifying
- n-1 elements selected,
- heap is no larger than n, thus log n reheapify
- Results in n log n