CIS 3130
Data Structures
Sorting
Overview
- Why sort?
- Three basic families of sorts
- Selection
- Insertion
- Exchange
Some Concepts
- Array is sorted if a[i] <= a[i+1] for all i
- Sorted vs unsorted portion of array
- Arrays of size 0 or 1 are by definition sorted
- Reducing the problem size by 1 each step produces the sequence 1 + 2 + ... + n, which is quadratic cost
- Dividing a problem in half produces a log n cost
- Best and worst cases are fairly straightforward, average case is usually harder
(involves probability analysis)
- Usually talk about cost in terms of comparisons; can also address number of
swaps and amount of additional memory required
- We'll assume the availability of the following swap method:
static <E> void swap(E[] arr, int i, int j) {
E t = arr[i];
arr[i] = arr[j];
arr[j] = t;
}
Bubble Sort
- Start with unsorted array
- Go through array exchanging adjacent neighbors that are out of sequence,
i.e., where a[i] > a[i+1]
- After first pass, largest element has bubbled to end of array
- Repeat process, sorted array (at end) grows by 1, unsorted array shrinks by 1
- Done when unsorted portion is of size 1 (no longer unsorted)
Cost?
- First time through, n-1 comparisons
- Each successive time requires 1 less comparison
- the latest 'largest' element has bubbled its way to its final position
at the beginning of the sorted portion
- n-1, n-2, ..., 2, 1 comparisons sum to n(n-1)/2, which is quadratic (n²)
- No distinct best or worst cases
- # swaps?
sort(a[0 ... n-1]) {
for (last = n-1 ... 1)
for (i = 0 ... last-1)
if (a[i] > a[i+1]) swap them
}
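The pseudocode above can be rendered directly in Java; as a sketch, this uses a plain int array rather than the generic swap helper, and the method name is my own:

```java
static void bubbleSort(int[] a) {
    for (int last = a.length - 1; last >= 1; last--) {
        // After this pass, the largest of a[0..last] has bubbled to a[last]
        for (int i = 0; i < last; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t; // swap neighbors
            }
        }
    }
}
```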
An optimization
- Each time through check if any elements were exchanged, if not, done!
Cost?
- Best case-- already sorted, no exchanges on first pass, results in n-1 comparisons
- Why is this not so unlikely?
- Worst case, must continue to last pass -- reverse sorted -- n²
sort(a[0 ... n-1]) {
for (last = n-1 ... 1 && swapped an element last time through)
for (i = 0 ... last-1)
if (a[i] > a[i+1]) swap them
}
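The early-exit optimization amounts to one boolean flag; a minimal Java sketch (again with an int array, method name my own):

```java
static void bubbleSortEarlyExit(int[] a) {
    boolean swapped = true;
    // Stop as soon as a full pass makes no exchanges
    for (int last = a.length - 1; last >= 1 && swapped; last--) {
        swapped = false;
        for (int i = 0; i < last; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = true;
            }
        }
    }
}
```

On an already-sorted array the first pass sets no flag, so the loop exits after n-1 comparisons.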
Insertion sort
- Start with sorted array of size 1 at beginning of array
- Insert second element of array into proper position in sorted array
- Hold onto the second element, a hole is now formed
- Slide larger elements toward end of array.
- Drop second element into hole at the proper location
- Continue inserting elements until unsorted portion is of size 1
Cost?
- Best case -- already sorted, each insertion requires 1 comparison, results in n-1
- Worst case -- reverse sorted, first insertion requires 1 comparison, then 2, etc,
results in 1 + 2 + ... + n-1, or quadratic
- What about swaps?
sort(a[0 ... n-1]) {
for (next = 1 ... n-1) {
hold onto a[next]
for (i = next-1 ... 0 && a[i] > held item)
slide a[i] one position to right
place held item into a[i+1]
}
}
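The hold/slide/drop steps above look like this in Java (a sketch with an int array; names are my own):

```java
static void insertionSort(int[] a) {
    for (int next = 1; next < a.length; next++) {
        int held = a[next];        // hold onto a[next]; a hole forms there
        int i = next - 1;
        while (i >= 0 && a[i] > held) {
            a[i + 1] = a[i];       // slide larger elements toward the end
            i--;
        }
        a[i + 1] = held;           // drop held item into the hole
    }
}
```

Note there are no swaps at all, only slides, which is why insertion sort moves less data than bubble sort.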
Selection Sort
- Go through array picking up largest elements, swap with last element
- Optimization — only swap if last element is not already largest
- good if array is already sorted
- Repeat until only one element in unsorted array-- sorted array grows from rear
Cost?
- Best case is same as worst case -- quadratic
- # swaps?
sort(a[0 ... n-1])
{
for (last = n-1 ... 1)
find largest element in a[0 ... last]
swap with a[last]
}
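A Java sketch of the above, including the skip-the-no-op-swap optimization (int array, names my own):

```java
static void selectionSort(int[] a) {
    for (int last = a.length - 1; last >= 1; last--) {
        int largest = 0;                      // index of largest in a[0..last]
        for (int i = 1; i <= last; i++)
            if (a[i] > a[largest]) largest = i;
        if (largest != last) {                // optimization: skip no-op swaps
            int t = a[largest]; a[largest] = a[last]; a[last] = t;
        }
    }
}
```

Comparisons are always n(n-1)/2, but at most n-1 swaps ever occur, which answers the "# swaps?" question above.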
Quicksort
- Choose arbitrary element from array (we'll take the first)
- Partition array around this chosen element so that when done
- Chosen element is in its 'final sorted' position
- All elements less than the chosen element are to its left
- All elements greater than the chosen element are to its right
Note the result is NOT sorted; rather, we say it's partitioned
- Do this recursively for both sides-- done when subarray size is 0 or 1
sort(a[lb ... ub]) {
if (lb >= ub) done
pos = position where placed chosen element in a[lb ... ub]
sort(a[lb ... pos-1]);
sort(a[pos+1 ... ub]);
}
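The notes leave the partition step open, so here is one possible sketch in Java: it takes the first element as the pivot, as suggested above, and grows a "less than pivot" region from the left (a Lomuto-style scheme; the notes don't prescribe this particular one):

```java
static void quickSort(int[] a, int lb, int ub) {
    if (lb >= ub) return;                 // size 0 or 1: already sorted
    int pos = partition(a, lb, ub);       // pivot lands in final position pos
    quickSort(a, lb, pos - 1);
    quickSort(a, pos + 1, ub);
}

static int partition(int[] a, int lb, int ub) {
    int pivot = a[lb];                    // chosen element: first in subarray
    int pos = lb;                         // boundary of the "< pivot" region
    for (int i = lb + 1; i <= ub; i++) {
        if (a[i] < pivot) {
            pos++;
            int t = a[i]; a[i] = a[pos]; a[pos] = t;
        }
    }
    // Move pivot into its final sorted position
    int t = a[lb]; a[lb] = a[pos]; a[pos] = t;
    return pos;
}
```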
Cost?
- Best case -- partition element always lands in middle
- Array size is divided in half -- gives us log
- All the subarrays add up to about n comparisons
- Results in n log n
- Worst case -- partition element lands at either end
- Array size is reduced by 1 element
- end up with n-1, n-2, ...1 comparisons-- quadratic!
- Average turns out to be n log n
- Clever techniques are used to prevent or minimize worst case
- e.g., use median of first, middle, last element as partition element
Heapsort
Preliminaries
- 'heap' - tree with property that the element at every node is greater than or equal to either child
- largest element in heap is at root
The basic idea:
- Create heap
- Remove root
- 'reheapify'
- Repeat
Making it efficient
- Complete (or nearly complete) tree can be nicely represented using an array
- node at index i has left child at 2i+1 and right child at 2i+2
- 0 → 1, 2
- 1 → 3, 4
- 2 → 5, 6
- node at index i has (unique) parent at (i-1)/2
- 1 → 0, 2 → 0
- 3 → 1, 4 → 1
- 5 → 2, 6 → 2
- since i uniquely maps to 2i+1 and 2i+2, we have a representation of a left and right child
- Since 2i+1 and 2i+2 uniquely map back to i, we have a parent representation as well
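The index arithmetic above fits in three one-line helpers (method names are my own; note integer division makes both children map back to the same parent):

```java
static int left(int i)   { return 2 * i + 1; }
static int right(int i)  { return 2 * i + 2; }
static int parent(int i) { return (i - 1) / 2; }
```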
The actual algorithm:
- Two phases
- Initialization: creating the initial heap
- Selecting largest element and reheapifying (restoring the heap property)
sort(a[0 ... n-1]) {
for (i = n/2-1 ... 0)
adjust a[i ... n-1] so it becomes a heap
for (last = n-1 ... 1)
swap a[0] and a[last]
adjust a[0 ... last-1] so it becomes a heap
}
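Both phases of the pseudocode hinge on one "adjust so it becomes a heap" step, usually called sift-down; a Java sketch of the whole algorithm (int array, names my own):

```java
static void heapSort(int[] a) {
    int n = a.length;
    for (int i = n / 2 - 1; i >= 0; i--)   // phase 1: build the initial heap
        siftDown(a, i, n);
    for (int last = n - 1; last >= 1; last--) {
        int t = a[0]; a[0] = a[last]; a[last] = t;  // root (largest) to final spot
        siftDown(a, 0, last);              // phase 2: reheapify a[0..last-1]
    }
}

// Restore the heap property for the subtree rooted at i, within a[0..size-1],
// assuming both of i's subtrees are already heaps.
static void siftDown(int[] a, int i, int size) {
    while (2 * i + 1 < size) {
        int child = 2 * i + 1;                            // left child
        if (child + 1 < size && a[child + 1] > a[child])
            child++;                                      // pick larger child
        if (a[i] >= a[child]) return;                     // heap property holds
        int t = a[i]; a[i] = a[child]; a[child] = t;
        i = child;                                        // continue down
    }
}
```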
Cost?
- Turns out to always be n log n
- Overview
- Tree represented by the array is (nearly) complete
- Longest path in complete tree of n nodes is log n
- Adding a level doubles the nodes in a tree
- Going up a level halves the number of nodes
- Given n nodes can halve log n times
- Given a tree that is a heap with the possible exception of the
root, heapifying requires log n comparisons
- Start at root, compare, possibly swap and move down
- Initializing
- Start at end move towards beginning and reheapify each time
- Need to heapify about n/2 times (once per internal node, as in the first loop)
- heap is never larger than n, thus log n to heapify
- Result is n log n
- Removing largest and reheapifying
- n-1 elements selected,
- heap is no larger than n, thus log n reheapify
- Results in n log n