CISC 1115 Lecture #15

CISC 1115
Introduction to Programming Using Java
Lecture #15
Sorting & Searching

Overview

Why search?
Why sort?

Searching

To search an array for a particular value

Loop through the array comparing each element with the value
If you find a match return either true, or the index
If you drop through the loop, return some indication of failure -- false or -1 (why -1?)
- A classical error is to return failure after not matching the first element only

The above is known as linear search (because it moves in a straight line through the array performing the search).

Sorting

Sorting involves rearranging the elements of the array so that the elements are either in ascending or descending order. That is:

An array is sorted in ascending order if every element is less than (or equal to) the element to the right
An array is sorted in descending order if every element is greater than (or equal to) the element to the right

Note that the last element has no element to the right to compare against (similarly element 0 has no element to its left to be compared to).

Selection Sort

The algorithm (i.e., the approach) used by selection sort corresponds closely to the technique you might use if you were asked to sort a deck of cards, a pile of folders, a list of names:

Find the smallest value, set it aside, and make it the first
Find the smallest value of the remaining items and make it the second
Repeat until you've gone through the whole set of items

Selection sort basically does the same thing, but we take into consideration that we're working with an array:

We don't set the smallest value to the side, we rather move it to the beginning of the array
- We have to do something with the element that's already there-- we'll swap them
Each successive 'smallest' element will go into the second, third, etc elements of the array

We can thus think of the array as being divided into two parts:

The left side which is sorted and growing
The right side which is unsorted and shrinking

What we will do is ave a pair of nested loops:

The outer loop will determine which 'smallest' element we're selecting
- The first time through it will be the actual smallest element, the second time through the next-to-smallest, etc.
- Each time through this outer loop, the sorted portion gets one bigger and the unsorted portion one element smaller.
The inner loop will find the smallest element remaining in the unsorted portion
- Start at the left end of the sorted portion and move through the array
- Whenever we find an element smaller than the leftmost element, swap them
- When we have reached the end of the array, the leftmost element contains the smallest (remaining) value

Initially:

+----------------------------------------------------------------------------+
|                                                                            |
|                          unsorted                                          |
|                                                                            |
+----------------------------------------------------------------------------+

After one iteration through the outer loop:

+----------------------------------------------------------------------------+
|        |                                                                   |
|smallest|                 unsorted                                          |
|        |                                                                   |
+----------------------------------------------------------------------------+

Halfway through the outer loop:

+----------------------------------------------------------------------------+
|                               |                                            |
|         sorted                |          unsorted                          |
|                               |                                            |
+----------------------------------------------------------------------------+

At end:

+----------------------------------------------------------------------------+
|                                                                            |
|                            sorted                                          |
|                                                                            |
+----------------------------------------------------------------------------+

The Selection Sort Algorithm (SwappySelectionSort)

A very high-level statement of the approach:

Go through the array from beginning to end
   Find the smallest element in the array from the current position to the end,
	   and place it in that position

In a bit more detail:

Go through the array from beginning to end
   Go through the array from the current position to the end
      If an element is less than the value in the current position swap them

Let's start getting down to code-level

for (currentPosition = 0; currentPosition < size; currentPosition++)
   for (j = currentPosition+1; j < size; j++)
      if (arr[j] < arr[currentPosition] 
         swap(arr[j], arr[currentPosition])

Here it is turned into a method:

void selectionSort(int [] arr, int size) {
	for (currentPosition = 0; currentPosition < size; currentPosition++)
	   for (j = currentPosition+1; j < size; j++)
		 if (arr[j] < arr[currentPosition] 
		    swap(arr, j, currentPosition)
}

An Optimization of Selection Sort (SelectionSort)

Rather than constantly swapping each time we find a new minimum value, we remember the position of the smallest value encountered and it's only once we've gone completely through the array in the inner loop that we swap the smallest value into the current position.

void selectionSort(int [] arr, int size) {
	for (currentPosition = 0; currentPosition < size; currentPosition++)
		int pos = currentPosition;
		for (j = currentPosition+1; j < size; j++)
			if (arr[j] < arr[currentPosition] 
				pos = j;
		swap(arr, pos, currentPosition)
}

One More Optimization (LazySelectionSort)

Each time we've gone completely through the inner loop, before we swap the smallest element to the current position, we make sure the smallest value is not already at the current position.

void selectionSort(int [] arr, int size) {
	for (currentPosition = 0; currentPosition < size; currentPosition++)
		int smallestPosition = currentPosition;
		for (j = currentPosition+1; j < size; j++)
			if (arr[j] < arr[currentPosition] 
				smallestPosition = j;
		if (smallestPosition != currentPosition) swap(arr, pos, currentPosition)
}

and here it is with the inner loop (a single pass through the array) turned into a method:

void selectionSort(int [] arr, int size) {
	for (currentPosition = 0; currentPosition < size; currentPosition++)
		int smallestPosition = findSmallestPosition(arr, size);
		if (smallestPosition != currentPosition) swap(arr, pos, currentPosition)
}

int findSmallestPosition(int [] arr, int size) {
	int smallestPosition = currentPosition;
	for (j = currentPosition+1; j < size; j++)
		if (arr[j] < arr[currentPosition] 
			smallestPosition = j;
	return smallestPosition;
}

A Second Sort - Bubble Sort

In bubble sort:

We make repeated passes through the array
During each pass, we compare adjacent elements.
If two adjacent elements are out of order, we swap them.
When we perform an entire pass without a single swap, we're done

Some Questions to Answer

Why does this work?
How do we know it will eventually end?
Why are we bothering with a second sort?

Why does this work?

Each time we go through the array, we're making it a bit more sorted
- Out-of-order adjacent elements are being put into proper order
Furthermore, the largest elements are bubbling towards the right to their proper positions

How do we know this will eventually end?

Each pass through the array has the next largest element settling into its proper position as a result of the bubbling
After n-1 passes every element has been deposited in its proper position

The Bubble Sort Algorithm(SmartBubbleSort)

A very high-level statement of the approach:

Pass through the array until a pass is completed with no swaps performed
		 Compare adjacent elements and swap if they're out of order

In a bit more detail:

   Go through the array from the beginning to end
      If an element at position i is less than the element at position i+1 swap them
If a swap was performed repeat

Let's start getting down to code-level

do {
   bool swapped = false;
   for (i = 0; i < n-1; i++)
      if (arr[i] > arr[i+1] { 
         swap(arr, i, i+1)
         swapped = true;
      }
} while (swapped)

Notice the pattern of how swapped is used:

It's reminiscent of finding the max--
- Assume first is max and then check rest of the array and correct as needed
Same thing here
- Assume swapped will be false and then go through array and correct accordingly

Here is a very simple version of bubble sort for you to remember:

for (int i = 0; i < n; i++)
	for (int j = 0; j < n-1; j++)
		if (arr[j] > arr[j+1])
			swap(arr, j, j+1);

Notes

Each time through the outer loop, the next largest element 'bubbles' it to the right until it settles into its proper place
Any smaller element that is swapped with a larger predecessor moves only one position to the left during a pass through the array
- If the smallest element is at the last position, it will take n-1 pass for it to be swapped to the first position (where it belongs)
- That means at the very worst, we need no more than n passes through the array
The inner loop represents a single pass through; since we are comparing each element with its neighbor to the right; and the last element has no neighbor to the right, we stop at the next to last element (n-2) (not doing so would result in an ArrayIndexOutOfBoundsException)

Here is a minor variation that's a bit more efficient:

for (int last = n-1; last > 0; last--)
	for (int i = 0; i < last; i++)
		if (arr[i] > arr[i+1])
			swap(arr, i, i+1);

Notes

The outer loop is still controlling the passes through the array, but it starts at the end this time
- Since the largest element bubbles to the last position on the first pass, there is no reason to compare against it in the next (or subsequent passes)
- We thus use the index of the outer loop to represent the last position of the unsorted portion of the array to be compared against; all positions beyond last are already sorted

And here is this version with the swapped optimizationi (i.e. checking for no swaps in a single pass):

boolean swapped = true;
for (int last = n-1; last > 0 && swapped; last--) {
	swapped = false;
	for (int i = 0; i < last; i++)
		if (arr[i] > arr[i+1]) {
			swap(arr, i, i+1);
			swapped = true;
		}
}

Notes

The boolean variable swapped will be set in the inner loop (the loop performing the comparison pass) if any swap is performed
The outer loop (the loop controlling the number of passes) adds a check if any swap has been done in the previous pass of the inner loop. If none have been done, the array is already sorted, and the sort terminates
The outer loop is forced into a first pass by initializing swapped to true

Why do we have to bother with two sorts

Examine bubble sort-- what if the array came in sorted to begin with? What about selection sort?
Our selection sort is actually a bit simple-minded in constantly swapping-- there's a more 'swap-efficient' version that does significantly less movement than the version we presented-- and bubble sort for that matter.

Program 24.1

Write a program that initializes an array of integers prints them, sorts them using bubble sort, and prints the sorted array.

Binary Search — Searching a Sorted Array

We've already seen a linear search of an array, in which we begin at the first element and compare successive elements for equality with the value we are looking for. If the array is sorted, we can do quite better than searching the entire array:

A High-level Version of the Approach

Compare the value with the middle (sub)array element
If they're equal, you're done!
Otherwise, if the value being searched for is less then the element in the middle, we want to search the subarray consisting of the left half of the array
Otherwise, we should search the subarray consisting of the right half of the array

A More Detailed Presentation

As we will be looking at various subarrays (left side right side, etc), we initially view the entire array as a subarray with bounds 0 … arr.length-1 (inclusive). We will call these bounds lo and hi respectively.

if lo and hi have crossed, i.e., lo is greater than hi, i.e., the subarray is empty, report failure
Calculate the location of the middle element of the subarray lo … hi; call it mid
Compare the middle element with the value to be searched
if equal, return the position
if the value is less than the middle element, repeat step 1, using the subarray lo … mid-1
otherwise repeat step 1, using the subarray mid+1 … hi

Here is the actual Java code for binary search:

public static int find(int [] arr, int val) {
	int lo = 0, hi = arr.length-1;
	while (lo <= hi) {
		int mid = (hi+lo)/2;
		if (val == arr[mid]) 
			return mid;
		else if (val < arr[mid]) 
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return -1;
}

CISC 1115 Introduction to Programming Using Java Lecture #15 Sorting & Searching

Overview

Searching

Sorting

Selection Sort

The Selection Sort Algorithm (SwappySelectionSort)

An Optimization of Selection Sort (SelectionSort)

One More Optimization (LazySelectionSort)

A Second Sort - Bubble Sort

Some Questions to Answer

Why does this work?

How do we know this will eventually end?

The Bubble Sort Algorithm(SmartBubbleSort)

Why do we have to bother with two sorts

Program 24.1

Binary Search — Searching a Sorted Array

A High-level Version of the Approach

A More Detailed Presentation

Files used in this Lecture

CISC 1115
Introduction to Programming Using Java
Lecture #15
Sorting & Searching