Programming #2: Searching and Sorting

In computer science, we use algorithms to accomplish various tasks. An algorithm is just a series of steps that results in a desired outcome. For example, driving to work can be broken down into an algorithm: turn on car, back out, turn left, turn right, stop when arrived, etc.

Being able to create and apply algorithms allows us to write good programs. By good, we mean that it both solves the problem and does so efficiently. Almost every problem can be solved by “try every possible solution”, but this is rarely practical for application. Instead, we seek to get the right answer with the least amount of work.

This post will explore some of the most frequently used algorithms in computer science. Understanding the concepts behind them will lay the foundation for building even more complex algorithms.

Binary Search

Imagine someone asked you to open a book to page 200. You probably wont start at page 1, flip to page 2, flip to page 3, and continue until you arrive at page 200. Instead you might open the book to somewhere in the middle and then see if that page is lower or higher than 200, then repeat from there. This strategy is known as binary search.

Binary search is an efficient algorithm for finding items in a sorted list (like page numbers in a book). It works by dividing the list in half, comparing the values, and repeating until the value is either found or deemed missing.

If we checked every item in a list one by one, it would take time of O(n) since every element (n) would need to be checked. If we use binary search, we are eliminating half the list with every guess we make. For example, if we start with an 8 item list, we will reduce the size to 4, then to 2, then to 1. That is 3 guesses. At 16 it would be 4 guesses, 32 would be 5 guesses and so on. Every time we double the size of the list it requires at most one more guess. Binary search takes time of O(log(n)) and space of O(1).

An example of binary search implemented in Python:

def binary_search(input_array, value):
    """
    Return index where value is found.
    Return -1 if value not found.
    """
    low = 0
    high = len(input_array) - 1
    while low <= high:
        mid = (low + high) / 2
        if input_array[mid] == value:
            return mid
        elif input_array[mid] < value:
            low = mid + 1
        else:
            high = mid - 1
    return -1

Recursion

Before we get to some of the sorting algorithms, we need to talk about recursion. Recursion is one of the more difficult concepts in computer science to understand intuitively. Imagine a Russian doll, every time you open it you get the same doll but a smaller version. You continue this until you get to the smallest doll and the process stops.

Recursion uses the same strategy. We solve a problem by solving a smaller instance of the same problem, until we get to an instance so small we can just directly solve it. A recursive function will usually have 3 aspects: 1) it calls itself at some point, 2) it has an exit condition so it doesn’t repeat indefinitely, and 3) it modifies the input at each layer so it doesn’t repeat indefinitely.

Consider this simple example of a recursive function in Python:

def recursive(in_val):
    if in_val <= 0: # exit condition
        return in_val
    else:
        output = recursive(in_val - 1) # modify and iterate
        return output

To see how recursive functions can be used to solve real world problems, consider this example of a recursive function used to find the value of the Fibonacci sequence at a given position:

def get_fib(position):
    """
    Recursively find desired Fibonacci sequence value at
    target position.
    """
    if position == 0 or position == 1:
        return position
    return get_fib(position - 1) + get_fib(position - 2)

Bubble Sort

When we were searching for a page in a book, we had the benefit of everything already being sorted. But what if we needed to sort the pages first? The first sorting algorithm we will consider is bubble sort. Bubble sort is a naive approach to this problem that repeatedly steps through a list, comparing each pair of adjacent items, and swapping them if they are in the wrong order. This process is repeated until no swaps are needed.

Since bubble sort compares every element in a list to every other element in a list, it will take time of O(n2) on average, but has a best case of O(n) if the list is already sorted. Bubble sort is implemented in place, so the space complexity is O(1). An example of bubble sort implemented in Python:

def bubblesort(array):
    """
    Sort an array by comparing every element.
    """
    for i in range(len(array)):
        for j in range(i+1, len(array)):
            if array[j] < array[i]:
                array[j], array[i] = array[i], array[j]
                
    return array

Merge Sort

We can do better than bubble sort by breaking the problem up. Merge sort is an example of a divide and conquer algorithm that does just that. First you divide the problem into a number of sub-problems, then you conquer the sub-problems by solving them recursively, and finally you combine the solutions to the sub-problems into the solution of the original problem.

In the case of merge sort, we are dividing the list in half at every iteration, then recursively sorting the halves when we combine them back together into a single list. Merge sort takes time of O(nlog(n)) on average and since merge sort copies values to new lists, its space complexity is O(n). An example of merge sort implemented in Python:

def mergesort(array):
    """
    Sort an array with merge sort strategy.
    """
    if len(array)>1:
        mid = len(array) // 2
        lefthalf = array[:mid]
        righthalf = array[mid:]

        mergesort(lefthalf)
        mergesort(righthalf)

        i=0
        j=0
        k=0
        
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                array[k] = lefthalf[i]
                i = i + 1
            else:
                array[k] = righthalf[j]
                j = j + 1
            k = k + 1

        while i < len(lefthalf):
            array[k] = lefthalf[i]
            i = i + 1
            k = k + 1

        while j < len(righthalf):
            array[k] = righthalf[j]
            j = j + 1
            k = k + 1
            
    return array

Quick Sort

Merge sort was faster than bubble sort but required more memory. Quick sort splits the difference. The main difference is that merge sort did the work when combining and quick sort does the work when dividing. Quick sort starts by dividing the list along a random element called the pivot. Once the pivot is found, rearrange the elements so that all the elements higher than the pivot are on the right and all the elements lower are on the left. Repeat this process recursively until the list is sorted.

Quick sort will take time of O(nlog(n)) on average, but it can take O(n2) in the worst case. Quick sort is implemented in place so the space complexity is O(1). In practice, quick sort outperforms merge sort. An example of quick sort implemented in Python:

def quicksort(array):
    """
    Sort an array with quick sort strategy.
    """
    less = []
    equal = []
    greater = []

    if len(array) > 1:
        pivot = array[0]
        for x in array:
            if x < pivot:
                less.append(x)
            elif x == pivot:
                equal.append(x)
            else:
                greater.append(x)
        return quicksort(less) + equal + quicksort(greater)
    else:
        return array
Advertisements

Author: David Shahrestani

"I have the strength of a bear, that has the strength of TWO bears."

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s