The next data structure we’ll be talking about is a tree. Like their real-life counterparts, trees as a data structure have a root, branches, and leaves. In programming, a tree is essentially an extension of a linked list. Recall that a node in a linked list stores its value and a pointer to the next node. A node in a tree stores its value and pointers to its children. A tree must also be completely connected: if we start at the root node, we can reach every element in the tree. And a tree can have no cycles: following edges down the tree can never bring us back up to a node we’ve already visited.
A tree can be described by how many levels it has. The root is the first node in the tree and occurs at level 1; anything connected to the root occurs at level 2, and so on. A parent is a node connected to other nodes one level below it; those connected nodes are its children. Nodes at the end of the tree with no children are called leaves. A connection between a parent and a child is called an edge, and a sequence of edges is called a path. The height of a node is the number of edges between it and the furthest leaf below it. The depth of a node is the inverse: the number of edges back up to the root.
Recall that when we looked at list-based data structures, we could just move through the list in order to see every element. Trees do not have a clear ordered path that can be traversed. If we start at the root, should we go left or right, should we check all nodes on the same level or all the way down first?
There are two main approaches to this problem: depth-first search (DFS) and breadth-first search (BFS). With DFS, we give priority to exploring child nodes. With BFS, we give priority to exploring nodes on the same level.
An example of a BFS algorithm is the level order traversal. We start at the root, then visit all of its children on the second level, then all of those nodes’ children on the third level, and continue until we reach the leaves.
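A level order traversal can be sketched with a queue; the `Node` class and `level_order` function below are illustrative names, not from any particular library:

```python
from collections import deque

class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def level_order(root):
    """Return node values in BFS (level order) sequence."""
    if root is None:
        return []
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()       # take the oldest discovered node
        order.append(node.value)
        if node.left:
            queue.append(node.left)  # children wait behind the current level
        if node.right:
            queue.append(node.right)
    return order

# Build the tree:   1
#                  / \
#                 2   3
#                / \
#               4   5
root = Node(1)
root.left, root.right = Node(2), Node(3)
root.left.left, root.left.right = Node(4), Node(5)
print(level_order(root))  # [1, 2, 3, 4, 5]
```

The queue ensures that every node on one level is visited before any node on the next.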
An example of a DFS algorithm is the pre-order traversal. We start at the root and traverse down the leftmost children until we hit a leaf, checking off every node we travel through along the way. When we hit a leaf we go back up to its parent and traverse any right children, and this process continues until we’ve seen everything. With an in-order traversal we would only check off a node once we’ve seen its left child and come back to it. With a post-order traversal we would only check off a node once we’ve seen all of its children.
An example of a pre-order search implemented in Python:
```python
class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


class BinaryTree(object):
    def __init__(self, root):
        self.root = Node(root)

    def search(self, find_val):
        """Return True if the value is in the tree, return False otherwise."""
        return self.preorder_search(self.root, find_val)

    def print_tree(self):
        """Print out all tree nodes as they are visited in a pre-order traversal."""
        return self.preorder_print(self.root, "")[:-1]

    def preorder_search(self, start, find_val):
        """Helper method - recursive pre-order search."""
        if start:
            if start.value == find_val:
                return True
            return self.preorder_search(start.left, find_val) or \
                self.preorder_search(start.right, find_val)
        return False

    def preorder_print(self, start, traversal):
        """Helper method - recursive pre-order print."""
        if start:
            traversal += str(start.value) + "-"
            traversal = self.preorder_print(start.left, traversal)
            traversal = self.preorder_print(start.right, traversal)
        return traversal
```
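The in-order and post-order variants differ only in where the node gets checked off relative to its subtrees. A sketch as standalone functions (the function names here are hypothetical, chosen by analogy with the pre-order helper):

```python
class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def inorder_print(start, traversal):
    """Check off a node after its left subtree, before its right."""
    if start:
        traversal = inorder_print(start.left, traversal)
        traversal += str(start.value) + "-"
        traversal = inorder_print(start.right, traversal)
    return traversal

def postorder_print(start, traversal):
    """Check off a node only after both of its subtrees."""
    if start:
        traversal = postorder_print(start.left, traversal)
        traversal = postorder_print(start.right, traversal)
        traversal += str(start.value) + "-"
    return traversal

# The tree:   1
#            / \
#           2   3
#          / \
#         4   5
root = Node(1)
root.left, root.right = Node(2), Node(3)
root.left.left, root.left.right = Node(4), Node(5)
print(inorder_print(root, "")[:-1])    # 4-2-5-1-3
print(postorder_print(root, "")[:-1])  # 4-5-2-3-1
```

Note that the pre-order traversal of this same tree would be 1-2-4-5-3; all three visit the same nodes, only the check-off point moves.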
Binary Search Tree
The searching algorithms we’ve talked about so far take O(n) in the worst case, since they may need to check every node. However, rules can be added to our tree to speed up this process, and depending on our goal there are many different approaches we can take. For example, a binary search tree (BST) is a special case of a binary tree where every value to the left of a particular node is smaller than that node’s value and every value to the right is larger.
Now when we search, we know which direction to go at every step: if the value we are looking for is larger than the value of the node we are on, we go right; if it is smaller, we go left. Search on a balanced BST takes O(log(n)), proportional to the height of the tree. If the tree is unbalanced (in the extreme, a tree with one long leg), the search time degrades to O(n).
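That step-by-step decision can also be sketched as an iterative walk down the tree; `bst_search` and the minimal `Node` class here are illustrative, not from any library:

```python
class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bst_search(root, find_val):
    """Walk down the tree, picking one direction per step: O(height)."""
    current = root
    while current:
        if current.value == find_val:
            return True
        elif find_val > current.value:
            current = current.right  # larger values live in the right subtree
        else:
            current = current.left   # smaller values live in the left subtree
    return False

# A small balanced BST:   4
#                        / \
#                       2   6
root = Node(4)
root.left, root.right = Node(2), Node(6)
print(bst_search(root, 6))  # True
print(bst_search(root, 5))  # False
```

Each iteration discards an entire subtree, which is where the O(log(n)) behavior of a balanced BST comes from.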
An example of a BST implemented in Python:
```python
class Node(object):
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


class BST(object):
    def __init__(self, root):
        self.root = Node(root)

    def insert(self, new_val):
        """Add node to BST."""
        self.insert_helper(self.root, new_val)

    def insert_helper(self, current, new_val):
        """Find location for insertion."""
        if current.value < new_val:
            if current.right:
                self.insert_helper(current.right, new_val)
            else:
                current.right = Node(new_val)
        else:
            if current.left:
                self.insert_helper(current.left, new_val)
            else:
                current.left = Node(new_val)

    def search(self, find_val):
        """Implement search in BST."""
        return self.search_helper(self.root, find_val)

    def search_helper(self, current, find_val):
        """Find location in search."""
        if current:
            if current.value == find_val:
                return True
            elif current.value < find_val:
                return self.search_helper(current.right, find_val)
            else:
                return self.search_helper(current.left, find_val)
        return False
```
A heap is another special case of a tree, with the added rule that every parent is ordered relative to its children. In other words, the root element is either the maximum or the minimum value in the tree: in a max heap, a parent must always be larger than its children; in a min heap, the opposite. Finding the maximum or minimum value therefore happens in constant time, O(1), but searching for any other value still takes O(n), since we don’t know which direction to go from the root node.
The main benefit of heaps is that insertions are quick, since the level structure is explicit: a new element is placed at the next open spot and moved up until the heap rule holds, which takes O(log(n)) in the worst case. This makes the structure excellent for implementing a priority queue, where we queue up items and then only ever dequeue the maximum or minimum value.
In Python, the heapq module has this functionality built in.
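A minimal sketch of that usage: heapq maintains a min heap on top of a plain Python list (for max-heap behavior, a common trick is to push negated values):

```python
import heapq

heap = []
for value in [5, 1, 9, 3]:
    heapq.heappush(heap, value)  # O(log n) per insertion

smallest = heapq.heappop(heap)   # the minimum is always at heap[0]
print(smallest)                  # 1
print(heapq.heappop(heap))       # 3, the next-smallest value
```

Repeatedly calling `heappop` dequeues items in priority order, which is exactly the priority-queue behavior described above.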