To illustrate the basic approach to developing and analyzing algorithms, let’s consider a simple case of the dynamic connectivity problem: given a list of connections between elements and then a pair of elements, we want to determine whether or not those two elements are connected. More formally:

Given N objects:

- Union command: connect two of the objects.
- Find/connected query: is there a path connecting the two objects?

For example:

```
union(4, 3)
union(3, 8)
union(7, 6)
connected(4, 7)  # False
```

Unlike the shortest path problem we attempted last time, we aren’t concerned with what the path is, only whether a path exists.

### Union-Find

We start by modeling the problem. We want a data structure that can keep track of connections and also let us search those connections quickly. Graphs and trees should jump out at us, but recall that those structures also store the paths themselves. Since we don’t care about the paths, can we do better?

The union-find or disjoint-set data structure is a way of modeling data by grouping it into subsets. For example: {3, 4, 8}, {6, 7}. In this example, 3, 4, and 8 are connected along some path, and 6 and 7 are connected. Our find query just has to check whether two objects are in the same subset, and our union command replaces the subsets containing the two objects with their union. For example, union(3, 6) would merge the two subsets into {3, 4, 6, 7, 8}, since everything would now be connected.
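The model is easy to play with before we commit to a real data structure. Here is a minimal sketch using plain Python sets; the `find` and `union` names are just for illustration, and this naive version is far too slow for real use:

```python
# Model the data as a list of disjoint subsets.
subsets = [{3, 4, 8}, {6, 7}]

def find(subsets, p, q):
    # Connected if some subset contains both p and q.
    return any(p in s and q in s for s in subsets)

def union(subsets, p, q):
    # Replace the subsets containing p and q with their union.
    sp = next(s for s in subsets if p in s)
    sq = next(s for s in subsets if q in s)
    if sp is not sq:
        subsets.remove(sp)
        subsets.remove(sq)
        subsets.append(sp | sq)

print(find(subsets, 4, 8))  # True
print(find(subsets, 3, 6))  # False
union(subsets, 3, 6)
print(find(subsets, 3, 6))  # True: subsets is now [{3, 4, 6, 7, 8}]
```

The implementations below capture exactly these semantics, but with much better performance characteristics.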

### Quick find (eager approach)

It is time to take our model and convert it into an algorithm. A quick way to implement our find command is to structure the data as an array of length N. Each index represents a specific element, and two indices are connected exactly when they store the same value. For example, if 1 and 3 are connected: [0, 1, 2, 1]. Notice that index 1 and index 3 hold the same value. Checking for a connection is then as simple as checking whether the two indices store equal values.

Our union command won’t be as easy. To connect 0 to 3 in the above example, we would need to change the values at indices 0, 1, and 3 to match. When we are dealing with a lot of elements, this can take a while.

Let’s think through the implementation of quick find. We initialize the array so that every index stores its own value, then define union(p, q) to overwrite every entry equal to p’s value with q’s value, merging the two groups. Finally, we define find to check whether the values at the two indices are equal.

```python
# Quick-find implementation
# O(mn): m union-find operations on a set of n objects
class QuickFindUF(object):
    def __init__(self, N):  # O(n)
        self.array = list(range(N))

    def connected(self, p, q):  # O(1)
        return self.array[p] == self.array[q]

    def union(self, p, q):  # O(n)
        pid = self.array[p]
        qid = self.array[q]
        for i in range(len(self.array)):
            if self.array[i] == pid:
                self.array[i] = qid

# Test
qfuf = QuickFindUF(10)
qfuf.union(1, 2)
qfuf.union(5, 8)
qfuf.union(2, 5)
print(qfuf.connected(1, 8))  # True
print(qfuf.connected(3, 8))  # False
```

The next step is to ask whether the algorithm is fast enough and fits in memory. Here, our worst-case time is O(mn), where m is the number of union-find operations on a set of n objects. If we did 10 operations on a data set of 10 objects, that is on the order of 100 array accesses. Quadratic algorithms do not scale well: with 10^{9} union commands on 10^{9} objects, it would take 30+ years of computer time to finish.
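The 30+ years figure is easy to sanity-check. Assuming, as a rough ballpark, a machine that performs about 10^9 array accesses per second:

```python
m = 10**9                 # union commands
n = 10**9                 # objects
ops = m * n               # O(mn) array accesses for quick-find
ops_per_second = 10**9    # rough assumption for a modern CPU
seconds = ops / ops_per_second
years = seconds / (60 * 60 * 24 * 365)
print(round(years, 1))    # 31.7
```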

Since our algorithm isn’t fast enough, we will need to figure out why and address the problem until we are satisfied. Here, it is pretty obvious that the union command is eating up time, so let’s try adjusting that.

### Quick union (lazy approach)

We will start with the same array, but this time we will structure the data as a tree with nodes, parents, and roots. For example, if the parent of 4 is 9, then the value at index 4 will be 9, and so on. Our find command now requires us to check whether two elements have the same root, which takes more work than the quick find implementation, but we gain time on the union command, which only needs to point p’s root at q’s root. Implementing this, we get:

```python
# Quick-union implementation
# O(mn): m union-find operations on a set of n objects
class QuickUnionUF(object):
    def __init__(self, N):  # O(n)
        self.array = list(range(N))

    def root(self, i):
        while i != self.array[i]:
            i = self.array[i]
        return i

    def connected(self, p, q):  # O(n)
        return self.root(p) == self.root(q)

    def union(self, p, q):  # O(n) when cost of finding root is high
        pid = self.root(p)
        qid = self.root(q)
        self.array[pid] = qid

# Test
qfuf = QuickUnionUF(10)
qfuf.union(1, 2)
qfuf.union(5, 8)
qfuf.union(2, 5)
print(qfuf.connected(1, 8))  # True
print(qfuf.connected(3, 8))  # False
```

Here, our worst-case time is also O(mn). The problem is that the trees can grow too tall: a find on a maximally tall tree ends up traversing nearly all of the data. The solution is to keep the trees short; keeping each tree balanced ensures it never grows too tall.
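We can see the tall-tree problem directly. With an unlucky union order, quick-union degenerates into a single chain, and `root` has to walk every link. A small standalone sketch (the `steps` counter is just for instrumentation):

```python
# Worst case for quick-union: each union hangs the current root
# under a fresh node, producing one tall chain.
N = 10
parent = list(range(N))

def root(i):
    # Walk up parent links, counting how many we traverse.
    steps = 0
    while i != parent[i]:
        i = parent[i]
        steps += 1
    return i, steps

for k in range(1, N):
    p, _ = root(0)
    q, _ = root(k)
    parent[p] = q  # quick-union: point root of 0's tree at root of k's tree

_, steps = root(0)
print(steps)  # 9: root() walks all N-1 links of the chain
```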

### Weighted quick union with path compression (optimal approach)

We will want to keep track of the size of each tree. When linking the roots of two trees, we link the smaller tree under the root of the larger tree, which keeps the trees balanced. We can flatten the trees even more with path compression: as we walk up to a root, we point each node we touch at its grandparent, halving the path length.

To do this, we will need to add a new array size, where size[i] keeps track of the number of objects in the tree rooted at i. Our union command will need to be modified to check the size array first and then set the roots accordingly. Implementing this, we get:

```python
# Weighted quick-union with path compression implementation
# O(n + m log*(n)): m union-find operations on a set of n objects
class WeightedQuickUnionUF(object):
    def __init__(self, N):  # O(n)
        self.array = list(range(N))
        self.size = [1] * N

    def root(self, i):
        while i != self.array[i]:
            # Path compression (halving): point i at its grandparent.
            self.array[i] = self.array[self.array[i]]
            i = self.array[i]
        return i

    def connected(self, p, q):  # O(log(n))
        return self.root(p) == self.root(q)

    def union(self, p, q):  # O(log(n))
        pid = self.root(p)
        qid = self.root(q)
        if pid == qid:
            return
        # Link the smaller tree under the root of the larger tree.
        if self.size[pid] < self.size[qid]:
            self.array[pid] = qid
            self.size[qid] += self.size[pid]
        else:
            self.array[qid] = pid
            self.size[pid] += self.size[qid]

# Test
qfuf = WeightedQuickUnionUF(10)
qfuf.union(1, 2)
qfuf.union(5, 8)
qfuf.union(2, 5)
print(qfuf.connected(1, 8))  # True
print(qfuf.connected(3, 8))  # False
```

Here, our worst-case time will be O(n + m log*(n)). How much does this help? If we did 10^{9} union commands on 10^{9} objects, we would reduce quick-find’s 30 years of computer time to about 6 seconds.
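The 6-second figure is also easy to sanity-check, under the same rough assumption of ~10^9 array accesses per second, using the fact that log*(n) ≤ 5 for any n up to 2^65536:

```python
m = 10**9              # union commands
n = 10**9              # objects
log_star = 5           # log*(10**9) <= 5
ops = n + m * log_star # O(n + m log*(n)) array accesses
seconds = ops / 10**9  # assuming ~1e9 accesses per second
print(seconds)         # 6.0
```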

### Conclusion

We demonstrated the basic workflow for designing an algorithm that can solve a problem.

- Model the problem.
- Find an algorithm to solve it.
- Fast enough? Does it fit in memory?
- If not, figure out why.
- Find a way to address the problem.
- Iterate until we are satisfied.

In this case, we built a data structure that can help us in a variety of applications: finding whether a path exists on a map, in a social network, in a game like Go, in a city’s fluid-flow network, in electrical systems, in a maze, etc.

References:

https://en.wikipedia.org/wiki/Disjoint-set_data_structure

https://en.wikipedia.org/wiki/Dynamic_connectivity

https://www.coursera.org/learn/introduction-to-algorithms