Programming #5: Graphs

Imagine that we want to represent a social network in a data structure. A connection between two people will imply that they know each other. With what we have discussed so far, trees jump out as the obvious way to represent connections between nodes. But recall that a tree does not allow for cycling, i.e the bottom of a tree can’t be connected to the top. With a social network, this structure doesn’t make sense.

A graph on the other hand allows for cycling. There is no root node like with a tree, instead everything is connected in a web. In the social network example, names of people would be the vertices/nodes of our graph and each line connecting two vertices would be an edge.

Direction and Connectivity

Graphs can be directed or undirected. If Bob knows John, it is implied that John knows Bob as well. This would be an example of an undirected graph since the connection goes both ways. However some connections can only go one way, for example traveling routes on a world trip. This would be an example of a directed graph where the direction of an edge matters.

Graphs can also be connected or disconnected. If a graph is connected, every vertex will have an edge that connects it with the rest of the graph. If a graph is unconnected, there can be islands of vertices that have no connections. Connectivity is a measure of the minimum number of edges that would need to be removed to make a connected graph disconnected. It can be thought of as how robust a network is.

Sometimes we want to put numeric values on the connections. For example, if Bob knows John really well, the weight on the connection could be high. Or on a map, connections could contain how far apart cities are. We use weighted graphs for this.

Graph Representations

There are many different representations of graphs to choose from depending on the problem that we want to solve. We usually care about three things for our representation: how much space it takes, how long it takes to find if an edge exists, and how long it takes to find the neighbors of a given vertex.

An edge list is the simplest way to represent a graph. These are just arrays of edges. If a vertex is connected to another vertex, the edge list would contain an array of the two vertices. The total space needed will be O(E) and searching will take O(E) in a non-organized list. In python, an edge list would look like:

edgeList = [['John', 'Bob'], ['Bob', 'Sally'], ['Bob', 'Sue']]

An adjacency matrix is a representation that uses a 0 or 1 value in a matrix to signal whether the there is an edge present for the corresponding index values. For example, if there is a connection between vertex 1 and 5 there would be a 1 value at row 1 column 5. This representation allows us to find whether an edge exists in constant time by just checking the corresponding index. The space taken is O(V2) since all those 0s will take up space. Finding neighbors is also difficult since we will have to check over all the vertices in a row. In python, an adjacency matrix would look like:

adjMatrix = [[0, 1, 0, 0, 0],
             [0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0],
             [1, 0, 0, 0, 1],
             [0, 0, 0, 0, 0]]

An adjacency list is a combination of an adjacency matrix and an edge list. For each vertex in our graph, we store an array of the vertices adjacent to it. For example, if John is connected to Sue and Bob, we would store a list containing Sue and Bob under John. We can now find neighbors and edges in constant time. However, space required will be O(2E) since every edge will be represented twice. In python, an adjacency list would look like:

adjList = {'John': ['Bob'],
           'Bob': ['John', 'Sue'], 
           'Sue': ['Bob']}

Graph Traversal

There are two main ways to traverse a graph: a depth first search where we follow one path as far was we can go, and a breadth first search where we look at all the vertices adjacent to a vertex before we move on.

For depth first search (DFS), start at any vertex and mark it as seen and store it on a stack. Then, pick an edge, follow it, and mark the new vertex as seen, also adding it to the stack. When you eventually hit a vertex that you have seen before, just go back and try a new edge. If there are no new edges, pop the vertex from the stack and go back another level. Once everything is popped from the stack you are done. This can be done recursively too. Our run time will be O(E+V).

For breadth first search (BFS), start at any vertex and mark it as seen and store it in a queue. Then, pick an edge, follow it, and mark the new vertex as seen, also adding it to the queue. We then return to the prior vertex and repeat with the rest of the edges. When we run out of edges, we dequeue our queue and proceed to the next element as a starting place. Our run time will be O(E+V).


The following code builds out a graph from scratch and implements DFS and BFS. Our DFS solution will be recursive and our BFS solution will be iterative:

class Node(object):
    def __init__(self, value):
        self.value = value
        self.edges = []
        self.visited = False

class Edge(object):
    def __init__(self, value, node_from, node_to):
        self.value = value
        self.node_from = node_from
        self.node_to = node_to
class Graph(object):
    def __init__(self, nodes=None, edges=None):
        self.nodes = nodes or []
        self.edges = edges or []
        self.node_names = []
        self._node_map = {}

    def set_node_names(self, names):
        The Nth name in names should correspond to node number N.
        Node numbers are 0 based (starting at 0).
        self.node_names = list(names)

    def insert_node(self, new_node_val):
        Insert a new node with value new_node_val
        new_node = Node(new_node_val)
        self._node_map[new_node_val] = new_node
        return new_node

    def insert_edge(self, new_edge_val, node_from_val, node_to_val):
        Insert a new edge, creating new nodes if necessary
        nodes = {node_from_val: None, node_to_val: None}
        for node in self.nodes:
            if node.value in nodes:
                nodes[node.value] = node
                if all(nodes.values()):
        for node_val in nodes:
            nodes[node_val] = nodes[node_val] or self.insert_node(node_val)
        node_from = nodes[node_from_val]
        node_to = nodes[node_to_val]
        new_edge = Edge(new_edge_val, node_from, node_to)

    def get_edge_list(self):
        Return a list of triples that looks like this:
        (Edge Value, From Node, To Node)
        return [(e.value, e.node_from.value, e.node_to.value)
                for e in self.edges]

    def get_edge_list_names(self):
        Return a list of triples that looks like this:
        (Edge Value, From Node Name, To Node Name)
        return [(edge.value,
                for edge in self.edges]

    def get_adjacency_list(self):
        Return a list of lists.
        The indecies of the outer list represent "from" nodes.
        Each section in the list will store a list
        of tuples that looks like this:
        (To Node, Edge Value)
        max_index = self.find_max_index()
        adjacency_list = [[] for _ in range(max_index)]
        for edg in self.edges:
            from_value, to_value = edg.node_from.value, edg.node_to.value
            adjacency_list[from_value].append((to_value, edg.value))
        return [a or None for a in adjacency_list] # replace []'s with None

    def get_adjacency_list_names(self):
        Each section in the list will store a list
        of tuples that looks like this:
        (To Node Name, Edge Value).
        Node names should come from the names set
        with set_node_names.
        adjacency_list = self.get_adjacency_list()
        def convert_to_names(pair, graph=self):
            node_number, value = pair
            return (graph.node_names[node_number], value)
        def map_conversion(adjacency_list_for_node):
            if adjacency_list_for_node is None:
                return None
            return map(convert_to_names, adjacency_list_for_node)
        return [map_conversion(adjacency_list_for_node)
                for adjacency_list_for_node in adjacency_list]

    def get_adjacency_matrix(self):
        Return a matrix, or 2D list.
        Row numbers represent from nodes,
        column numbers represent to nodes.
        Store the edge values in each spot,
        and a 0 if no edge exists.
        max_index = self.find_max_index()
        adjacency_matrix = [[0] * (max_index) for _ in range(max_index)]
        for edg in self.edges:
            from_index, to_index = edg.node_from.value, edg.node_to.value
            adjacency_matrix[from_index][to_index] = edg.value
        return adjacency_matrix

    def find_max_index(self):
        Return the highest found node number
        Or the length of the node names if set with set_node_names().
        if len(self.node_names) > 0:
            return len(self.node_names)
        max_index = -1
        if len(self.nodes):
            for node in self.nodes:
                if node.value > max_index:
                    max_index = node.value
        return max_index

    def find_node(self, node_number):
        Return the node with value node_number or None
        return self._node_map.get(node_number)
    def _clear_visited(self):
        for node in self.nodes:
            node.visited = False

    def dfs_helper(self, start_node):
        Helper function for a recursive implementation
        of Depth First Search iterating through a node's edges.
        ret_list = [start_node.value]
        start_node.visited = True
        edges_out = [e for e in start_node.edges
                     if e.node_to.value != start_node.value]
        for edge in edges_out:
            if not edge.node_to.visited:
        return ret_list

    def dfs(self, start_node_num):
        Outputs a list of numbers corresponding to the traversed nodes
        in a Depth First Search.
        start_node = self.find_node(start_node_num)
        return self.dfs_helper(start_node)

    def dfs_names(self, start_node_num):
        Return the results of dfs with numbers converted to names.
        return [self.node_names[num] for num in self.dfs(start_node_num)]

    def bfs(self, start_node_num):
        Iterative implementation of Breadth First Search
        iterating through a node's edges.
        node = self.find_node(start_node_num)
        ret_list = []
        queue = [node]
        node.visited = True
        def enqueue(n, q=queue):
            n.visited = True
        def unvisited_outgoing_edge(n, e):
            return ((e.node_from.value == n.value) and
                    (not e.node_to.visited))
        while queue:
            node = queue.pop(0)
            for e in node.edges:
                if unvisited_outgoing_edge(node, e):
        return ret_list

    def bfs_names(self, start_node_num):
        Return the results of bfs with numbers converted to names.
        return [self.node_names[num] for num in self.bfs(start_node_num)]





Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s