Chapter 8. Graph

Chapter 8. Graph#

1. Build#

We can simply use Leetcode conventions to represent graph as nodes and its adjacent matrix, which can be implemented as a list of list in python.

def buildGraph(n_nodes, connections):
    graph = {}
    for node in range(n_nodes):
        graph[node] = []
    
    for node1, node2 in connections:
        graph[node1].append(node2)
        # if build an undirected graph, add the following line
        # graph[node2].append(node1)
    return graph

# test
n_nodes = 5
connections = [[0, 1], [0, 2], [1, 2], [1, 3], [2, 3], [2, 4]]
graph = buildGraph(n_nodes, connections)
print(graph)

{0: [1, 2], 1: [2, 3], 2: [3, 4], 3: [], 4: []}

2. Traversal#

Traversal can solve problems like find paths given the source node and target node, and also the topological sorting sequences.

valid-DAG problem: check if a given DAG is valid

check if the DAG G has directed cycles
if no cycle, topological sort is a linear ordering of nodes along a horizontal line so that all directed edges go from left to right.
if cycles are present, topological sorting is not possible.

single-pair all paths problem:

find all possible paths from s to t

These two types of problems can be solved by traversing the graph. There are two different ways of traversing a graph:

Breadth-first Search (BFS)
Depth-first Search (DFS)

2.1. Valid DAG#

Check if a DAG is valid. There should have no directed cycles in the graph.

The following graph is a DAG, because node 0 points to node 2, node 1 points to node 0 and node 2, node 2 points to nothing. Obviously, there is no directed cycle.

G = [[2], [0,2], []]

This is not a valid DAG, because there is a directed cycle, i.e., node 0 -> node 2, node 2 -> node 1, and node 1 -> node 0.

G = [[2], [0], [1]]

Leetcode

0207

2.1.1. BFS#

The first logic that came to mind about using BFS is shown as follows:

use BFS to detect if there is a cycle
- initialize a queue, and a visisted set to track which node has been visited
- given node, travere its neighbor

This method, however, is not correct as its implementation.

the visited store all the nodes that has been visited.
the case [[0,1], [0,2], [2,1]], has a cycle from undirected graph point of view, but no cycle from directed graph point of view
when traversing the node 0, BFS will enqueue its dependency [1,2] and mark as visisted.
when going to the neighbor node 1, its neighbor node 2 is enqueued. But at previous step the node 2 has already been marked as visited, which will be identified INCORRETLY as a cycle.

The main reason is the visited set in this sense didn’t store parent/children information during traversal.

# the following code is only used for detecting cycles in a undirected graph
# for directed graph, this will not work as directed cycle is not the same as undirected cycle
def isValidDAG(graph):
    # check if there is a cycle in the graph
    def hasCycle(graph, source):
        # initialize a set to store the visited nodes
        visited = set()
        # initialize a queue to store the nodes to be explored
        queue = [source]
        # loop until the queue is empty
        while queue:
            # pop the first node in the queue
            node = queue.pop(0)
            # if the node is already visited, then there is a cycle
            if node in visited:
                return True
            
            # mark the node as visited
            visited.add(node)

            # explore all the neighbors of the node
            for neighbor in graph[node]:
                queue.append(neighbor)

        return False
    
    for i in graph.keys():
        if hasCycle(graph, i):
            return False 

    return True    

# test
# False
nodes = 2
connections = [[0, 1], [1, 0]]
graph = buildGraph(nodes, connections)
print(isValidDAG(graph))

# True
nodes = 5
connections = [[0, 1], [0, 2], [1, 2], [1, 3], [2, 3], [2, 4]]
graph = buildGraph(nodes, connections)
print(isValidDAG(graph))

False
False

A correct implementation is to use a indegree set to record the indegree of each node. Indegree is the number of dependencies a node requires, the number of incoming connections. When do a topological sorting, we need start from a node with 0 indegree, and such nodes are the starting point of a topological sort tree.

Algorithm:

step 0: get the indegree for each node in the graph
step 1: find nodes with indegree of 0, and enqueue to a queue
step 2: dequeue the queue，and decrease the indegree of the connected nodes from the popped node by 1
- 可以想象成在graph中把该节点抹去，那么该节点指向的其他节点的indegree就减1了
repeat step 1 and 2
the traversal order is the topological sort if there is no cycle detected
if the number of nodes that has been popped out from the queue equals to the number of total nodes, then there is no cycle.
- if there is a cycle, the number of nodes that have been popped out from the queue should be less than the toal number of nodes.

def isValidDAG(graph):
    indegree = {}
    for node in graph.keys():
        indegree[node] = 0
    for node in graph.keys():
        for edge in graph[node]:
            indegree[edge] += 1
        
    def hasCycle(graph, indegree):
        # initialize a queue to store the nodes with indegree 0
        queue = []
        for node in indegree.keys():
            if indegree[node] == 0:
                queue.append(node)
        
        # loop until the queue is empty
        while queue:
            # pop the first node in the queue
            node = queue.pop(0)
            # explore all the neighbors of the node
            for neighbor in graph[node]:
                indegree[neighbor] -= 1
                if indegree[neighbor] == 0:
                    queue.append(neighbor)
        
        # if there is a cycle, there will be a node with indegree > 0
        for node in indegree.keys():
            if indegree[node] > 0:
                return True
        
        return False
    
    return not hasCycle(graph, indegree)

# test
# False
nodes = 2
connections = [[0, 1], [1, 0]]
graph = buildGraph(nodes, connections)
print(isValidDAG(graph))

# True
nodes = 5
connections = [[0, 1], [0, 2], [1, 2], [1, 3], [2, 3], [2, 4]]
graph = buildGraph(nodes, connections)
print(isValidDAG(graph))

False
True

2.1.2. Depth-first Search#

todo: implement using two approaches
- traversal approach -> similar to backtracking
- subporblem approach -> recusion with return value

DFS approach is more straight-forward than BFS in terms of detecting a loop, as DFS uses recursions and if a node was visited before in the same recursion, then there is a cycle.

Simply check if a node is visited or not. If it is visited, then there is a loop in the graph.

FOLLOW UP:

if we want to know the path of the loop,

we can use a parent dictionary to keep track of the parent of each node. Then we can use the parent dictionary to trace back the path of the loop.
add a path parameter to the dfs function, and append the current node to the path. If there is a loop, then the path will contain the loop.

NOTES

think about when and how to use visited or path.

visited: the visited records all the nodes that have been visited, and explored. If during the recursion, the node marked as visited is visited again, there is no need to further do the calculation as it’s been done before.
- or we can understand it as: visited is for the global graph point of view to make sure the traversal is not performed twice for the same node.
path: the traced record all the nodes that are currently on the recursion path, which might not be visited/finished. - or we can understand it as: path is the current recursion path, and can be used to see if current path has loop.

def isValidDAG(graph):

    # initialize a set to store the visited nodes and a set to store the nodes in the current path
    visited = set()
    path = set()
    def hasCycle(graph, source):
        # if the node is already in path, then there is a cycle
        if source in path:
            return True
        # if the node is already visited, then there is no cycle
        if source in visited:
            return False
        
        # add the node to the path
        path.add(source)
        # add the node to visited
        visited.add(source)

        # explore all neighbors of the node
        for neighbor in graph[source]:
            if hasCycle(graph, neighbor):
                return True
        # remove the node from the path after recursion to restore the path to the state before the function call
        path.remove(source)
        
    # loop through all nodes in the graph in case the graph is not connected
    for i in graph.keys():
        if hasCycle(graph, i):
            return False 

    return True

# test
# False
nodes = 2
connections = [[0, 1], [1, 0]]
graph = buildGraph(nodes, connections)
print(isValidDAG(graph))

# True
nodes = 5
connections = [[0, 1], [0, 2], [1, 2], [1, 3], [2, 3], [2, 4]]
graph = buildGraph(nodes, connections)
print(isValidDAG(graph))

False
True

DFS vs Backtracking DFS aims to operate on the node itself, and backtracking operates on branch (from node to another node). See the following code differences.

    // DFS, focus on node
    def dfs(root):
        if not root:
            reutrn
        print(f"enter node {root}")
        for child in root.children:
            dfs(child)
        print(f"leave node {root}")

    // Backtracking, focus on branch
    def backtrack(root):
        if not root:
            return

        for child in root.children:
            // make a choice

            print(f"from node {root} to node {child}")
            backtrack(child)
            
            //undo a choice
            
            print(f"from node {child} to node {root}")

2.2 Single Pair All Paths Problem#

Given a directed acyclic graph (DAG) of n nodes labeled from 0 to n - 1, find all possible paths from node 0 to node n - 1 and return them in any order.

Leetcode

0797

2.2.1 BFS#

# BFS
def allPaths(graph, source, target):
    # initialize results
    paths = []
    # initialize a queue to store the paths
    queue = [[source]]
    
    # loop until the queue is empty
    while queue:
        # pop the first path in the queue
        path = queue.pop(0)
        # get the last node in the path
        node = path[-1]

        # if the node is the target, then return the path
        if node == target:
            paths.append(path)

        # explore all the neighbors of the node
        for neighbor in graph[node]:
            queue.append(path + [neighbor])
    return paths

graph = [[1,2],[3],[3],[]]
print(allPaths(graph, 0, 3))

[[0, 1, 3], [0, 2, 3]]

2.2.2 DFS#

DFS uses path to store the nodes in current recursion. We can add a terminating case, where the targeted end node is reached.

def allPaths(graph, source, target):
    # initialize a list to store all the paths
    paths =[]
    # initialize a list to store the current path
    path = []

    def dfs(graph, source, target):
        if source == target:
            # if the source is the target, add the current path to the list of paths
            paths.append(path + [source])
            return
        
        path.append(source)
        for neighbor in graph[source]:
            dfs(graph, neighbor, target)
        # remove the last node in the path after recursion to restore the path to the state before the function call
        path.pop()

    dfs(graph, source, target)

    return paths

# test
graph = [[1,2],[3],[3],[]]
print(allPaths(graph, 0, 3))

[[0, 1, 3], [0, 2, 3]]

2.3 Topological Sorting#

For a tree, level-order traversal or preorder traversal, or post-order traversal can be treated as a topological sort. inorder traversal is not.
- level-order traversal for a graph is not possible
- preorder traversal may not give topological sort order
  - for example, 0->(1, 2)， then 2->3）, the preorder traversal in a DFS is [0,1,3,2]
- postorder traversal can give the topological order
  - [1, 3, 2, 0]
  - reversed will be [0, 2, 3, 1]
Only DAG has topological sorting

Leetcode

0210

Therefore, we can simply add a list to record the postorder traversal of the graph, and reverse the list to get the topological order.

check if the graph is DAG
if it is DAG, then return a postorder traversal, and reverse the list to get the topological order
if it is not DAG, then not possible to get a topological order

2.3.1 DFS#

# The following code assumes the graph has no cycle
def topological_sorting(graph):
    # initialize a list to store the topological order
    order = []
    # initialize a set to store the visited nodes
    visited = set()

    def dfs(graph, source):
        # if the node is already visited, then return
        if source in visited:
            return
        # mark the node as visited
        visited.add(source)

        # explore all the neighbors of the node
        for neighbor in graph[source]:
            dfs(graph, neighbor)
        # postorder: add the node to the topological order after recursion to restore the order to the state before the function call
        order.append(source)

    # loop through all nodes in the graph in case the graph is not connected
    for i in graph.keys():
        dfs(graph, i)
    
    # reverse the order to get the topological order
    order.reverse()
    return order

# test
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(topological_sorting(graph))

[0, 2, 1, 3]

2.3.2 BFS#

# topological sorting with BFS
def topological_sorting(graph):
    # calculate the indegree of all nodes
    indegree = {}
    for node in graph.keys():
        indegree[node] = 0
    for node in graph.keys():
        for edge in graph[node]:
            indegree[edge] += 1
    
    # initialize a queue to store the nodes with indegree 0
    queue = []
    for node in indegree.keys():
        if indegree[node] == 0:
            queue.append(node)

    # initialize a list to store the topological order
    order = []

    # loop until the queue is empty
    while queue:
        # pop the first node in queue
        node = queue.pop(0)
        # add the node to the topological order
        order.append(node)
        # explore all the neighbors of the node
        for neighbor in graph[node]:
            indegree[neighbor] -= 1
            if indegree[neighbor] == 0:
                queue.append(neighbor)

    # if there is a cycle, then return None
    if len(order) != len(graph.keys()):
        return None
    
    return order

# test
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(topological_sorting(graph))

[0, 1, 2, 3]

3 Shortest Path Problems#

Shortest path is to find the shortest path between two vertex:

single source shortest path: find the shortest paths from a given source vertex s to each vertex t.
single destination shortest path: find the shortest paths from each source vertex s to the destination vertex t. We can reverse the direction of each edge, and formulate a single-source shortest path problem.
single pair shortest path: find the shortest path for a given pair (s, v). This problem usually requires to solve single source shortest path first.
all pairs shortest paths: find a shortest path from s to t for every pair of vertices s and t. Although we can solve this problem by running a single source algorithm once from each vertex, we usually can solve it faster using XXX.

BFS is typically used for shortest paths problems.

DFS is typically used for reacheability problems. When used for shortest path problems, DFS need go over all possible paths from given source and target, and then find the shortest one.

3.1 BFS#

def shortestPaths(graph, source):
    # initialize shortest paths
    paths = []
    # initialize a set to store the visited nodes
    visited = set()
    # initialize path
    path = [source]
    # initialize a queue to store the paths
    queue = [path]
    # loop until the queue is empty
    while queue:
        # pop the first path in the queue
        path = queue.pop(0)
        # get the last node in the path
        node = path[-1]
        # save the shortest path between the source and the node
        paths.append(path)
        # explore all the neighbors of the node
        for neighbor in graph[node]:
            # if the neighbor is not visited, add it to the visited and the path to the queue
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return paths

# test
graph = [[1,2],[3],[3],[]]
print(shortestPaths(graph, 0))

graph = [[1,2,3],[3],[3],[]]
print(shortestPaths(graph, 0))

[[0], [0, 1], [0, 2], [0, 1, 3]]
[[0], [0, 1], [0, 2], [0, 3]]

3.2 DFS#

we can also use DFS, but this might not be efficient:

use DFS to find all reachable paths from given pair
choose the path with minimum weights

4 Shortest Weighted Path#

Dijkstra algorithm is usually used for detecting the shortest path in a DAG with nonnegative weights. Dijkstra algorithm extends BFS for weighted DAG by introducing a DP table to store the distance between vertices.

initialization
- a distance table for recording the best distance found so far, and set to infinity
- base case: the distance from the source vertex s to itself is set to 0
- a priority queue to maintain the to-be-explored neighbor vertices, sorted based on their current distance to the given source s.
main loop:
- pop out the queue, and get the current vertex and its current distance
- explore its neighbors:
  - for each neighbor vertex, calculate its current distance from the source s
  - if the current distance < the best distance found so far in the distance table:
    - update the distance table
    - enqueue
return the distance table
Note: usually no need to use visited set to track which node should be in the queue, because the distance table can be used

How to reconstruct the shortest paths based on distance table?

Leetcode

# This code is generated by Copilot.
def Dijkstra(graph, weights, source):
    # initialize a set to store the visited nodes
    visited = set()
    # initialize a dictionary to store the shortest distance from the source to each node
    distance = {}
    # initialize a dictionary to store the previous node of each node
    previous = {}
    # initialize the distance of the source to itself as 0
    distance[source] = 0
    # initialize the distance of all other nodes to infinity
    for node in graph.keys():
        if node != source:
            distance[node] = float('inf')
    # initialize the previous node of the source as None
    previous[source] = None
    # initialize a queue to store the nodes
    queue = [source]
    # loop until the queue is empty
    while queue:
        # pop the first node in the queue
        node = queue.pop(0)
        visited.add(node)
        # explore all the neighbors of the node
        for neighbor in graph[node]:
            # if the neighbor is not visited, add it to the queue
            if neighbor not in visited:
                queue.append(neighbor)
            # update the distance and previous node of the neighbor
            if distance[node] + weights[(node, neighbor)] < distance[neighbor]:
                distance[neighbor] = distance[node] + weights[(node, neighbor)]
                previous[neighbor] = node
    return distance, previous

# test
graph = {0: [1, 2, 3], 1: [3], 2: [3], 3: []}
weights = {(0, 1): 1, (0, 2): 2, (0, 3): 3, (1, 3): 4, (2, 3): 5}
print(Dijkstra(graph, weights, 0))

graph = {0: [2,3], 1: [2,3], 2: [0,1,3], 3: [0,1,2]}
weights = {(0, 2): 1, (0, 3): 1, (1, 2): 1, (1, 3): 1, (2, 0): 1, (2, 1): 1, (2, 3): 1, (3, 0): 1, (3, 1): 1, (3, 2): 1}
print(Dijkstra(graph, weights, 0))

({0: 0, 1: 1, 2: 2, 3: 3}, {0: None, 1: 0, 2: 0, 3: 0})
({0: 0, 1: 2, 2: 1, 3: 1}, {0: None, 2: 0, 3: 0, 1: 2})

# This version outputs the shortest path from the source to each node

def Dijkstra(graph, weights, source):

    # initialize a dictionary to store the shortest distance from the source to each node
    distance = {}
    # initialize a dictionary to store the previous node of each node
    previous = {}
    # initialize a dictionary to store the shortest path from the source to each node
    paths = {}

    # initialize the distance of all other nodes to infinity
    distance = {node: float('inf') for node in graph.keys()}
    # initialize the distance of the source to itself as 0
    distance[source] = 0

    # initialize the previous node of the source as None
    previous[source] = None
    
    # initialize the path of the source to itself as [source]
    paths[source] = [source]
    # initialize a queue to store the nodes
    queue = [(distance[source], paths[source])]

    # loop until the queue is empty
    while queue:
        # pop the path and its last node in the queue
        min_dist, path = queue.pop(0)
        node = path[-1]

        # this is very important, because the node may have been visited before
        # if the distance of the node is not the shortest, skip it
        if min_dist > distance[node]:
            continue

        # explore all the neighbors of the node
        for neighbor in graph[node]:
            # update the distance and previous node of the neighbor
            dist_neighbor = distance[node] + weights[(node, neighbor)]
            if dist_neighbor < distance[neighbor]:
                distance[neighbor] = dist_neighbor
                previous[neighbor] = node
                paths[neighbor] = path + [neighbor]
                queue.append((dist_neighbor, path + [neighbor]))

    return distance, previous, paths

# test
graph = {0: [1, 2, 3], 1: [3], 2: [3], 3: []}
weights = {(0, 1): 1, (0, 2): 2, (0, 3): 3, (1, 3): 4, (2, 3): 5}
print(Dijkstra(graph, weights, 0))

graph = {0: [2,3], 1: [2,3], 2: [0,1,3], 3: [0,1,2]}
weights = {(0, 2): 1, (0, 3): 1, (1, 2): 1, (1, 3): 1, (2, 0): 1, (2, 1): 1, (2, 3): 1, (3, 0): 1, (3, 1): 1, (3, 2): 1}
print(Dijkstra(graph, weights, 0))

({0: 0, 1: 1, 2: 2, 3: 3}, {0: None, 1: 0, 2: 0, 3: 0}, {0: [0], 1: [0, 1], 2: [0, 2], 3: [0, 3]})
({0: 0, 1: 2, 2: 1, 3: 1}, {0: None, 2: 0, 3: 0, 1: 2}, {0: [0], 2: [0, 2], 3: [0, 3], 1: [0, 2, 1]})

5 Bipartition Graph#

Check if a graph is bipartition graph. We just need traverse the graph, and color the nodes with two colors. If any two adjacent nodes in the resulting graph have different colors, then the graph is bipartitional.

Algorithm

BFS
- maintain a queue that stores nodes at the same breadth yet not visited
- pop out the nodes one by one
- examine its neighbors
  - if the neighbor is not visited, color them with a different color than the node itself
  - if the neighbor is visited, and has the same color as its parent, then not a valid bipartition graph
DFS
- traverse all the neighbors of the node
- if the neighbor is not visited, color them with a different color than the node itself
- if the node is already visited, and has the same color as its parent, then not a valid bipartition graph

Leetcode

0886
0785

5.1 BFS#

def isValidBipartitionGraph(graph):
    # initialize a set to store the visited nodes
    visited = set()
    # initialize a dictionary to store the colors of the nodes
    colors = {}

    def bfs(graph, source):
        # if the source is visited, then the graph is bipartite
        if source in visited:
            return True

        # initialize a queue to store the nodes
        queue = [source]
        # initialize the color of the source
        colors[source] = 0
        # loop until the queue is empty
        while queue:
            # pop the first node in the queue
            node = queue.pop(0)
            # add the node to the visited set
            visited.add(node)
            # explore all the neighbors of the node
            for neighbor in graph[node]:
                # if the neighbor is not visited, add it to the queue
                if neighbor not in visited:
                    queue.append(neighbor)
                    # color the neighbor with the opposite color of the node
                    colors[neighbor] = 1 - colors[node]
                # if the neighbor is visited and has the same color as the node, then the graph is not bipartite
                elif colors[neighbor] == colors[node]:
                    return False
        return True

    # loop through all nodes in the graph in case the graph is not connected
    for i in graph.keys():
        if not bfs(graph, i):
            return False
            
    # otherwise, the graph is bipartite
    return True

# test
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(isValidBipartitionGraph(graph))

graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(isValidBipartitionGraph(graph))

True
False

5.2 DFS#

def isValidBipartitionGraph(graph):
    # initialize a set to store the visited nodes
    visited = set()
    # initialize a dictionary to store the colors of the nodes
    colors = {}

    # define a recursive function to explore the graph
    def dfs(graph, source, color):
        # if the source is visited, then the graph is bipartite
        if source in visited:
            return True
        # otherwise add the source to the visited set
        visited.add(source)
        # color the source
        colors[source] = color
        # explore all the neighbors of the source
        for v in graph[source]:
            # if the neighbor is not visited, color it with the opposite color of the source
            if v not in visited: 
                dfs(graph, v, 1 - color)
            elif colors[v] == color:
                return False
        return True

    for i in graph.keys():
        color = 0 if i not in colors else colors[i]
        if not dfs(graph, i, color):
            return False
    return True

graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(isValidBipartitionGraph(graph))

graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(isValidBipartitionGraph(graph))

True
False

6 Union Find#

see here.

Leetcode:

0323

class UnionFind():
    def __init__(self, n):
        # initialize the parent of each node to itself
        self.parent = [i for i in range(n)]
        # initialize the size of each set to 1
        self.size = [1 for i in range(n)]
        # initialize the number of disjoint sets to n
        self.num_disjoint_sets = n
    
    def find(self, node):
        # find the root of the node by traversing the parent
        # add path compression to make sure find() is O(1)
        # this recursion maintains the height of the tree to be 2 (root<-(node1, node2, ... node n)
        if self.parent[node] != node:
            # path compression: recusively set the parent of the node to the root
            self.parent[node] = self.find(self.parent[node])
        return self.parent[node]
    
    def union(self, node1 , node2):
        # find the root of each node
        root1 = self.find(node1)
        root2 = self.find(node2)
        
        # if the two nodes are already in the same set, do nothing
        if root1 == root2:
            return
        
        # otherwise, merge the two sets
        # here we dont merge the smaller set to the larger set for balancing purpose
        # because we have compression path in find(), the height of the tree is always 2
        self.parent[root1] = root2
        self.size[root2] += self.size[root1]

        # decrease the number of disjoint sets by 1
        self.num_disjoint_sets -= 1
    
    def connected(self, node1, node2):
        # check if two nodes are in the same set
        return self.find(node1) == self.find(node2)
    
    def count(self):
        # 
        return self.num_disjoint_sets

7 Minimum Spanning Tree#

Suppose we wish to connect all the computers in a new office building using the least amount of cable.
We can model this problem using an undirected, weighted graph G whose vertices represent the computers, and whose edges represent all the possible pairs (u,v) of computers, where the weight w(u,v) of edge(u,v) is equal to the amount of cable needed to connect computer u to computer v. Rather than computing a shortest-path tree from some particular vertex v, we are interested instead in finding a tree T that contains all the vertices of G and has the minimum total weight over all such trees. Algorithms for finding such a tree are the focus of this section.

A tree, such as this, that contains every vertex of a connected graph G is said to be a spanning tree, and the problem of computing a spanning tree T with smallest total weight is known as the minimum spanning tree(MST) problem.

Two algorithms are popular to solve this problem: Prim’s algorithm and Kruskal’s algorithm. These two algorithms are both greedy algorithms, as at each step of iteration, they choose the object that is the best at the moment. Such a strategy doesn’t always ganrantee that it will always find globally optimal solutions to problems. However, for a MST problem, we can prove that certain greedy strategies do yield a spanning tree with minimum weight.

Leetcode

7.1 Prim’s Algorithm#

Prim’s algorithm operates similarly as Dijkstra’s algorithm.

def Prim(G, r)
    # Initialize a distance table and parent table for all vertice in G
    for each u in G.V
        u.d = infinity
        u.p = None
    # Initialize distance for root node
    r.d = 0

    # Initialize a priority queue for all vertice based on distance
    Q = PriorityQueue(G.V)

    # 
    while Q:
        # Extract the vertex with the minimum distance in Q
        u = Q.extract_min()
        # Explore its neighbor
        for v in G.adj[u]:
            # If the neighbor has not finished and the weight of edge (u,v) is smaller than its current key
            if v in Q and G.W[(u,v)] < v.d:
                # Track its parent
                v.p = u
                # Reset distance key using smaller weight
                v.d = G.W[(u,v)]

A naive implementation could be:

import heapq
def Prim(graph, weight):
    nodes = graph.keys()
    n = len(nodes)    
    # initialize a distanace table and parent table for MST
    distance = [float('inf') for node in nodes]
    source = list(nodes)[0]
    distance[source] = 0
    parent = [None for node in nodes]
    
    # initialize a MST
    MST = []

    # initialize a set to store the visited nodes
    visited = set()

    # initialize a priority queue to store the nodes
    pq = [(distance[source], source)]

    # loop until the queue is empty
    while pq:
        print("==================================")
        print(pq)
        # pop the node with the smallest distance
        dist, node = heapq.heappop(pq)
        # if the node is visited, then skip
        if node in visited:
            continue
        # otherwise, add the node to the visited set
        visited.add(node)
        # add the node to the MST
        if parent[node] is not None:
            MST.append((parent[node], node))
        # explore all the neighbors of the node
        for neighbor in graph[node]:
            # if the neighbor is not visited and the weight of the edge is smaller than the current distance, update the distance and parent
            if neighbor not in visited and weight[(node, neighbor)] < distance[neighbor]:
                distance[neighbor] = weight[(node, neighbor)]
                parent[neighbor] = node
                heapq.heappush(pq, (distance[neighbor], neighbor))

    return MST

graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
weight = {(0, 1): 1, (0, 2): 4, (0, 3): 3, (1, 2): 2, (2, 3): 5}
print("MST is:", Prim(graph, weight))

==================================
[(0, 0)]
==================================
[(1, 1), (4, 2), (3, 3)]
==================================
[(2, 2), (4, 2), (3, 3)]
==================================
[(3, 3), (4, 2)]
==================================
[(4, 2)]
MST is: [(0, 1), (1, 2), (0, 3)]

7.2 Kruskal’s Algorithm#

Kruskal’s algorithm is based on union find algorithm.

While the Prim's algorithm builds the MST by growing a single treeuntil it spans the graph, Kruskal’s algorithm maintains a forest of clusters, repeatedly merging pairs of clusters until a single cluster spans the graph. Initially, each vertex is by itself in a singleton cluster.
The algorithm then considers each edge in turn, ordered by increasing weight. If an edge e connects two different clusters, then e is added to the set of edges of the minimum spanning tree, and the two clusters connected by e are merged into a single cluster. If, on the other hand, e connects two vertices that are already in the same cluster, then e is discarded. Once the algorithm has added enough edges to form a spanning tree, it terminates and outputs this tree as the minimum spanning tree.

def Kruskal(G):
    # Initialize a forest - more precisely, it should be a forest
    T = []

    # Initialize a union-find disjoints for the vertices in the graph
    UnionFind(G.V):

    # Put all edges in `G.E` to a min priority queue sorted by the weight `G.W[e]` for each edge `e (u, v)`
    Q = PriorityQueue((G.W[e], e))
    
    # Traverse each edge in the priority queue
    for (we, e)  in Q:
        # Unzip to get the two nodes from an edge
        u, v = e
        
        # If two nodes are not connected in the union-find disjoints
        # Then connect them and add the edge to the tree
        if not UnionFind.connected(u, v):
            UnionFind.connect(u, v)
            A.append(e)
    
    # return a MST
    return A

8 Connectivity#

Tarjan’s algorithm for:
- articulation points -> critical vertex
- bridhes in a graph -> critical edge
- strongly connected components

def buildGraph(n_nodes, connections):
    graph = {}
    for node in range(n_nodes):
        graph[node] = []
    
    for node1, node2 in connections:
        graph[node1].append(node2)
        # if build an undirected graph, add the following line
        graph[node2].append(node1)
    return graph

# Tarjan algorithm for finding critical edges in a graph
def criticalConnections(n, connections):
    # build the graph
    graph = buildGraph(n, connections)
    # initialize a list to store the critical edges
    critical_edges = []
    # initialize a list to store the discovery time of each node
    discovery_time = [-1] * n
    # initialize a list to store the lowest discovery time of each node
    lowest_discovery_time = [-1] * n
    # initialize a set to store the visited nodes
    visited = set()
    # initialize a variable to store the current time
    time = 0

    def dfs(graph, node, parent):
        nonlocal time
        # if the node is visited, then return
        if node in visited:
            return
        # mark the node as visited
        visited.add(node)
        # set the discovery time and lowest discovery time of the node
        discovery_time[node] = time
        lowest_discovery_time[node] = time
        time += 1
        # explore all the neighbors of the node
        for neighbor in graph[node]:
            # if the neighbor is the parent, then skip
            if neighbor == parent:
                continue
            # if the neighbor is not visited, explore it
            if neighbor not in visited:
                dfs(graph, neighbor, node)
            # update the lowest discovery time of the node
            lowest_discovery_time[node] = min(lowest_discovery_time[node], lowest_discovery_time[neighbor])
            # if the lowest discovery time of the neighbor is greater than the discovery time of the node, then the edge is a critical edge
            if lowest_discovery_time[neighbor] > discovery_time[node]:
                critical_edges.append([node, neighbor])

    # loop through all nodes in the graph in case the graph is not connected
    for i in graph.keys():
        dfs(graph, i, None)

    return critical_edges

# test
n = 4
connections = [[0, 1], [1, 2], [2, 0], [1, 3]]
print(criticalConnections(n, connections))

[[1, 3]]

Chapter 8. Graph

Contents

Chapter 8. Graph#

1. Build#

2. Traversal#

2.1. Valid DAG#

2.1.1. BFS#

2.1.2. Depth-first Search#

2.2 Single Pair All Paths Problem#

2.2.1 BFS#

2.2.2 DFS#

2.3 Topological Sorting#

2.3.1 DFS#

2.3.2 BFS#

3 Shortest Path Problems#

3.1 BFS#

3.2 DFS#

4 Shortest Weighted Path#

5 Bipartition Graph#

5.1 BFS#

5.2 DFS#

6 Union Find#

7 Minimum Spanning Tree#

7.1 Prim’s Algorithm#

7.2 Kruskal’s Algorithm#

8 Connectivity#