0207 Course Schedule#

Problem#

There are a total of numCourses courses you have to take, labeled from 0 to numCourses - 1. You are given an array prerequisites where prerequisites[i] = [ai, bi] indicates that you must take course bi first if you want to take course ai.

For example, the pair [0, 1], indicates that to take course 0 you have to first take course 1. Return true if you can finish all courses. Otherwise, return false.

Examples#

Example 1:

Input: numCourses = 2, prerequisites = [[1,0]]
Output: true
Explanation: There are a total of 2 courses to take. 
To take course 1 you should have finished course 0. So it is possible.

Example 2:

Input: numCourses = 2, prerequisites = [[1,0],[0,1]]
Output: false
Explanation: There are a total of 2 courses to take. 
To take course 1 you should have finished course 0, and to take course 0 you should also have finished course 1. So it is impossible.

Constraints:#

1 <= numCourses <= 2000
0 <= prerequisites.length <= 5000
prerequisites[i].length == 2
0 <= ai, bi < numCourses
All the pairs prerequisites[i] are unique.

Follow-up#

Analysis#

  • when is it not possible to finish?

    • when there are loops in the dependency graph

  • how to detect if such a dependency graph has loops or not?

    • traverse the graph using DFS or BFS

    • mark visited nodes

    • if already visited, then there is a loop in the directed graph

What is a cycle in Directed Graph?#

DAG: there is a cycle along one direction. It is not possible to do a topological sorting if there is a cycle in the DAG.

[[1,0], [0,2], [2,1]]

Not DAG:

[[0,1], [0,2], [2,1]]

BFS Solution#

Solution 1:#

The following solution simply applies this logic:

  • build a graph from given dependency

  • use BFS to detect if there is a cycle

    • initialize a queue, and a visisted set to track which node has been visited

    • given node, travere its neighbor

This method, however, is not correct as its implementation.

  • the visited store all the nodes that has been visited.

  • the case [[0,1], [0,2], [2,1]], has a cycle from undirected graph point of view, but no cycle from directed graph point of view

  • when traversing the node 0, BFS will enqueue its dependency [1,2] and mark as visisted.

  • when going to the neighbor node 1, its neighbor node 2 is enqueued. But at previous step the node 2 has already been marked as visited, which will be identified INCORRETLY as a cycle.

The main reason is the visited set in this sense didn’t store parent/children information during traversal.

# the following code is only used for detecting cycles in a undirected graph
# for directed graph, this will not work as directed cycle is not the same as undirected cycle
def canFinish(numCourses, prerequisites):

    # build graph from given number of courses and prerequisites
    # here we represent graph as a dictionary
    def buildGraph(numCourses, prerequisites):
        graph = {}
        for i in range(numCourses):
            graph[i] = []
        for x, y in prerequisites:
            graph[x].append(y)
        return graph 
    
    # check if there is a cycle in the graph
    #visited = set()
    def hasCycle(graph, source):
        
        visited = set()
        queue = [source]

        while queue:
            node = queue.pop(0)
            # if the node is already visited, then there is a cycle
            if node in visited:
                return True
            
            # mark the node as visited
            visited.add(node)

            # explore all the neighbors of the node
            for neighbor in graph[node]:
                queue.append(neighbor)

        return False
    
    graph = buildGraph(numCourses, prerequisites)
    # add a for loop to check each node in case the graph is not connected
    for i in range(numCourses):
        if hasCycle(graph, i):
            return False 

    return True    

# test
# False
numCourses = 2
prerequisites = [[1,0], [0,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 3
prerequisites = [[0,1],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 3
prerequisites = [[1,0],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 5
prerequisites = [[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 5
prerequisites = [[0,0],[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))
False
False
False
False
False

A correct implementation is to use a indegree set to record the indegree of each node. Indegree is the number of dependencies a node requires, the number of incoming connections. When do a topological sorting, we need start from a node with 0 indegree, and such nodes are the starting point of a topological sort tree.

Algorithm:

  • step 0: get the indegree for each node in the graph

  • step 1: find nodes with indegree of 0, and enqueue to a queue

  • step 2: dequeue the queue,and decrease the indegree of the connected nodes from the popped node by 1

    • 可以想象成在graph中把该节点抹去,那么该节点指向的其他节点的indegree就减1了

  • repeat step 1 and 2

  • the traversal order is the topological sort if there is no cycle detected

  • if the number of nodes that has been popped out from the queue equals to the number of courses, then there is no cycle.

    • if there is a cycle, the number of nodes that have been popped out from the queue should be less than the toal number of courses.

def canFinish(numCourses, prerequisites):
    def buildGraph(numCourses, prerequisites):
        graph = {}
        indegree = {}
        for i in range(numCourses):
            graph[i] = []
            indegree[i] = 0 
        for t, f in prerequisites:
            graph[f].append(t)
            indegree[t] += 1

        return graph, indegree

    # bfs
    def bfs(graph, indegree):
        queue = []
        # find indegree 0 and enqueue
        for i in range(numCourses):
            if indegree[i] == 0:
                queue.append(i)
        j = 0
        while queue:
            node = queue.pop(0)
            # degree -1 for all the neighbors
            # if the neighbor's degree is 0, enqueue
            for neighbor in graph[node]:
                indegree[neighbor] -= 1
                if indegree[neighbor] == 0:
                    queue.append(neighbor)
            
            # record the number of nodes that have been visited
            j += 1
        
        # if j == numCourses, then there is no cycle
        # if there is a cycle, j is smaller than numCourses because the indegree of the nodes in a cycle is not 0, and they are not visited.
        return j == numCourses

    graph, indegree = buildGraph(numCourses, prerequisites)

    return bfs(graph, indegree)

# test
# False
numCourses = 2
prerequisites = [[1,0], [0,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 3
prerequisites = [[0,1],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 3
prerequisites = [[1,0],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 5
prerequisites = [[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 5
prerequisites = [[0,0],[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))
False
True
False
True
False

DFS Solution#

Simply check if a node is visited or not. If it is visited, then there is a loop in the graph.

follow up:

if we want to know the path of the loop,

  • we can use a parent dictionary to keep track of the parent of each node. Then we can use the parent dictionary to trace back the path of the loop.

  • add a path parameter to the dfs function, and append the current node to the path. If there is a loop, then the path will contain the loop.

notes

think about when and how to use visited or traced.

  • visited: the visited records all the nodes that have been visited, and explored. If during the recursion, the node marked as visited is visited again, there is no need to further do the calculation as it’s been done before.

    • or we can understand it as: visited is for the global graph point of view to make sure the traversal is not performed twice for the same node.

  • traced: the traced record all the nodes that are currently on the recursion path, which might not be visited/finished. - or we can understand it as: traced is the current recursion path, and can be used to see if current path has loop.

import collections
def canFinish(numCourses, prerequisites):

    def buildGraph(numCourses, prerequisites):
        # define graph as a dictionary with key as course and value as list of prerequisites
        graph = collections.defaultdict(list)
        for x, y in prerequisites:
            graph[x].append(y)
        return graph
    
    # store the visited nodes
    visited = {}
    traced = {}

    def hasCycle_dfs(graph, source):
        # if source is already in traced, then there is a cycle
        if source in traced:
            return True
            
        # if source is already visited, then there is no cycle
        if source in visited:
            return 

        # add source to visited
        visited[source] = True
        
        # add source to traced
        traced[source] = True 

        # explore all the neighbors of the source
        for neighbor in graph[source]:
            # if we dont find a cycle in the neighbor, then we continue
            if hasCycle_dfs(graph, neighbor):
                return True
            # if we replace the above line with the below line, then we will get the topological sort
            # hasCycle_dfs(graph, neighbor)
        
        # remove source from traced
        traced.pop(source)

        # otherwise there is no cycle
        return False

    graph = buildGraph(numCourses, prerequisites)
    for i in range(numCourses):
        hasCycle = hasCycle_dfs(graph, i)
        if hasCycle:
            return not hasCycle
   
    return True
    
# test
# False
numCourses = 2
prerequisites = [[1,0], [0,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 3
prerequisites = [[0,1],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 3
prerequisites = [[1,0],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 5
prerequisites = [[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 5
prerequisites = [[0,0],[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))
False
True
False
True
False
# implement as a traversal approach
def canFinish(numCourses, prerequisites):
    def buildGraph(numCourses, prerequisites):
        # define graph as a dictionary with key as course and value as list of prerequisites
        graph = collections.defaultdict(list)
        for x, y in prerequisites:
            graph[x].append(y)
        return graph
    
    # traverse the graph
    def traverse(graph, source, visited, traced):
        global hasCycle
        
        # if source is already in traced, then there is a cycle
        if source in traced:
            hasCycle = True
            return 
            
        # if source is already visited, then there is no cycle
        if source in visited:
            return 

        # add source to visited
        visited[source] = True
        
        # add source to traced
        traced[source] = True 

        # explore all the neighbors of the source
        for neighbor in graph[source]:
            # if we dont find a cycle in the neighbor, then we continue
            traverse(graph, neighbor, visited, traced)

        # remove source from traced
        traced.pop(source)

        # otherwise there is no cycle
        return False

    global hasCycle
    hasCycle = False
    visited = {}
    traced = {}

    graph = buildGraph(numCourses, prerequisites)
    for i in range(numCourses):
        traverse(graph, i, visited, traced)
        if hasCycle:
            return not hasCycle
    
    return True

# test
# False
numCourses = 2
prerequisites = [[1,0], [0,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 3
prerequisites = [[0,1],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 3
prerequisites = [[1,0],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 5
prerequisites = [[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 5
prerequisites = [[0,0],[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))
False
True
False
True
False

Why the following code is correct??

  • we removed the visited set, and only use traced set to check if there is a loop in the graph.

  • without using visited set, the code will still work, because the traced set will only contain the nodes that are currently on the recursion path.

  • however, if not using the visited set, the code will be slower, because the same node will be visited multiple times, and the same calculation will be performed multiple times.

import collections
def canFinish(numCourses, prerequisites):

    def buildGraph(numCourses, prerequisites):
        # define graph as a dictionary with key as course and value as list of prerequisites
        graph = collections.defaultdict(list)
        for x, y in prerequisites:
            graph[x].append(y)
        return graph
    
    # store the visited nodes
    traced = {}

    def hasCycle_dfs(graph, source):
        if source in traced:
            return True

        # add source to traced
        traced[source] = True 

        # explore all the neighbors of the source
        for neighbor in graph[source]:
            if hasCycle_dfs(graph, neighbor):
                return True
            # if we remove conditional statement, then we will get the topological sort
            # hasCycle_dfs(graph, neighbor)
        
        # remove source from traced
        traced.pop(source)

    graph = buildGraph(numCourses, prerequisites)
    for i in range(numCourses):
        if hasCycle_dfs(graph, i):
            return False
   
    return True
    
# test
# False
numCourses = 2
prerequisites = [[1,0], [0,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 3
prerequisites = [[0,1],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 3
prerequisites = [[1,0],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 5
prerequisites = [[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 5
prerequisites = [[0,0],[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))
False
True
False
True
False

If we also want to know the loop:

  • BFS

    • it’s hard to know the loop in BFS. the un-traversed nodes left by BFS is not the loop itself, but contains the loop, for example, 1<=>2->3. The loop is [1,2], but all 3 nodes are left after BFS.

    • we might use the outdegree information of the nodes after BFS

      • start from ndoes with outdegree = 0

      • remove such nodes, and outdegree - 1 for its neighbors

      • repeat

      • the nodes left are outdegree > 0 and indegree > 0, which formulates a cycle.

  • DFS

    • loop is in the traced or we can maintain a parent/children map. Once a cycle is detected, we can keep finding its parent to get the cycle.

import collections
def canFinish(numCourses, prerequisites):

    def buildGraph(numCourses, prerequisites):
        # define graph as a dictionary with key as course and value as list of prerequisites
        graph = collections.defaultdict(list)
        for x, y in prerequisites:
            graph[x].append(y)
        return graph
    
    def hasCycle_dfs(graph, source, traced):
        if source in traced:
            # find cycle, then return the cycle
            loops.append(traced[:]+[source])

            return
        
        # add source to visited
        traced.append(source)

        # explore all the neighbors of the source
        for neighbor in graph[source]:
            hasCycle_dfs(graph, neighbor, traced)
        
        # remove from traced
        traced.pop()

    graph = buildGraph(numCourses, prerequisites)
    loops = []
    for i in range(numCourses):
        traced = []
        hasCycle_dfs(graph, i, traced)
        
    return loops
    
# test
# False
numCourses = 2
prerequisites = [[1,0], [0,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 3
prerequisites = [[0,1],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 3
prerequisites = [[1,0],[0,2],[2,1]]
print(canFinish(numCourses, prerequisites))

# True
numCourses = 5
prerequisites = [[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))

# False
numCourses = 5
prerequisites = [[0,0],[1,4],[2,4],[3,1],[3,2]]
print(canFinish(numCourses, prerequisites))
[[0, 1, 0], [1, 0, 1]]
[]
[[0, 2, 1, 0], [1, 0, 2, 1], [2, 1, 0, 2]]
[]
[[0, 0]]

Topological Sort

  • For a tree, level-order traversal or preorder traversal, or post-order traversal can be treated as a topological sort. inorder traversal is not.

    • level-order traversal for a graph is not possible

    • preorder traversal may not give topological sort order

      • for example, 0->(1, 2)->3, the preorder traversal in a DFS is [0,1,3,2]

    • postorder traversal can give the topological order

      • [3, 1, 2, 0]

      • reversed will be [0, 2, 1, 3]

  • Only DAG has topological sorting