EECS 311: Graphs

There's a *lot* of terminology associated with graphs, much
of it similar to that used in trees. Trees are actually a special
case of graphs, but they're the most useful special case.

You should be familiar with all of the following:

- node or vertex
- edge
- endpoints of an edge
- directly versus indirectly connected nodes
- predecessors and successors
- loop or cycle
- weighted edges in networks
- directed versus undirected edges
- arcs (directed edges)
- directed graph or digraph
- directed acyclic graph or DAG
- paths
- path lengths
- path weights
- strongly versus weakly connected nodes
- indegree and outdegree, or fan-in and fan-out
- sinks and sources
- shortest paths versus cheapest paths
- topological orderings

The adjacency list is the simplest, most flexible method. Given a graph like this

you create a list of lists, where each list starts with a node, followed by that node's successors:

((a (b 1) (d 5)) (b (a 1) (c 1)) (c (b 3) (d 1)) (d (b 2)) )

To find a node, you search the list of lists sequentially, looking at the first element in each list, until you find the desired node.
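As a sketch in Python, the same structure can be a list of lists, with a sequential search for a node's successors (the nodes and weights come from the example above):

```python
# Adjacency list as a list of lists, mirroring the Lisp-style
# structure above: each sublist is [node, (successor, weight), ...].
graph = [
    ['a', ('b', 1), ('d', 5)],
    ['b', ('a', 1), ('c', 1)],
    ['c', ('b', 3), ('d', 1)],
    ['d', ('b', 2)],
]

def successors(graph, node):
    """Sequentially search for node; return its (successor, weight) pairs."""
    for entry in graph:
        if entry[0] == node:
            return entry[1:]
    return None

print(successors(graph, 'c'))  # [('b', 3), ('d', 1)]
```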

An array of adjacency lists is almost the same, but you have an array of pointers to lists of successors, indexed by node.

This is obviously faster to get to a particular node, but only works for node identifiers that come from a reasonable enumerable set, like characters. It wouldn't work well for cities in the United States.

The adjacency matrix is also quite simple. Given N nodes, from an enumerable set,
just make an NxN matrix **Adj**. The value in **Adj[i, j]**
says whether **j** is an immediate successor of **i** or not.

- For unweighted graphs, **Adj[i, j]** can just be true or false, or 1 or 0, e.g.,
  - **Adj[i, i]** = 1, for every node **i**
  - **Adj[i, j]** = 1 if node **i** is directly connected to node **j**, 0 otherwise

- For weighted graphs, **Adj[i, j]** is a numeric value, e.g.,
  - **Adj[i, i]** = 0, for every node **i**
  - **Adj[i, j]** = *n*, for every node **i** connected by an edge with weight *n* to a node **j**
  - **Adj[i, j]** = **inf** (infinity), for every pair of nodes not directly connected

For our graph above, it would be

|   | a | b | c | d |
|---|---|---|---|---|
| a | 0 | 1 | - | 5 |
| b | 1 | 0 | 1 | - |
| c | - | 3 | 0 | 1 |
| d | - | 2 | - | 0 |
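A minimal Python version of this matrix, using `float('inf')` for missing edges as described above (the `index` dict mapping node names to rows is an implementation convenience, not part of the original):

```python
INF = float('inf')
nodes = ['a', 'b', 'c', 'd']
index = {name: i for i, name in enumerate(nodes)}

# Adj[i][j] = weight of the edge from i to j, 0 on the diagonal,
# inf where there is no direct edge.
adj = [
    [0,   1,   INF, 5],
    [1,   0,   1,   INF],
    [INF, 3,   0,   1],
    [INF, 2,   INF, 0],
]

print(adj[index['a']][index['d']])  # 5
```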

Consider prerequisites for courses, such as those for computer engineering at Northwestern.

This is a **directed acyclic graph**.

One of the things you want to know is what order courses can be taken in. You couldn't, for example, take 374 before 202, because 374 requires 360, 360 requires 222, and 222 requires 202. But you could take 374 before or after 359.

A topological ordering (or sort, or linearization)
is any sequence of the nodes (vertices) in the graph
such that, if node₂ follows node₁ on some path in the graph,
then node₂ must come after node₁ in the topological ordering.
A graph may have many valid topological orderings. (A graph with cycles has none.)

Here's the standard algorithm for a topological sort, given a graph with N nodes:

- Create a table `indegree[N]`, where `indegree[i]` is the number of nodes that point directly to node *i*.
- Create a queue of nodes.
- Enqueue every node *i* whose `indegree[i]` is 0.
- While the queue is not empty:
  - Remove the node X from the front of the queue and append X to the ordering.
  - Decrement `indegree[j]` for every node *j* that X points to.
  - If `indegree[j]` is now 0, enqueue node *j*.

What's the complexity of this algorithm?
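The steps above can be sketched in Python. The prerequisite chain 202 → 222 → 360 → 374, with 359 independent, comes from the discussion above; the exact edge set here is a simplification for illustration:

```python
from collections import deque

def topological_order(succ):
    """succ maps each node to the list of nodes it points to."""
    indegree = {n: 0 for n in succ}
    for node in succ:
        for j in succ[node]:
            indegree[j] += 1
    queue = deque(n for n in succ if indegree[n] == 0)
    order = []
    while queue:
        x = queue.popleft()          # remove from the front, add to ordering
        order.append(x)
        for j in succ[x]:
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)
    return order                     # shorter than succ if there was a cycle

prereqs = {'202': ['222'], '222': ['360'], '360': ['374'], '374': [], '359': []}
print(topological_order(prereqs))
```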

Given two nodes in a graph, a common problem is to find a path from one to the other.

Depth-first search finds a path, pretty cheaply, but may not find the shortest path. It uses a stack. The gist of the algorithm is this:

- Let Start and End be the two nodes.
- Push Start onto the stack.
- While the stack is not empty
- Pop the stack into Node.
- Get the successors, Succ, of Node.
- If End is in Succ, return success.
- Otherwise, push the successors onto the stack.

To make this algorithm useful, we need to keep track of the path we found. To do that, we actually need several stacks and the algorithm will look very much like our course scheduler algorithm, keeping track of all alternatives and what's been tried so far.

Also, if the graph has cycles or nodes that many other nodes link to, it will loop or at least be very inefficient.

This is easy to fix by keeping track of which nodes have been visited and never following the successors of any already visited node.
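A sketch of depth-first path finding with that fix, storing whole paths on the stack so the found path can be returned (one way to track the path, not the only one):

```python
def dfs_path(succ, start, end):
    """Depth-first search; returns some path from start to end, or None.
    Tracks visited nodes so cycles don't cause endless looping."""
    stack = [[start]]              # stack of paths, each ending at a node to expand
    visited = set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end:
            return path
        if node in visited:
            continue               # never re-expand an already visited node
        visited.add(node)
        for j in succ.get(node, []):
            stack.append(path + [j])
    return None

# Unweighted successors of this page's example graph.
succ = {'a': ['b', 'd'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['b']}
print(dfs_path(succ, 'a', 'c'))
```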

Breadth-first search finds the shortest path, but it uses more space to do so. It uses a queue. The gist of the algorithm is this:

- Let Start and End be the two nodes.
- Enqueue Start onto the queue.
- While the queue is not empty
- Dequeue the queue into Node.
- Get the successors, Succ, of Node.
- If End is in Succ, return success.
- Otherwise, enqueue the successors onto the queue.

Again, to make this algorithm useful and practical, we need to keep track of the path and of which nodes have been visited. You should be able to figure out why this algorithm will not loop endlessly if a path exists, even if there are cycles in the graph and we don't track visited nodes. But to avoid endless loops in the "no path" case, and for efficiency, we should track visited nodes.
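A matching sketch of breadth-first search, tracking both visited nodes and whole paths:

```python
from collections import deque

def bfs_shortest_path(succ, start, end):
    """Breadth-first search; returns a path with the fewest edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for j in succ.get(node, []):
            if j not in visited:   # never enqueue a node twice
                visited.add(j)
                queue.append(path + [j])
    return None

# Unweighted successors of this page's example graph.
succ = {'a': ['b', 'd'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['b']}
print(bfs_shortest_path(succ, 'a', 'c'))  # ['a', 'b', 'c']
```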

If we have an unweighted graph, then the shortest path is the cheapest one and we can use breadth-first search to find it.

If we have weighted edges, however, then there may be a path with many links that costs less than a path with fewer links.

Finding lowest cost paths is a core problem in many applications, from finding the cheapest or fastest airplane route to somewhere to finding the least loaded set of telephone connections between two points.

The obvious solution is to modify the depth-first traversal algorithm to:

- save each path it finds,
- never stop until all paths have been tried, and
- pick the best of the paths found.

This will try many paths. For example, for the network in Figure 9.15, it will try

- 1 - 0 - 4
- 1 - 0 - 3 - 4
- 1 - 0 - 2 - 3 - 4
- 1 - 3 - 4
- 1 - 3 - 0 - 4
- 1 - 3 - 2 - 0 - 4
- 1 - 2 - 0 - 4
- 1 - 2 - 3 - 4
- 1 - 2 - 3 - 0 - 4
- 1 - 2 - 0 - 3 - 4

and I may have missed some.

Obviously, many of these paths include the same subpaths over and over. If we kept track of those subpaths, we'd reduce our work significantly.

Dijkstra's algorithm incrementally grows the set of cheapest paths
from a starting node, **start**, breadth-first, until it reaches a
given destination node, **end**.

It uses the following data structures in addition to **Adj[i,
j]**:

- **cost[***node***]** maps a *node* to the cost of its cheapest path from **start**
- **unprocessed** is a priority queue of <*node*, *cost*> pairs, where *cost* is the cost from **start** to *node* via some path, sorted with the cheapest cost on top

The algorithm looks like this:

    unprocessed = { <start, 0> }
    cost[i] = empty, for all i
    While unprocessed is not empty
        Pop <i, cost> off unprocessed.
        If i = end, exit with <i, cost>.
        If cost[i] is unknown
            cost[i] = cost
            For each node j that i has an edge to
                push <j, cost[i] + Adj[i, j]> onto unprocessed

This is an O(N²) algorithm, in that we have to process
O(N) nodes, and for each node, it takes O(N) steps to find **i**
and O(N) steps to update the costs for each node **j**.
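Here is a sketch of the algorithm in Python, using `heapq` for **unprocessed**; it returns only the cheapest cost, matching the pseudocode above:

```python
import heapq

def dijkstra(adj, start, end):
    """Cheapest-path cost from start to end.
    adj maps each node to a {successor: weight} dict."""
    unprocessed = [(0, start)]     # priority queue of (cost, node), cheapest on top
    cost = {}                      # settled cheapest costs from start
    while unprocessed:
        c, i = heapq.heappop(unprocessed)
        if i == end:
            return c
        if i in cost:
            continue               # cost[i] already known; skip the stale entry
        cost[i] = c
        for j, w in adj.get(i, {}).items():
            if j not in cost:
                heapq.heappush(unprocessed, (c + w, j))
    return None                    # no path from start to end

# This page's example graph.
adj = {'a': {'b': 1, 'd': 5}, 'b': {'a': 1, 'c': 1},
       'c': {'b': 3, 'd': 1}, 'd': {'b': 2}}
print(dijkstra(adj, 'a', 'd'))  # 3: a -> b -> c -> d beats the direct edge (5)
```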

Try running Dijkstra's algorithm on the graph given at the start of this page.

Note that this algorithm doesn't tell you the shortest path. What
would you add to the above to keep track of the best path? Hint: you
could augment what you store in **unprocessed** and **cost[]**.

Also note that we can avoid pushing <**j**, **cost**>
onto **unprocessed** in two cases:

- **j** is not adjacent to **i**
- **cost[j]** is already known

These don't change the O() complexity. The O(log n) check for
**cost[j]** might even be slower than the O(log n) time to add
<**j**, **cost**> to **unprocessed**. This is an
empirical issue.

Dijkstra's algorithm is fine for finding the cheapest path
between a given pair of nodes. If we wanted to find the cheapest
paths between every pair of nodes, there are O(N²) pairs
of nodes, so we'd have an O(N⁴) algorithm if we simply
called Dijkstra's algorithm on every pair.

Fortunately, there's a simpler, cheaper way. If we know we're doing
all N² pairs, i.e., we're making a 2-dimensional table **Cost** of
shortest paths, then we use Floyd's algorithm.

    Cost[i, j] = Adj[i, j], for all i, j
    For each node k from 1 to N
        For each node i from 1 to N
            For each node j from 1 to N
                connect i to j via k if that path is shorter
                than the current path from i to j:
                Cost[i, j] = min(Cost[i, j], Cost[i, k] + Cost[k, j])

That's it. The inner formula says that if the path **i - k -
j** is cheaper than the previously known path (which, because of
the way we're generating them, could not have included **k**),
make that the new cost of getting from **i** to **j**.

This is an O(N³) algorithm, because of the three nested
O(N) loops. The correctness of this simple
algorithm is not immediately obvious. A good discussion
of why it works is
here.

For example, when we consider node **A**, we calculate costs
for paths through **A** for all nodes that we know can get to
**A**, say **B** and **C**. Then we do **B**, **C**,
and so on. Suppose we later find out that node **G** can get to
**A**. Won't we have missed a possible short path from **G** to
**B** or **C**? Read the proof.

Notice that this version of the algorithm doesn't actually create
the paths. Nowhere does it store what the paths are from **i** to
**k**, **k** to **j**, or **i** to **j**, just what
those paths cost. To record them, we'd need a 2D
**Paths** array, and the following line added to the innermost loop
above:

    Paths[i, j] = k, if Cost[i, k] + Cost[k, j] < Cost[i, j]

Try running Floyd's algorithm, with **Paths**, on the graph
given at the start of this page.
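A sketch of Floyd's algorithm with the **Paths** table, run on this page's matrix (nodes a–d mapped to indices 0–3):

```python
INF = float('inf')

def floyd(adj):
    """Floyd's all-pairs cheapest-path costs, plus the Paths table."""
    n = len(adj)
    cost = [row[:] for row in adj]           # copy; don't clobber adj
    paths = [[None] * n for _ in range(n)]   # best intermediate node, if any
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if cost[i][k] + cost[k][j] < cost[i][j]:
                    cost[i][j] = cost[i][k] + cost[k][j]
                    paths[i][j] = k
    return cost, paths

# The matrix from this page (a=0, b=1, c=2, d=3).
adj = [[0, 1, INF, 5],
       [1, 0, 1, INF],
       [INF, 3, 0, 1],
       [INF, 2, INF, 0]]
cost, paths = floyd(adj)
print(cost[0][3], paths[0][3])  # cheapest a -> d costs 3, via node c (index 2)
```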

For unweighted graphs, Warshall's algorithm can be used to convert
an adjacency matrix into a connectivity matrix. That is, we start
with a matrix of 1's and 0's indicating which nodes are adjacent to
which, run the algorithm, and end up with a matrix of 1's and 0's
indicating which nodes can be reached from which nodes. A 1 in
**[i, j]** means that there's some path from **i** to **j**.

Not surprisingly, the algorithm is very similar to Floyd's algorithm, with a simpler function for changing the array of connections.

    Initialize: Conn[i, j] = Adj[i, j], for all i, j
    For each node k from 1 to N
        For each node i from 1 to N
            For each node j from 1 to N
                connect i to j if i connects to k and k connects to j:
                Conn[i, j] = Conn[i, j] | ( Conn[i, k] & Conn[k, j] )

This is an O(N³) algorithm, because of the three nested O(N) loops.
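A sketch of Warshall's algorithm on an unweighted (0/1) version of this page's graph, with 1's on the diagonal:

```python
def warshall(adj):
    """Turn a 0/1 adjacency matrix into a 0/1 reachability matrix."""
    n = len(adj)
    conn = [row[:] for row in adj]   # copy; don't clobber adj
    for k in range(n):
        for i in range(n):
            for j in range(n):
                conn[i][j] = conn[i][j] | (conn[i][k] & conn[k][j])
    return conn

# Unweighted version of this page's graph (a=0, b=1, c=2, d=3).
adj = [[1, 1, 0, 1],
       [1, 1, 1, 0],
       [0, 1, 1, 1],
       [0, 1, 0, 1]]
conn = warshall(adj)
print(conn)  # every node can reach every other in this graph
```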

A graph is not a tree, but we can remove edges until we have a tree that
still connects all vertices. If the edges of the graph have costs, a
**minimum spanning tree** (MST) is such a tree with the lowest cost.
MSTs are useful because, being trees, they are much easier to work
with than graphs, yet they often capture the critical information we
need from a graph.

Prim's algorithm for constructing an MST is similar to Dijkstra's algorithm. It starts from some node and adds edges in a greedy fashion (where greedy means picking the edge that leads to the least cost increase).

    Initialize the tree with a lowest-cost edge.
    While there are nodes not connected to the tree
        Add the lowest-cost edge from a node in the tree
        to a node not in the tree.
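A Python sketch of Prim's algorithm; it starts from an arbitrary node rather than a lowest-cost edge, which yields a tree of the same cost. The undirected weights below are an assumption for illustration (each directed pair from this page's graph collapsed to its smaller weight):

```python
import heapq

def prim(adj):
    """Prim's MST for a connected undirected weighted graph.
    adj maps each node to a {neighbor: weight} dict."""
    start = next(iter(adj))
    in_tree = {start}
    mst = []
    # Candidate edges from the tree to the rest, cheapest on top.
    edges = [(w, start, v) for v, w in adj[start].items()]
    heapq.heapify(edges)
    while edges and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(edges)
        if v in in_tree:
            continue                       # both endpoints already in the tree
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in adj[v].items():
            if x not in in_tree:
                heapq.heappush(edges, (wx, v, x))
    return mst

# Assumed undirected weights: a-b 1, b-c 1, c-d 1, b-d 2, a-d 5.
adj = {'a': {'b': 1, 'd': 5}, 'b': {'a': 1, 'c': 1, 'd': 2},
       'c': {'b': 1, 'd': 1}, 'd': {'a': 5, 'b': 2, 'c': 1}}
mst = prim(adj)
print(mst, sum(w for _, _, w in mst))
```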

Kruskal's algorithm is also a greedy algorithm, but it collects the lowest cost edges that don't cause cycles, rather than growing a single tree. The algorithm can be implemented using priority queues and union/find.

    Let MST be an empty set of edges.
    Let PQ be a priority queue of all edges, with the lowest cost on top.
    Put each node in a separate set.
    While there is more than one set
        Take the top edge (u, v) from PQ.
        If find(u) != find(v),
            add the edge to MST
            union find(u) with find(v)
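A sketch of Kruskal's algorithm with a simple union/find (path halving, no union by rank); the undirected edge weights are an assumption for illustration:

```python
import heapq

def kruskal(nodes, edges):
    """Kruskal's MST using union/find. edges is a list of (weight, u, v)."""
    parent = {n: n for n in nodes}       # each node starts in its own set

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    heapq.heapify(edges)                 # priority queue, lowest cost on top
    mst = []
    while edges and len(mst) < len(nodes) - 1:
        w, u, v = heapq.heappop(edges)
        ru, rv = find(u), find(v)
        if ru != rv:                     # edge joins two different sets: no cycle
            mst.append((u, v, w))
            parent[ru] = rv              # union the two sets
    return mst

# Assumed undirected edges (weight, u, v).
edges = [(1, 'a', 'b'), (1, 'b', 'c'), (1, 'c', 'd'), (2, 'b', 'd'), (5, 'a', 'd')]
mst = kruskal(['a', 'b', 'c', 'd'], edges)
print(sum(w for _, _, w in mst))  # 3
```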

*Comments? Send mail to* c-riesbeck@northwestern.edu.