Graphs Theory
In this lecture we are going to see graphs, how we represent graphs, and some useful algorithms for graphs.
Learning Outcomes
- Recognise problems that could be solved using maximum flow algorithms
- Recognise problems that could be solved using minimum spanning tree algorithms
- Recognise problems that could be solved using topological sorting algorithms
- Recognise problems that could be solved using graph search algorithms
Introduction
Overview
Representation
When working with a graph mathematically, we want a formal representation
- Allows us to express properties about our graph using the tools of Set Theory, Logic, etc.
- Allows generalisation over possible graphs
- Allows us to describe algorithms on graphs
Graphs are easily understood graphically. However, to formalise them we need to be able to treat any graph as a string of mathematical symbols. We need to select the right data structure to use to encode our graph. Different representations can express different sorts of graph.
Functions
When working with any new mathematical concept, it can be helpful to represent it in terms of simpler mathematical concepts. For example, we can represent functions in set theory.
A function can be understood as a set of relationships between inputs and outputs. For instance the 'add one' function, or Successor function, on natural numbers relates the pairs of numbers \( \{(0, 1), (1, 2), (2, 3), \dots, (n, n+1) \} \), because by adding one to the first number, we get the second. The pair \( (1, 3) \) will not be in this set. Thus the 'add one' function can be thought of as just a particular subset of \( \mathbb{N} \times \mathbb{N} \).
Definitions
A graph is a simple mathematical concept that can be applied to solving many and varied problems. A graph consists of a set of vertices (or nodes) joined by edges. Nodes and edges can be used to represent diverse phenomena. Nodes represent things, and edges represent relationships between these things. Modelling a system as a graph can make it easier to reason about. There are also many algorithms that have been devised for graphs, such as for finding the shortest path between two points on a graph.
Here is a simple graph. There are three nodes (labelled \( a \), \( b \), and \( c \)). These three nodes are joined by three edges, one of which is a loop on \( a \). These edges are represented as having direction, thus we would call this a directed graph, or digraph. It is also possible to have undirected graphs.
There is lots of terminology for graphs that identify particular things, such as:
- Walks
- Trails
- Paths
- Circuits
- Pseudograph
- Bouquet
- ...
The terminology for describing graphs varies. For this course use the terminology and notation introduced here.
Simple Graph
Graphs are formed of vertices (or nodes) and edges.
A simple graph contains at most one undirected edge between any pair of nodes and does not allow edges to form loop (start and end at the same vertex). This allows us to represent a graph as the pair \( (V, E) \), where \( V \) is a set of vertices, and \( E \) is a set of edges. Each edge in \( E \) is a set \( \{ v_i, v_j \} \), where \( v_i \) and \( v_j \) are vertices in \( V \) and \( v_i \not = v_j \)
An extensional definition of a simple graph using set-theoretic notation might look as follows:
Observe that it is impossible to represent loops using this representation, due to the requirement that in an edge \( \{v_i, v_j\} \), \( v_i \not = v_j \). Were we to relax this requirement, it would be possible to represent loops as a singleton set \( \{v_i\} \), where \( v_i \) is the node containing the edge.
Directed and Undirected Edges
A graph is directed (called a digraph) if the relationships between nodes are directed, or one way. Digraphs are commonly used to represent flow, e.g. of water through pipes.
A small change in representation allows us to represent a directed graph. Previously we represented edges as sets \( \{v_i, v_j\} \). Sets are unordered thus the edge they represent cannot be directed. By choosing instead to represent an edge as a pair (i.e \( (v_i, v_j) \)), we would be able to consider the edge to be directed.
Degree
The degree of a node is the number of edges incident to it (number of edge connections). In directed graphs this can be further divided into in-degree and out-degree. In-degree is the number of inward-directed edges; out-degree is the number of outward-directed edges.
In a directed graph, a source is a node with in-degree \( =0 \); a sink is a node with out-degree = 0; and transfer nodes are those with in-degree \( \not =0 \) and out-degree \( \not =0 \).
Min Degree
The min degree of a graph $\delta(G)$ is the smallest degree of any node
Max Degree
The max degree of a graph $\Delta(G)$ equals the largest degree of any node
Paths
A path is a sequence of immediately connected nodes (you can also represent paths as a series of edges where each pair of edges shares a common node). The length of a path is the number of edges it contains. If the path follows directed edges, it is called a directed path.
There are special types of path. For instance, a cycle is a path that starts and ends with the same node. If otherwise there are no repeated vertices, it is called a simple cycle. Graphs containing one or more cycles are called cyclic. If a graph does not contain any cycles, it is called acyclic
The length of a path (or cycle) is the number of edges
A path (or cycle) is simple if it has no repeated vertices
Except for first and last in the case of a cycle
A simple cycle:
a, b, c, aA cycle that is not simple:
a, b, c, a, b, c, aA sequence of vertices connected by directed edges is a directed path
A directed path:
a b cA path that is not directed
c b aConnectedness
Two nodes are immediately connected if there is an edge joining them. Two nodes are connected if there is a path joining them. If this path is directed, the two nodes are strongly connected. If every node in a graph is connected to every other node, the graph is said to be connected.
Component
A component is a maximal set of connected nodes (i.e. where every node in the set is connected to every other node). A strongly connected component is a maximal set of strongly connected nodes. In the above graph, \( a \) and $b$ together form a strongly connected component. \( c \) by itself forms a second component. The graph formed by replacing each component with a single node (but retaining connections between components) is called a condensation graph. This is most useful for representing the relationships between strongly connected components.
Multigraph
A limitation that both of these representations have is that they cannot represent parallel edges. Graphs that contain two edges both joining the same two nodes (and in directed graphs, where both go in the same direction) are called multigraphs.
Take a moment to consider what you know about set theory and think why we cannot represent such parallel edges in the above representations.
You may have noticed that above we are representing edges as a set. Sets do not contain repetition, thus were we to attempt to represent a set containing parallel edges, e.g. \( \{ (v_1, v_2), (v_1, v_2) \} \), this would be equivalent to the set with only a single edge \( \{ (v_1, v_2) \} \). The conventional solution to this is to change our graph representation as follows.
Let a directed multi-graph be defined as the tuple \( (V, E, \Phi) \), where \( V \) is a set of vertices \( \{ v_1, v_2, \dots, v_n \} \), $E$ is a set of edges \( \{e_1, e_2, \dots, e_n \} \), and \( \Phi \) is a function of type \( E \rightarrow V \times V \) (meaning that it maps edges to pairs of vertices, or it takes an edge and returns a pair of vertices). An extensional definition of the same graph as above is as follows:
Adjacency Matrix
Graph Algorithms
Minimum Spanning Tree
A spanning tree is a subset of a graph's edges that connect every vertex without including any cycles.
Imagine you were planning how to lay fibre-optic cables to connect houses in a neighbourhood. Utilities (pipes/cables, etc.) are laid under roads to avoid having to dig up people's houses and gardens. However, the road network is likely to contain many cycles. There is no need to lay cable under every road as this would be wasteful. Instead, you only need to lay cables under a subset of the roads to connect every house. A possible solution to this problem will be a spanning tree.
There are often many spanning trees for a given cyclic graph. A minimum spanning tree is a spanning tree which minimises the cost of edges in the tree.
In our cable-laying example, note that some roads are longer than others. When deciding how to lay the cable you would want to pick the shortest (lowest cost) roads that still allow you to connect every house. Here each edge in our graph can be assigned a weight. We can then use Prim's algorithm to find a minimum spanning tree for this graph (note there may be multiple minimum spanning trees).
Topological Sorting
Graphs can be used to express dependencies between items. For example, you might breaking down a task you need to perform into subtasks, each of which have dependencies. If you were baking a cake you might need to:
- Gather ingredients
- Preheat oven
- Mix ingredients
- Put cake into oven
There are dependencies between these. 3 has to follow 1. 4 has to follow 3. 2 Can be performed any time before 4. There are often multiple things you can do at once. You could gather ingredients while a friend pre-heats the oven, for instance. However, if there is only one of you, you will need to decide on an order for these tasks to happen in. This is a topological sorting.
We can think of baking a cake as a graph. Each node is one of the tasks. A (directed) edge $(a, b)$ is a dependency: task $a$ must proceed task $b$. When we decide on an order for these tasks, we need some ordering of the nodes of this graph such that none of the dependencies are violated. To do this we need an algorithm for finding valid topological orderings of a direct graph.
Other applications include managing software dependencies. If module A is required to build module B, and module B is required for to build module C, etc., what order do we need to include/install/compile modules? For example, if you have ever tried to compile Linux from scratch, you will have encountered this sort of chain of dependencies.
Breadth First Search
We often want to find a solution, or a path to a solution, in a graph.
Imagine you're creating a computer game. The enemies in this game need to be able to walk towards you. If there are walls in the way, you want them to intelligently know to walk around the walls. If they are in a maze, we want them to be able to intelligently path-find through the maze. They can do this by using graph-search algorithms such as Breadth-First Search (for pathfinding in a game you would usually use the more efficient, though related, algorithm A*)
Say you are writing a program to play chess. On its turn it needs to decide which move to make. After it decides this move, it will need to decide another, and another, until eventually the end of the game is reached. At each decision, possible paths branch out to form a tree. To find the best decision to take at any given point, you would make use of a search algorithm to search for a path to a game state in which it wins the game. For competitive games, we often use the Minimax search algorithm.
Breadth-First Search Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
procedure BFS(G: Graph, v: Vertex of G) is
create a queue Q
enqueue v onto Q
mark v
while Q is not empty do
w Q.dequeue() ←
if w is what we are looking for then
return w
for all edges e in G.adjacentEdges(w) do
x G.adjacentVertex(w, e) ←
if x is not marked then
mark x
enqueue x onto Q
return null
Flow Network
A Maximum Flow problem is a problem that can be represented as finding the maximum possible flow through a network defined by edges that each have a defined maximum capacity. For example, imagine that the edges were pipes in a household plumbing system and each pipe could carry a certain volume of water. We could calculate the maximum flow of water from one end of the network (where it is attached to the mains) to the other end (where there is an open tap).
More generally, a flow network or transportation network is a graph where each edge has a capacity. The network has one or more sources and one or more sinks. These are places where flow enters and leaves the network. The flow in to a given node must match the flow out of that node, with the exception of source nodes (where the flow in is 0, and the flow out may be positive), and sink nodes (where the flow out is 0 and the flow in may be positive).
We can convert a flow network with multiple sources and sinks to a flow network with a single source and a single sink by creating an addition source node and an additional sink node and linking existing source/sink nodes to these with an edge with unlimited capacity.
Maximum Flow Problem
We may want to find the maximum flow entering and leaving each node / over each edge that satisfies the constraints given for a flow network above (the flow into and out of each node must be equal). To do this, we need to use a MaxFlow algorithm. The algorithm described here is the Edmonds-Karp algorithm for finding the Maximum Flow.
We can represent various kinds of problems as maximum flow problems. For example, timetabling, airline scheduling of flight crews, image segmenting, and more.
Edmonds-Karp Algorithm
Pseudocode from Wikipedia:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
algorithm EdmondsKarp is
input:
graph (graph[v] should be the list of edges coming out of vertex v in the
original graph and their corresponding constructed reverse edges
which are used for push-back flow.
Each edge should have a capacity 'cap', flow, source 's' and sink 't'
as parameters, as well as a pointer to the reverse edge 'rev'.)
s (Source vertex)
t (Sink vertex)
output:
flow (Value of maximum flow)
flow := 0 (Initialize flow to zero)
repeat
(Run a breadth-first search (bfs) to find the shortest s-t path.
We use 'pred' to store the edge taken to get to each vertex,
so we can recover the path afterwards)
q := queue()
q.push(s)
pred := array(graph.length)
while not empty(q) and pred[t] = null
cur := q.pop()
for Edge e in graph[cur] do
if pred[e.t] = null and e.t ≠ s and e.cap > e.flow then
pred[e.t] := e
q.push(e.t)
if not (pred[t] = null) then
(We found an augmenting path.
See how much flow we can send)
df := ∞
for (e := pred[t]; e ≠ null; e := pred[e.s]) do
df := min(df, e.cap - e.flow)
(And update edges by that amount)
for (e := pred[t]; e ≠ null; e := pred[e.s]) do
e.flow := e.flow + df
e.rev.flow := e.rev.flow - df
flow := flow + df
until pred[t] = null (i.e., until no augmenting path was found)
return flow