Graphs Theory

In this lecture we are going to see graphs, how we represent graphs, and some useful algorithms for graphs.

Mathematics and Problem Solving

9. Graph Theory

David Gundry

Learning Outcomes

Apply graph theory diagrams and common graph terminology

Interpret and construct set-theoretic definitions of graphs

Recognise some common graph algorithms and their applications

Recognise problems that could be solved using maximum flow algorithms
Recognise problems that could be solved using minimum spanning tree algorithms
Recognise problems that could be solved using topological sorting algorithms
Recognise problems that could be solved using graph search algorithms

Learning Outcomes

Apply graph theory diagrams and common graph terminology
Interpret and construct set-theoretic definitions of graphs
Recognise some common graph algorithms and their applications

Introduction

Graphs

Seven Bridges of Konigsberg

Problem

Start at any point in Konigsberg, cross all bridges once and return to start

Overview

Representation in Set Theory
Defining Graphs
Graph Algorithms

Representation

When working with a graph mathematically, we want a formal representation

Allows us to express properties about our graph using the tools of Set Theory, Logic, etc.
Allows generalisation over possible graphs
Allows us to describe algorithms on graphs

Graphs are easily understood graphically. However, to formalise them we need to be able to treat any graph as a string of mathematical symbols. We need to select the right data structure to use to encode our graph. Different representations can express different sorts of graph.

Representation

Formalisation in set theory

Higher-level mathematical concepts can be formalised using lower-level concepts

Functions

When working with any new mathematical concept, it can be helpful to represent it in terms of simpler mathematical concepts. For example, we can represent functions in set theory.

A function can be understood as a set of relationships between inputs and outputs. For instance the 'add one' function, or Successor function, on natural numbers relates the pairs of numbers ${(0, 1), (1, 2), (2, 3), \dots, (n, n + 1)}$ , because by adding one to the first number, we get the second. The pair $(1, 3)$ will not be in this set. Thus the 'add one' function can be thought of as just a particular subset of $N \times N$ .

Functions

A function $A \to B$ is a set of pairs $(a, b)$ such that

$a \in A$ (domain)
$b \in B$ (codomain)

For example the successor function, $S : N \to N$ is the set of pairs

S = {(1, 2), (2, 3), (3, 4), \dots, (n, n + 1), . . .}

Definitions

A graph is a simple mathematical concept that can be applied to solving many and varied problems. A graph consists of a set of vertices (or nodes) joined by edges. Nodes and edges can be used to represent diverse phenomena. Nodes represent things, and edges represent relationships between these things. Modelling a system as a graph can make it easier to reason about. There are also many algorithms that have been devised for graphs, such as for finding the shortest path between two points on a graph.

Here is a simple graph. There are three nodes (labelled $a$ , $b$ , and $c$ ). These three nodes are joined by three edges, one of which is a loop on $a$ . These edges are represented as having direction, thus we would call this a directed graph, or digraph. It is also possible to have undirected graphs.

There is lots of terminology for graphs that identify particular things, such as:

Walks
Trails
Paths
Circuits
Pseudograph
Bouquet
...

The terminology for describing graphs varies. For this course use the terminology and notation introduced here.

Definitions

Simple Graph

Graphs are formed of vertices (or nodes) and edges.

A simple graph contains at most one undirected edge between any pair of nodes and does not allow edges to form loop (start and end at the same vertex). This allows us to represent a graph as the pair $(V, E)$ , where $V$ is a set of vertices, and $E$ is a set of edges. Each edge in $E$ is a set ${v_{i}, v_{j}}$ , where $v_{i}$ and $v_{j}$ are vertices in $V$ and $v_{i} \neq v_{j}$

An extensional definition of a simple graph using set-theoretic notation might look as follows:

({v_{1}, v_{2}, v_{3}}, {{v_{1}, v_{2}}, {v_{2}, v_{3}}, {v_{3}, v_{1}}})

Observe that it is impossible to represent loops using this representation, due to the requirement that in an edge ${v_{i}, v_{j}}$ , $v_{i} \neq v_{j}$ . Were we to relax this requirement, it would be possible to represent loops as a singleton set ${v_{i}}$ , where $v_{i}$ is the node containing the edge.

Simple Graphs

A simple graph is a pair

G = (V, E)

a set of vertices, $V$
a set of edges, $E$

V = {a, b} E = {e_{1}}

Directed and Undirected Edges

A graph is directed (called a digraph) if the relationships between nodes are directed, or one way. Digraphs are commonly used to represent flow, e.g. of water through pipes.

A small change in representation allows us to represent a directed graph. Previously we represented edges as sets ${v_{i}, v_{j}}$ . Sets are unordered thus the edge they represent cannot be directed. By choosing instead to represent an edge as a pair (i.e $(v_{i}, v_{j})$ ), we would be able to consider the edge to be directed.

({v_{1}, v_{2}, v_{3}}, {(v_{1}, v_{2}), (v_{2}, v_{3}), (v_{3}, v_{1})})

Edges

Vertices joined by an edge are immediately connected

Loops

An edge that starts and ends at the same vertex is a loop

Degree

The degree of a node is the number of edges incident to it (number of edge connections). In directed graphs this can be further divided into in-degree and out-degree. In-degree is the number of inward-directed edges; out-degree is the number of outward-directed edges.

In a directed graph, a source is a node with in-degree $= 0$ ; a sink is a node with out-degree = 0; and transfer nodes are those with in-degree $\neq 0$ and out-degree $\neq 0$ .

Min Degree

The min degree of a graph $δ (G)$ is the smallest degree of any node

Max Degree

The max degree of a graph $Δ (G)$ equals the largest degree of any node

Degree of a Vertex

Vertex $b$

degree $= 3$
in-degree $= 1$
out-degree $= 2$

Min degree $δ (G) = 1$

Max degree $Δ (G) = 3$

Paths

A path is a sequence of immediately connected nodes (you can also represent paths as a series of edges where each pair of edges shares a common node). The length of a path is the number of edges it contains. If the path follows directed edges, it is called a directed path.

There are special types of path. For instance, a cycle is a path that starts and ends with the same node. If otherwise there are no repeated vertices, it is called a simple cycle. Graphs containing one or more cycles are called cyclic. If a graph does not contain any cycles, it is called acyclic

The length of a path (or cycle) is the number of edges

A path (or cycle) is simple if it has no repeated vertices

Except for first and last in the case of a cycle

A simple cycle:

a, b, c, a

A cycle that is not simple:

a, b, c, a, b, c, a

A sequence of vertices connected by directed edges is a directed path

A directed path:

a b c

A path that is not directed

c b a

Paths

A path is a sequence

N_{a}, \dots, N_{z}

Where $N_{i}$ is immediately connected to $N_{i + 1}$ , for all $a \leq i < z$

If a path has no nodes $N_{i} = N_{j}$ where $i \neq j$ it is called simple

Cycle

A cycle is a path

$N_{a}, N_{b}, \dots, N_{z}$

where $N_{a} = N_{z}$

Simple Cycle

A cycle is simple if removing the last vertex results in a simple path

Directed Paths

A directed path is a path

N_{a}, \dots, N_{z}

Where every pair $(N_{i}, N_{i + 1})$ is joined by a directed edge

Connectedness

Two nodes are immediately connected if there is an edge joining them. Two nodes are connected if there is a path joining them. If this path is directed, the two nodes are strongly connected. If every node in a graph is connected to every other node, the graph is said to be connected.

Connectedness

Nodes

$N_{a}, N_{z}$ are (strongly) connected if there is a (directed) path $N_{a}, \dots, N_{z}$

Graphs

$G$ is (strongly) connected if every pair of nodes in $G$ are (strongly) connected

Component

A component is a maximal set of connected nodes (i.e. where every node in the set is connected to every other node). A strongly connected component is a maximal set of strongly connected nodes. In the above graph, $a$ and $b$ together form a strongly connected component. $c$ by itself forms a second component. The graph formed by replacing each component with a single node (but retaining connections between components) is called a condensation graph. This is most useful for representing the relationships between strongly connected components.

Components

(strongly connected) Component

Maximal set of (strongly) connected nodes

Condensation Graph

Graph formed by replacing each component with a single node

Multigraph

A limitation that both of these representations have is that they cannot represent parallel edges. Graphs that contain two edges both joining the same two nodes (and in directed graphs, where both go in the same direction) are called multigraphs.

Take a moment to consider what you know about set theory and think why we cannot represent such parallel edges in the above representations.

You may have noticed that above we are representing edges as a set. Sets do not contain repetition, thus were we to attempt to represent a set containing parallel edges, e.g. ${(v_{1}, v_{2}), (v_{1}, v_{2})}$ , this would be equivalent to the set with only a single edge ${(v_{1}, v_{2})}$ . The conventional solution to this is to change our graph representation as follows.

Let a directed multi-graph be defined as the tuple $(V, E, Φ)$ , where $V$ is a set of vertices ${v_{1}, v_{2}, \dots, v_{n}}$ , $E$ is a set of edges ${e_{1}, e_{2}, \dots, e_{n}}$ , and $Φ$ is a function of type $E \to V \times V$ (meaning that it maps edges to pairs of vertices, or it takes an edge and returns a pair of vertices). An extensional definition of the same graph as above is as follows:

({v_{1}, v_{2}, v_{3}}, {e_{1}, e_{2}, e_{3}}, {(e_{1}, (v_{1}, v_{2}), (e 2, (v_{2}, v_{3})), (e_{3}, (v_{3}, v_{1}))})

Multigraph

Let, $ϕ$ , be a function mapping edges to pairs of vertices

ϕ : E \to V \times V

A multigraph $G$ is the triple

G = (V, E, ϕ)

Graph

Task

Draw and describe graph $G = (V, E, ϕ)$ , where

V = {a, b, c, d} E = {e_{1}, e_{2}, e_{3}, e_{4}}

ϕ = {(e_{1}, (a, b)), (e_{2}, (b, a), (e_{3}, (a, c)), (e_{4}, (c, c)}

Adjacency Matrix

A graph can be represented as a matrix of edges between nodes

	A	B	C	D
A	1	0	1	0
B	1	0	0	1
C	1	0	0	0
D	0	1	1	0

Hint: Click to change

Graph Algorithms

Combinatorial Algorithms for Graphs

A few examples

Minimum spanning tree
Topological sorting
Breadth-First Search
Edmonds-Karp for MaxFlow

Minimum Spanning Tree

A spanning tree is a subset of a graph's edges that connect every vertex without including any cycles.

Imagine you were planning how to lay fibre-optic cables to connect houses in a neighbourhood. Utilities (pipes/cables, etc.) are laid under roads to avoid having to dig up people's houses and gardens. However, the road network is likely to contain many cycles. There is no need to lay cable under every road as this would be wasteful. Instead, you only need to lay cables under a subset of the roads to connect every house. A possible solution to this problem will be a spanning tree.

There are often many spanning trees for a given cyclic graph. A minimum spanning tree is a spanning tree which minimises the cost of edges in the tree.

In our cable-laying example, note that some roads are longer than others. When deciding how to lay the cable you would want to pick the shortest (lowest cost) roads that still allow you to connect every house. Here each edge in our graph can be assigned a weight. We can then use Prim's algorithm to find a minimum spanning tree for this graph (note there may be multiple minimum spanning trees).

Minimum Spanning Tree (Prim's Algorithm)

Assign all vertices an infinite cost; set start to $0$

While vertices not in

M

Add vertex $u$ with lowest cost to MST
Reduce cost of each adjacent vertex $v$ , to weight of ${u, v}$ edge, if smaller

Cost: 0

Topological Sorting

Graphs can be used to express dependencies between items. For example, you might breaking down a task you need to perform into subtasks, each of which have dependencies. If you were baking a cake you might need to:

Gather ingredients
Preheat oven
Mix ingredients
Put cake into oven

There are dependencies between these. 3 has to follow 1. 4 has to follow 3. 2 Can be performed any time before 4. There are often multiple things you can do at once. You could gather ingredients while a friend pre-heats the oven, for instance. However, if there is only one of you, you will need to decide on an order for these tasks to happen in. This is a topological sorting.

We can think of baking a cake as a graph. Each node is one of the tasks. A (directed) edge $(a, b)$ is a dependency: task $a$ must proceed task $b$ . When we decide on an order for these tasks, we need some ordering of the nodes of this graph such that none of the dependencies are violated. To do this we need an algorithm for finding valid topological orderings of a direct graph.

Other applications include managing software dependencies. If module A is required to build module B, and module B is required for to build module C, etc., what order do we need to include/install/compile modules? For example, if you have ever tried to compile Linux from scratch, you will have encountered this sort of chain of dependencies.

Topological sorting (Kahn's algorithm)

Create list of source nodes $S$
Create empty sorted list $L$

While

S

not empty

Move any node $n$ from $S$ to $L$
Remove all edges $(n, m)$
Add $m$ to $S$ if $m$ is now a source

Breadth First Search

We often want to find a solution, or a path to a solution, in a graph.

Imagine you're creating a computer game. The enemies in this game need to be able to walk towards you. If there are walls in the way, you want them to intelligently know to walk around the walls. If they are in a maze, we want them to be able to intelligently path-find through the maze. They can do this by using graph-search algorithms such as Breadth-First Search (for pathfinding in a game you would usually use the more efficient, though related, algorithm A*)

Say you are writing a program to play chess. On its turn it needs to decide which move to make. After it decides this move, it will need to decide another, and another, until eventually the end of the game is reached. At each decision, possible paths branch out to form a tree. To find the best decision to take at any given point, you would make use of a search algorithm to search for a path to a game state in which it wins the game. For competitive games, we often use the Minimax search algorithm.

Breadth-First Search Pseudocode

1procedure BFS(G: Graph, v: Vertex of G) is
2	create a queue Q
3	enqueue v onto Q
4	mark v
5	while Q is not empty do
6		w Q.dequeue() ←
7		if w is what we are looking for then
8			return w
9		for all edges e in G.adjacentEdges(w) do
10			x G.adjacentVertex(w, e) ←
11			if x is not marked then
12				mark x
13				enqueue x onto Q
14	return null

Breadth-First Search

Let $Q$ be a queue
Append $s_{0}$ to $Q$

While

Q

not empty

Pop top of $Q$ as $s_{n}$
Check if $s_{n}$ is goal
Add undiscovered children of $s_{n}$ to $Q$

$Q$
$s_{0}$

Flow Network

A Maximum Flow problem is a problem that can be represented as finding the maximum possible flow through a network defined by edges that each have a defined maximum capacity. For example, imagine that the edges were pipes in a household plumbing system and each pipe could carry a certain volume of water. We could calculate the maximum flow of water from one end of the network (where it is attached to the mains) to the other end (where there is an open tap).

More generally, a flow network or transportation network is a graph where each edge has a capacity. The network has one or more sources and one or more sinks. These are places where flow enters and leaves the network. The flow in to a given node must match the flow out of that node, with the exception of source nodes (where the flow in is 0, and the flow out may be positive), and sink nodes (where the flow out is 0 and the flow in may be positive).

We can convert a flow network with multiple sources and sinks to a flow network with a single source and a single sink by creating an addition source node and an additional sink node and linking existing source/sink nodes to these with an edge with unlimited capacity.

Flow network / Transportation network

Each edge in a flow network has a non-negative capacity

Definition

A flow network is $(G, c, s, t)$

Directed Graph $G = (V, E)$

Capacity function $c : V \times V \to N^{0}$

Flow cannot exceed capacity
Per node: Flow in = flow out

Source ( $s$ ) / sink ( $t$ ) nodes permit flow into/out of the network

Maximum Flow Problem

We may want to find the maximum flow entering and leaving each node / over each edge that satisfies the constraints given for a flow network above (the flow into and out of each node must be equal). To do this, we need to use a MaxFlow algorithm. The algorithm described here is the Edmonds-Karp algorithm for finding the Maximum Flow.

We can represent various kinds of problems as maximum flow problems. For example, timetabling, airline scheduling of flight crews, image segmenting, and more.

Maximum Flow Problem

A MaxFlow problem is finding a maximal flow through a flow network

Given flow network $(G, c, s, t)$ , find a maximal flow from $s$ to $t$

Question

What is MaxFlow of this graph?

Answer: 12

Maximum Flow Problem

3 houses with different numbers of rooms; 5 students looking for a house; each student only likes some houses; find an allocation of students to houses.

Approach

Represent as flow network

Find a max flow through network

Edmonds-Karp Algorithm

Pseudocode from Wikipedia:

1algorithm EdmondsKarp is
2    input:
3        graph   (graph[v] should be the list of edges coming out of vertex v in the
4                 original graph and their corresponding constructed reverse edges
5                 which are used for push-back flow.
6                 Each edge should have a capacity 'cap', flow, source 's' and sink 't' 
7                 as parameters, as well as a pointer to the reverse edge 'rev'.)
8        s       (Source vertex)
9        t       (Sink vertex)
10    output:
11        flow    (Value of maximum flow)
12    
13    flow := 0   (Initialize flow to zero)
14    repeat
15        (Run a breadth-first search (bfs) to find the shortest s-t path.
16         We use 'pred' to store the edge taken to get to each vertex,
17         so we can recover the path afterwards)
18        q := queue()
19        q.push(s)
20        pred := array(graph.length)
21        while not empty(q) and pred[t] = null
22            cur := q.pop()
23            for Edge e in graph[cur] do
24                if pred[e.t] = null and e.t ≠ s and e.cap > e.flow then
25                    pred[e.t] := e
26                    q.push(e.t)
27
28        if not (pred[t] = null) then
29            (We found an augmenting path.
30             See how much flow we can send) 
31            df := ∞
32            for (e := pred[t]; e ≠ null; e := pred[e.s]) do
33                df := min(df, e.cap - e.flow)
34            (And update edges by that amount)
35            for (e := pred[t]; e ≠ null; e := pred[e.s]) do
36                e.flow  := e.flow + df
37                e.rev.flow := e.rev.flow - df
38            flow := flow + df
39
40    until pred[t] = null  (i.e., until no augmenting path was found)
41    return flow

MaxFlow (Edmonds-Karp algorithm)

Keep track of current and total flow per edge
Keep track of total flow

While true

Use BFS to find the shortest path from source to sink with available capacity (or break)
Set $d$ = min unused capacity on path
Increase current flow per path edge by $d$
Increment total flow by $d$

Question

Using Edmonds-Karp, what is the maximum flow from $A$ to $G$ ?

Residual edges

When $n$ flow is sent along edge $(a, b)$ ; a residual edge $(b, a)$ is created with a capacity of $n$ .

Flow send along a directed edge can be returned (reduced) but only to a minimum of 0

Summary

Graphs

Graphs are vertices joined by edges

(Strongly) Connected, Directed (digraph), (In/Out)-degree,
(Simple) Path, Cycle,
Multigraph, Loops

Mathematical objects such as graphs can be represented in set theory

Graph Algorithms

Minimum Spanning Tree

Topological Sorting

Breadth-First Search

Edmonds-Karp for MaxFlow