Computational Complexity
In this lecture we will return to fundamental concepts in theoretical computer science, covering algorithms and complexity.
Our goal today is to be able to calculate how long it will take for an algorithm to run, and to be able to compare which of two algorithms is more efficient.
Along the way we will learn about
- Models of computation
- Asymptotic Complexity
- Summation
Learning Outcomes
- Create time functions for simple example algorithms
- Recognise and order common complexity classes in big-O notation
- Identify the complexity class for simple example algorithms
- Recognise and correctly interpret sigma notation
- Calculate the summation over a finite range by hand
- Rearrange summations involving addition and multiplication
- Recognise common formulas to substitute in a summation
- Simplify nested summations
Introduction
In an earlier lecture we saw that there are limits on what can be computed. We looked in particular at Finite State Machines and we saw how they were only able to solve a limited set of problems. We also saw that even the Turing machine, the universal model of computation, has limits on what it can compute.
However, while it is interesting to know what can be computed at all, it is more useful to know what can be computed efficiently. Are all problems equally difficult to solve, or are some harder to solve than others? As we will see, some are indeed harder.
An algorithm is a sequence of steps to perform, often for solving a problem. For instance one might describe an algorithm for:
- Sorting a list
- Searching a graph
- Baking a cake
- Checking a proof
Multiple algorithms can be created to solve the same problem. For instance, multiple algorithms exist for sorting a list. Naturally we want to choose the best algorithm for a particular task, but if multiple algorithms solve the same problem, on what basis do we make our choice?
Overview
We will start by defining the term "algorithm". We will first see an informal definition and then formalise this in terms of the models of computation we learned about in an earlier lecture.
This done, we will turn to complexity. We will find that to calculate the complexity of an algorithm we will need to use summations, so we will turn to these next.
Having learned how to calculate and manipulate summations, we will apply this to calculating the complexity of algorithms.
Algorithms
Informally, an algorithm is a process or a sequence of steps. For example:
- Preheat oven
- Make cake
- Put cake in oven
- Wait 30 minutes
- Take cake out of oven
Algorithms are of central importance in computer science. In particular, we are interested in questions such as
- What problems can algorithms (not) solve?
- Which problems can be solved efficiently?
- Which kinds of computational models can solve which kinds of problem?
Pseudocode
In order to create or discuss an algorithm, we need some way of writing it out. Helpfully, there are hundreds of languages that have been invented to describe algorithms: programming languages. A programming language, like Java or Python, gives us a set of rules to write and interpret an algorithm. These are generally easy to program in.
Because we do not necessarily want all of the complexity of a given programming language, we generally instead make use of a simplified language, or pseudocode. There are various standardised pseudocodes that exist. These often use syntax similar to one or more existing programming languages.
For example, I might write a simple algorithm in pseudocode. Even though I had to adopt a set of conventions to write out my code, most people with knowledge of basic programming concepts will be able to understand it:
Function A(n: integer)
    sum := 0
    for i := 1 to n:
        sum := sum + i
    endfor
    return sum
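For comparison, here is a minimal sketch of the same algorithm in Python (the function name and type hints are my own conventions, not part of the lecture's pseudocode):

```python
def a(n: int) -> int:
    """Sum the integers from 1 to n inclusive, mirroring Function A above."""
    total = 0
    for i in range(1, n + 1):
        total = total + i
    return total

print(a(5))  # 15
```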
Programming languages and pseudocode, while they may be nice to use and easy to read, are hard to formally define. Thus while pseudocode is the most common way of discussing algorithms, there are also mathematically formal ways of defining an algorithm.
Formally defining an algorithm
We saw, in an earlier lecture, some examples of models of computation. Just as these can be used to formalise computation, they also let us formalise algorithms: by fixing a model of computation, we can give a precise mathematical definition of an algorithm.
For example, we may decide on a Turing machine as our model of computation. Under this definition, an algorithm simply is a Turing machine. This completely specifies how the algorithm works, unlike with pseudocode, where ambiguities are possible.
By selecting the Turing Machine as our model of computation, we can define what algorithms it is possible to compute. It also means it is possible to precisely measure the amount of work required to compute the algorithm for a given input. For example, we might count the number of steps the Turing machine makes during its computation.
While such formalisations benefit from mathematical precision, they are horrendous to program in. We know, assuming the Church-Turing thesis, that any computation that can be performed on a reasonable model of computation can be performed using a Turing Machine. In fact, it is possible to prove properties about how efficiently a Turing machine can simulate other kinds of machine. From this, general properties about the efficiency of algorithms can be proved rigorously.
Efficiency
Whenever we perform computation in the real world, we are faced with physical constraints. Computation requires resources (time, money) that we only have a limited amount of.
While a computational model like a Turing machine may theoretically be able to execute for infinite time and make use of an infinitely long tape, in the real world this is not the case. We want to make efficient use of resources. These resources could be anything consumed by the computation (e.g. electricity), but usually we are concerned with two main resources:
- Time - e.g. the number of steps a Turing machine takes to compute the algorithm on a given input
- Space (memory) - e.g. the length of tape required for this computation
Testing
We can test the efficiency of an algorithm by running it. This requires implementing it in a particular programming language and running it on a particular computer. While practically this is very helpful - because we are generally concerned with writing programs that run efficiently on given hardware - it is hard to draw general conclusions from this data.
For example, say you run an algorithm on your computer and it runs quickly. Can you be confident it will run quickly on another computer? Can you be confident it will run quickly on every computer that might ever be built?
You can think of testing as the scientific approach. We collect some measurements in particular situations and try to form general conclusions. However, as in science, we face the problem of induction: we can only ever be more or less confident our conclusions are correct, we can never be certain. Mathematics, by contrast, strives for absolute certainty derived from mathematical proof.
Assumptions and Abstractions
Generally when using an algorithm to solve a problem we have limited time and memory (space). We want to use an algorithm that requires the least time and/or space. Practically, the most important constraint tends to be time. Thus we need to be able to say how much time a particular algorithm will take. Unfortunately, it is impossible to give a precise answer to this question, as there are too many variables.
Imagine we are sorting a list of length $n$. We do not know:
- The speed of all of the operations required (variable assignment, addition, variable tests, etc.) on the hardware used
- How many elements are in the list (the size of $n$).
- What other processes might be slowing down the computer at the same time.
Thus for a principled way to compare algorithms we need to be able to abstract away from all of this uncertainty. We need to make some simplifying assumptions by which we can construct a model of the time taken by an algorithm. First, we assume that all primitive operations take a constant amount of time. We might call this amount of time $t$. Thus if an algorithm (such as given below) performed two variable assignments, an addition, and another variable assignment, we might say the time taken is $4t$.
Procedure A()
    a := 0
    b := 0
    c := a + b
    return c
Many algorithms take as an argument some input of variable size. For example, if we were to sort a list of size $n$, the size of the input is $n$. Some algorithms might be more efficient than others for given values of $n$. What principle can we use to compare between such algorithms? Generally differences in time for small values of $n$ are not of much practical importance. Thus we consider what happens when $n$ is very large using Asymptotic Analysis.
Computational Complexity
Asymptotic computational complexity applies asymptotic analysis to algorithms by considering the rate at which the time (or space) they require increases as the size of their input increases. Sometimes the amount of time required does not change much as the size of the input increases; sometimes the amount of time increases rapidly even when the size of the input increases only a little. For example, consider the following two algorithms defined on a linked list. FirstElementOf returns the first element of the list; LastElementOf returns the last element.
Function FirstElementOf(x: LinkedList)
    return x.next()
Function LastElementOf(x: LinkedList)
    while true:
        e := x.next()
        if not e.hasNext():
            return e
FirstElementOf always takes the same amount of time to complete, regardless of the size of $x$. In contrast, LastElementOf will take longer the more elements there are in $x$. In fact, the time it takes to return will grow linearly (at a constant rate per item). Consider the following algorithm to check if a (1-indexed) list does not contain any repeated elements. The time taken for this algorithm grows with the square of the size of the list. This rate of growth is called quadratic.
Function NoDuplicates(x: Array)
    for i := 1 to length(x):
        for j := 1 to length(x):
            if ((x[i] == x[j]) and (i != j)):
                return false
    return true
Time Functions
To work out the time complexity of an algorithm, we first need its time function. This is a function that tells us how much time the algorithm takes for an input of a given size. For LastElementOf, depending on how we count operations, we might give the following time function:
\( T(n) = \left( \sum_{i=1}^{n} 4 \right) + 1 \)
That is, for a list of length $n$, we perform 4 operations for each item, plus one final operation. Thus we do the summation $4 + 4 + 4 + \dots$ until we have added 4 $n$ times, which we can also write \( \sum_{i=1}^{n} 4 \). As you will soon see, we can simplify this to $4n$, then we add the 1, giving $4n + 1$.
In $4n + 1$, the fastest growing term is $4n$ (the term $1$ is constant, so does not grow with $n$). It doesn't really matter that it is $4n$ rather than $5n$: if we can increase $n$ indefinitely, the difference is insignificant. Thus the rate of growth can be approximated as $n$ for large values of $n$. This means our algorithm is in the linear complexity class. We will notate this $LastElementOf \in O(n)$.
Working out time functions can get more complicated as the structure of the algorithm gets more complicated. Look back at the algorithm NoDuplicates. Unlike LastElementOf, this algorithm has two nested for loops. If we are adding time for each iteration of a loop and we have nested loops, we will be summing over summations of time. The time function for the algorithm NoDuplicates might be:
\( T(n) = \sum_{i=1}^{n} \sum_{j=1}^{n} 4 \)
Here we count 4 units of time for each pair of values in the list. There are two nested for loops: we add 4 every time we go through the inner loop, and we add the sum of these 4s every time we go through the outer loop. As you can see, when working out a time function for an algorithm we can easily end up with complex summations to solve. Thus we need to know how to solve summations.
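To see where a time function like this comes from, here is a small Python sketch (my own instrumentation, not part of the lecture's pseudocode) that counts how many times the body of the inner loop of NoDuplicates runs; the count is exactly $n^2$, matching the nested summation:

```python
def no_duplicates_step_count(xs):
    """Run the NoDuplicates checks, counting one step per inner-loop iteration."""
    steps = 0
    n = len(xs)
    has_no_duplicates = True
    for i in range(n):
        for j in range(n):
            steps += 1
            if xs[i] == xs[j] and i != j:
                has_no_duplicates = False  # keep looping so the full cost is visible
    return has_no_duplicates, steps

for n in (1, 2, 4, 8, 16):
    _, steps = no_duplicates_step_count(list(range(n)))
    print(n, steps)  # steps == n * n
```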
Big-O notation
Big-O notation lets us refer to sets of functions or algorithms whose time (or space) complexities grow at given rates. These sets are called complexity classes. Generally we want to choose an algorithm that has the smallest time complexity. For example, if two algorithms do the same job, but one does it in $O(n \log n)$ rather than $O(n^2)$ time, we will prefer the former. That said, this model is imperfect; real-world circumstances make a big difference. The common complexity classes are, in order from slowest- to fastest-growing:
Big-O Notation | Complexity Class Name |
---|---|
$O(1)$ | constant |
$O(\log n)$ | logarithmic |
$O(n)$ | linear |
$O(n \log n)$ | log linear |
$O(n^2)$ | quadratic |
$O(n^3)$ | cubic |
$O(e^n)$ | exponential |
$O(n!)$ | factorial |
For the algorithms discussed above: $FirstElementOf \in O(1)$; $LastElementOf \in O(n)$; and $NoDuplicates \in O(n^2)$.
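To get a feel for how differently these classes grow, the following Python sketch tabulates a representative function for each class at a few input sizes (the particular values of $n$ are arbitrary, and $2^n$ stands in for the exponential class):

```python
import math

growth = [
    ("constant", lambda n: 1),
    ("logarithmic", lambda n: math.log2(n)),
    ("linear", lambda n: n),
    ("log linear", lambda n: n * math.log2(n)),
    ("quadratic", lambda n: n ** 2),
    ("cubic", lambda n: n ** 3),
    ("exponential", lambda n: 2 ** n),
    ("factorial", lambda n: math.factorial(n)),
]

for n in (5, 10, 20):
    print(f"n = {n}")
    for name, f in growth:
        print(f"  {name:12} {round(f(n)):,}")
```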
Summation
Imagine trying to add up every number between 1 and 100. This might sound like it would take a long time. Even writing out the sum would take a lot of time and a lot of space, so I'll just write the first part of it:
\( 1 + 2 + 3 + 4 + 5 + \dots \)
We can express the same thing more compactly as a summation using a type of notation called sigma notation, because it uses the capital Greek letter $\Sigma$. We interpret the following as meaning "Let $i$ be a number. For every value of $i$ between 1 and 100 (inclusive), add $i$." It means the same as the formula above (or what the formula above would mean if it were written out fully):
\( \sum_{i=1}^{100} i \)
The number below the summation (or at the bottom, if written inline such as \( \sum_{i=1}^{100} i \)) defines the lower bound for a variable, which is also specified. This variable need not be $i$, it could be any variable such as $j$, $k$, etc. The number at the top defines the upper bound -- the largest value that this variable takes. We could think of this as defining a bounded sequence of values of $i$: \( \{1, 2, 3, \dots, 99, 100 \} \). The increment between successive values of our variable is always 1.
Now that we have defined a sequence of values for $i$, we can look at the second part of the expression, to the right of the sigma. This gives us a formula into which we must substitute each value of $i$. To make this clearer, let's use the example below. Here we are summing the function $f(i)$ for values of $i$ between 1 and 100. That is, we are performing the sum $f(1) + f(2) + \dots + f(99) + f(100)$:
\( \sum_{i=1}^{100} f(i) \)
We know that the values of $i$ are given by the sequence \( \{1, 2, \dots 99, 100\} \). By substituting values of $i$ into the formula $f(i)$, we would get a sequence of values \( \{f(1), f(2), f(3), \dots, f(99), f(100)\} \). By adding all these values together we would get the answer to our summation.
We do not necessarily know what the lower and upper bounds are. Most commonly we sum from 1 to $n$, but our lower limit could be anything. Knowing what a summation is doing will help you deal with situations with uncommon limits.
Interpreting a Summation
How do we interpret the following formula?
\( \sum_{i=1}^{4} i \)
Let there be a variable $i$.
- At first, $i = 1$
- For each value of $i$:
  - Substitute the current value of $i$ into the formula
  - Add the result to a running total
  - Check whether $i$ is less than 4
    - If so, increment $i$ and repeat
    - Otherwise, stop
This formula tells us to enumerate the set \( \{1, 2, 3, 4\} \), adding up all the numbers it contains.
Doing so, we find the answer is 10, because $1 + 2 + 3 + 4 = 10$
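A summation is essentially a loop that accumulates a running total, so we can check this interpretation with one line of Python (range(1, 5) produces the values 1 to 4 inclusive):

```python
print(sum(i for i in range(1, 5)))  # 1 + 2 + 3 + 4 = 10
```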
Summation questions
Solve the following summations
Iterating over Collections
Summations can also iterate over the elements of a collection, such as a sequence or a set. If no index variable, lower limit, or upper limit is given, the summation iterates over the entire collection. For example, for a set $S$, \( \sum_{x \in S} x \) means the sum of every element of $S$.
Factor Constants out of a Summation
First observe that when the value being summed is multiplied by a constant, e.g. $1a + 2a + \dots + na$, this is equivalent to the sum as a whole being multiplied by the constant.
In other words, we can factor the constant out of each of the values in the summation: $a(1 + 2 + \dots + n)$. In sigma notation:
\( \sum_{i=1}^{n} a \, f(i) = a \sum_{i=1}^{n} f(i) \)
Break up Summation across a Sum
Associativity is the property $(a + b) + c = a + (b + c)$
Because addition is associative (and commutative), it doesn't matter in which order we perform summations. Thus if we had a summation such as $(1 + a) + (2 + b) + (3 + c)$, we could rewrite this as $(1 + 2 + 3) + (a + b + c)$.
In other words, we can break up a summation across a sum. Note that the resulting summations are over the same variable and range as the original summation:
\( \sum_{i=1}^{n} \left( f(i) + g(i) \right) = \sum_{i=1}^{n} f(i) + \sum_{i=1}^{n} g(i) \)
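Both rules are easy to check numerically. The following Python sketch (the constant and the example functions are arbitrary choices of mine) confirms that factoring out a constant and breaking a summation across a sum leave the value unchanged:

```python
n = 100
a = 7  # an arbitrary constant

# Factoring a constant out of a summation: sum of a*i equals a times the sum of i.
assert sum(a * i for i in range(1, n + 1)) == a * sum(i for i in range(1, n + 1))

# Breaking a summation across a sum: sum of (f(i) + g(i)) equals sum f(i) + sum g(i).
f = lambda i: i * i
g = lambda i: 3 * i + 1
lhs = sum(f(i) + g(i) for i in range(1, n + 1))
rhs = sum(f(i) for i in range(1, n + 1)) + sum(g(i) for i in range(1, n + 1))
assert lhs == rhs

print("both identities hold for n =", n)
```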
Products
A product is the multiplication of two or more values. For example, we might wish to work out the product of the numbers between 1 and 10: \( 1 \times 2 \times 3 \times \dots \times 10 \).
As with summation, there is a special notation to write this. It is called product notation or pi notation, as it uses the capital Greek letter $\Pi$. The format of product notation is very similar to that of sigma notation. The above product could be written in product notation as follows:
\( \prod_{i=1}^{10} i \)
Observe that the product of a constant is the same as multiplying that constant by itself some number of times, i.e. that constant raised to a power: $c \times c \times \dots \times c = c^n$. This gives us a formula with which to simplify products:
\( \prod_{i=1}^{n} c = c^n \)
Observe what happens when we take the log of a product. We get a summation!
\( \log \prod_{i=1}^{n} f(i) = \sum_{i=1}^{n} \log f(i) \)
Thus for any product, if we are able to take the log (i.e. we are working with an equation where we can take the log of both sides), we can get it into a form with a summation of logs instead of a product.
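A quick numerical check of the log-of-a-product identity in Python (the values are arbitrary, and math.isclose is used because the two floating-point results differ only by rounding error):

```python
import math

values = [2.0, 3.5, 10.0, 0.25]

log_of_product = math.log(math.prod(values))
sum_of_logs = sum(math.log(v) for v in values)

print(math.isclose(log_of_product, sum_of_logs))  # True
```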
Summation Formulas
There are formulas for finding the sums of various common series. You don't need to remember them, but do remember they exist so you can look them up if necessary.
There are rules that help us to transform some expressions in sigma notation into a polynomial expression (i.e. of the form \( a_k x^k + a_{k-1} x^{k-1} + \dots + a_1 x + a_0 \)). We can often translate complex sets of additions into summations very easily (e.g. when analysing algorithms), but there is not much we can do with a summation as it is. Thus we transform it into a form that is easier to work with.
When doing this it saves us a lot of time if we have to hand the formulas for some common summations. The basic process is to manipulate the summation to isolate these forms and then substitute them for the equivalent formula.
Sum of constants
Let's say we have a constant sequence of numbers, such as: $2, 2, 2, 2, \dots$
What is the sum of the first $n$ values of this sequence?
If we have $n$ values, the sum is $n \times 2 = 2n$.
We can express this as a summation: \( \sum_{i=1}^{n} 2 \)
We cannot solve this to a number, but we can simplify it to a form that is easier to work with: \( \sum_{i=1}^{n} c = cn \)
Sum of the first n natural numbers
We often need to sum the first $n$ natural numbers: $1, 2, 3, 4, \dots, n$.
The problem is that writing the sum out in full is arbitrarily long, whereas it is easy to write with sigma notation: \( \sum_{i=1}^{n} i \)
But how do we calculate the sum for a given $n$?
Deriving the sum of natural numbers
As a schoolchild, the mathematician Carl Friedrich Gauss famously discovered a method for finding the sum of the natural numbers. His teacher gave the class the task of adding up all the numbers between 1 and 100, presumably expecting this to keep them busy for some time. Gauss, however, soon found the answer. This was his method.
Imagine the numbers are paired, such that the smallest is paired with the largest, the second smallest is paired with the second largest, and so on. By doing this we can see that each pair always adds up to the same value:
\( 1 + 100 = 101, \quad 2 + 99 = 101, \quad 3 + 98 = 101, \quad \dots, \quad 50 + 51 = 101 \)
Now we can just think about adding up the sum of these pairs, or in other words, adding up some multiple of the constant value 101. How many 101s should we add together? As there were 100 numbers to start with, and we have put them into pairs, there must be $100/2 = 50$ pairs. Thus we add 101 together 50 times.
We know from summing a constant series that \( \sum_{i=1}^{50} 101 = 50 \times 101 = 5050 \).
We can easily see that were we adding the numbers between 1 and $n$, each pair would sum to $n + 1$ and there would be $n/2$ pairs. We can express this as a formula (which is identical to the formula for the sum of the first $n$ natural numbers):
\( \frac{n}{2}(n + 1) = \frac{n(n+1)}{2} \)
Another way of thinking about this is to find the average.
Observing that each pair adds up to 101, we can reason that the average (mean) value is $101/2$. If we multiply the average value by the number of values, we get the sum: $100 \times \frac{101}{2} = 5050$.
This is a natural way to understand the formula for the sum of the first $n$ natural numbers:
\( \sum_{i=1}^{n} i = n \cdot \frac{n+1}{2} = \frac{n(n+1)}{2} \)
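We can check Gauss's formula against a brute-force sum in Python (a sanity check rather than a proof):

```python
def sum_first_n(n: int) -> int:
    """Sum of the first n natural numbers using Gauss's formula n(n+1)/2."""
    return n * (n + 1) // 2

for n in (1, 10, 100, 1000):
    assert sum_first_n(n) == sum(range(1, n + 1))

print(sum_first_n(100))  # 5050, the answer to the schoolroom problem
```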
Sum of the first n square numbers
The square numbers are $1, 4, 9, 16, 25, \dots$
We often need to find the sum of the first $n$ squares: $1 + 4 + 9 + \dots + n^2$.
The formula for the sum of the first $n$ squares is:
\( \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} \)
Deriving the sum of squares
To derive a formula for the sum of squares, let's begin by writing out the first several sums of $n$ squares as a sequence $S$. We will want to find a formula that produces the sequence $S$.
$n$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
$n^2$ | 0 | 1 | 4 | 9 | 16 | 25 | 36 |
$S_n$ | 0 | 1 | 5 | 14 | 30 | 55 | 91 |
Let us assume $S$ can be defined by a polynomial function. A polynomial is a simple kind of formula involving one variable (e.g. $x$) to any exponents, and constants. For example $ax^2 + bx + c$ is a polynomial, as is $ax^3 + bx^2 + cx + d$. Formally, a polynomial is a function that can be defined \( \sum_{k=0}^n a_kx^k \) for some sequence of constants $a$.
Assuming $S$ is a polynomial, we can identify what form it would take depending on how it changes. $\Delta_1 S$ (or simply $\Delta S$) is the sequence giving the changes in $S$. $\Delta_2 S$ is equivalent to $\Delta\Delta S$, i.e. the sequence giving the changes in the changes in $S$.
- If $S$ were constant, the formula would be a constant: $a$
- If $Δ_1 S$ were constant, the formula would be of order 1 (linear): $ax + b$
- If $Δ_2 S$ were constant, the formula would be of order 2 (quadratic): $ax^2 + bx + c$
- If $Δ_3 S$ were constant, the formula would be of order 3 (cubic): $ax^3 + bx^2 + cx + d$
$n$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
$n^2$ | 0 | 1 | 4 | 9 | 16 | 25 | 36 |
$S_n$ | 0 | 1 | 5 | 14 | 30 | 55 | 91 |
$Δ_1 S_n$ | 1 | 4 | 9 | 16 | 25 | 36 | |
$Δ_2 S_n$ | 3 | 5 | 7 | 9 | 11 | | |
$Δ_3 S_n$ | 2 | 2 | 2 | 2 | | | |
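The difference rows in the table can be produced mechanically. The following Python sketch (my own, following the same idea) computes successive differences of $S_n$ and shows that the third differences are constant, which is what tells us $S$ is cubic:

```python
def differences(seq):
    """Return the differences between consecutive terms of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]

S = [sum(i * i for i in range(n + 1)) for n in range(7)]  # 0, 1, 5, 14, 30, 55, 91
d1 = differences(S)    # 1, 4, 9, 16, 25, 36
d2 = differences(d1)   # 3, 5, 7, 9, 11
d3 = differences(d2)   # 2, 2, 2, 2  -> constant, so S is a cubic polynomial in n
print(S, d1, d2, d3, sep="\n")
```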
As $Δ_3 S$ is constant ($= 2$), we know the formula for the sum of squares is of the form \( an^3 + bn^2 + cn + d \). To find values for $a$, $b$, $c$ and $d$, we can find a set of simultaneous equations from pairs of $n$ and $S_n$. From the first line in the table we can see that $d = 0$, so it has been excluded from subsequent lines for ease of reading.
$n$ | $S_n$ | Substitution in formula | Equation |
---|---|---|---|
0 | 0 | $0^3a + 0^2b + 0c + d$ | $d = 0$ |
1 | 1 | $1^3a + 1^2b + 1c$ | $a + b + c = 1$ |
2 | 5 | $2^3a + 2^2b + 2c$ | $8a + 4b + 2c = 5$ |
3 | 14 | $3^3a + 3^2b + 3c$ | $27a + 9b + 3c = 14$ |
We solve the equations in the table simultaneously for $a, b, c$. In the following I use numbers in brackets (e.g. [1]) at the end of a line to label formulas and then use these as a shorthand for that formula.
- $a + b + c = 1 \qquad [1]$
- $8a + 4b + 2c = 5 \qquad [2]$
- $27a + 9b + 3c = 14 \qquad [3]$
- $[3] - 3 \times [1] = 24a + 6b = 11 \qquad [4]$
- $[2] - 2 \times [1] = 6a + 2b = 3 \qquad [5]$
- $4 \times [5] - [4] = 2b = 1 \qquad [6]$
- $[4] - 3 \times [5] = 6a = 2 \qquad [7]$
- $[7]/6 = a = 1/3$
- $[6]/2 = b = 1/2$
- $1/3 + 1/2 + c = 1$
- $c = 1/6 $
We can substitute the values found for $a = 1/3$, $b = 1/2$, $c = 1/6$, and $d = 0$ into our formula. This gives us the formula for the sequence $S_n$, which I have subsequently rearranged into its familiar form:
\( S_n = \frac{1}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n = \frac{n(n+1)(2n+1)}{6} \)
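As a cross-check, we can solve the same three simultaneous equations numerically (here with numpy, an extra dependency not assumed by the lecture) and confirm that the resulting polynomial reproduces the sums of squares:

```python
import numpy as np

# a + b + c = 1,  8a + 4b + 2c = 5,  27a + 9b + 3c = 14
A = np.array([[1.0, 1.0, 1.0], [8.0, 4.0, 2.0], [27.0, 9.0, 3.0]])
y = np.array([1.0, 5.0, 14.0])
a, b, c = np.linalg.solve(A, y)
print(a, b, c)  # approximately 0.333..., 0.5, 0.166...

def S(n):
    return a * n ** 3 + b * n ** 2 + c * n

for n in range(10):
    assert round(S(n)) == sum(i * i for i in range(n + 1))
    assert round(S(n)) == n * (n + 1) * (2 * n + 1) // 6
print("formula matches the brute-force sums of squares")
```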
Calculating Time Functions
In order to demonstrate the full process of finding the time function and complexity class of a simple algorithm, this section gives three examples.
Imagine we have an algorithm that contains two for loops as shown below:
Procedure Example1(x)
    for i := 1 to length(x):
        for j := 1 to length(x):
            ...
        endfor
    endfor
We have two nested for loops, each iterating over length(x), which we will call $n$. From examining the ranges these loops iterate over, we might determine that the time function of this algorithm is, more or less:
\( T(n) = \sum_{i=1}^{n} \sum_{j=1}^{n} 4 \)
That is, we take 4 operations of some constant time (omitted here) within the inner loop, which we have to sum for each iteration through the nested loops with their respective ranges. Constants such as $4$ here matter for the summation but won't make a substantive difference to our time complexity, as we ignore all constants in the final step, so we don't need to worry too much about where the number 4 comes from. From here we can simplify the summations using the formulas above:
\( \sum_{i=1}^{n} \sum_{j=1}^{n} 4 = \sum_{i=1}^{n} 4n = 4n^2 \)
The fastest growing term in this expression is $4n^2$; removing the constant gives us $n^2$. Thus the algorithm has time complexity $O(n^2)$.
Here I have made a small change to the algorithm to make the summation more interesting. The inner loop iterates only up to $i$, which is the iterator for the outer loop. This is going to reduce the amount of time the algorithm takes in real terms, but from the perspective of asymptotic complexity, will it make a difference?
Procedure Example2(x)
    for i := 1 to length(x):
        for j := 1 to i:
            ...
        endfor
    endfor
We have two nested for loops. This time only the outer loop ranges up to $n$; the inner loop ranges up to $i$. From examining the ranges of these loops we might determine that the time function of this algorithm is, more or less:
\( T(n) = \sum_{i=1}^{n} \sum_{j=1}^{i} 4 \)
Which we can simplify:
\( \sum_{i=1}^{n} \sum_{j=1}^{i} 4 = \sum_{i=1}^{n} 4i = 4 \sum_{i=1}^{n} i = 4 \cdot \frac{n(n+1)}{2} = 2n^2 + 2n \)
As the fastest growing term is $2n^2$, the complexity of this algorithm is also $O(n^2)$.
Changing the range that one of the for loops iterated over did not affect the overall time complexity of the algorithm. In contrast, let's see what happens when we add another nested for loop.
Procedure Example3(x)
    for i := 1 to length(x):
        for j := 1 to i:
            for k := 1 to j:
                ...
            endfor
        endfor
    endfor
From this we might determine that the time function of this algorithm is:
\( T(n) = \sum_{i=1}^{n} \sum_{j=1}^{i} \sum_{k=1}^{j} 4 \)
Which we can simplify:
\( \sum_{i=1}^{n} \sum_{j=1}^{i} \sum_{k=1}^{j} 4 = \sum_{i=1}^{n} \sum_{j=1}^{i} 4j = \sum_{i=1}^{n} 4 \cdot \frac{i(i+1)}{2} = \sum_{i=1}^{n} \left( 2i^2 + 2i \right) = 2 \cdot \frac{n(n+1)(2n+1)}{6} + 2 \cdot \frac{n(n+1)}{2} = \frac{2n^3}{3} + 2n^2 + \frac{4n}{3} \)
Here the fastest growing term is \( \frac{2n^3}{3} \). \( \frac{2}{3} \) is a constant, so we can ignore it, leaving a time complexity of this algorithm of $O(n^3)$.
In these examples we have seen that Example1 and Example2 each have two nested for loops and their time functions grow at $O(n^2)$, while Example3 has three nested for loops and its time function grows at $O(n^3)$. Counting the number of nested for loops is usually a good rule of thumb for quickly determining the time complexity of an algorithm.
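Finally, the rule of thumb can be checked empirically. The following Python sketch (my own instrumentation, counting one unit per innermost iteration; the constant 4 used in the time functions above would only rescale the counts) shows the iteration counts for the three examples growing like $n^2$, roughly $n^2/2$, and roughly $n^3/6$ respectively:

```python
def example1_count(n):
    return sum(1 for i in range(1, n + 1) for j in range(1, n + 1))

def example2_count(n):
    return sum(1 for i in range(1, n + 1) for j in range(1, i + 1))

def example3_count(n):
    return sum(1 for i in range(1, n + 1)
                 for j in range(1, i + 1)
                 for k in range(1, j + 1))

for n in (10, 50, 100):
    print(n, example1_count(n), example2_count(n), example3_count(n))
    # n^2,  n(n+1)/2,  n(n+1)(n+2)/6  -> O(n^2), O(n^2), O(n^3)
```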