The Big Ol' Primer On Computational Complexity (and Big-O Notation)
2018-01-17

When we use algorithms - sets of repeatable steps to perform calculations or processes - they take memory and time to run. We don’t have infinite amounts of either, so it’s useful to understand your code’s ‘computational complexity’ - the resources it will consume when it runs. We show this complexity with “Big-O Notation”, as you’ll see shortly.

## Disclaimer

This is only a high-level overview, and skips many nuances, including heavily simplifying sorting and search algorithms.

## Quick Warmup

Here’s a very quick, stupid example to get us warmed up:

```python
def find(needle, haystack):
    """Find the first instance of needle in our haystack and return the index, or None if no result found"""
    index = None
    for i in range(len(haystack)):
        if needle == haystack[i]:
            index = i
            break
    return index

words = ["quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]

# Returns: 0
print(find("quick", words))

# Returns: 6
print(find("lazy", words))

# Returns: None
print(find("monkey", words))
```

Our `find()` algorithm doesn’t do anything especially useful - there are better built-in options for searching arrays in most languages - but it is an `O(n)`, or linear time, algorithm, where `n` refers to the number of elements of data it works on (in this case, the eight words in `words`).

For example, let’s say `find("quick", words)` takes, on average, `1 second` to process a list with just one word (e.g. `words = ["quick"]`). That means we should expect it to take approximately `5 seconds` to process a list of five words (e.g. `words = ["quick", "brown", "fox", "jumped", "over"]`), or `10 seconds` for a list of 10 words (e.g. `words = ["quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "while", "the"]`).

So the `time complexity` - the time our algorithm takes to run - grows linearly with the number of items it’s given. If there are twice as many elements, it’ll take twice as long. If there are half as many, then half as long.

If we refer to the number of items in our data as `n`, then our algorithm’s complexity is `O(n)`.
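One rough way to see this linearity for yourself is to count element checks instead of seconds; the `comparisons_to_find()` helper below is my own illustrative sketch, not part of the `find()` code above. With the needle placed at the very end of lists of different sizes, the number of checks tracks the list length exactly:

```python
def comparisons_to_find(needle, haystack):
    """Count how many elements get checked before the needle is found."""
    count = 0
    for word in haystack:
        count += 1
        if word == needle:
            break
    return count

# Five words vs ten words: double the items, double the work
print(comparisons_to_find("dog", ["filler"] * 4 + ["dog"]))  # 5
print(comparisons_to_find("dog", ["filler"] * 9 + ["dog"]))  # 10
```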

## Best/Worst/Average Performance

We’ve determined the algorithm runs in `O(n)` time, but that’s actually how long it’ll take on average. In fact, when we talk about computational complexity, we’re often referring to three cases:

### Best-Case Performance

Let’s go back to this code:

```python
def find(needle, haystack):
    """Find the first instance of needle in our haystack and return the index, or None if no result found"""
    index = None
    for i in range(len(haystack)):
        if needle == haystack[i]:
            index = i
            break
    return index
```

Imagine we are, for whatever reason, always searching for the second word in our list. E.g.

```python
# Returns: 1
print(find('world', ["hello", "world"]))

# Returns: 1
print(find('bob', ["alice", "bob", "charlie"]))

# Returns: 1
print(find('brown', ["quick", "brown", "fox", "jumped"]))
```

Who knows why we’re doing this - maybe our program got stuck in a loop. Nevertheless, every single time we run this code, we only ever check the first two elements in our list of words. It doesn’t matter whether that list has 2 items in it or 2000 - we’re still only checking the first and second items, which means it’ll take approximately the same amount of time to run, every time we run the code.

This is known as constant time, as the time to run our code doesn’t change with the size of the input, and it’s the best-case performance for our `find` algorithm: it’ll run consistently in the same amount of time, even if our array gets really, really big.

When denoting constant-time performance, we say it is `O(1)`.
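As a quick sketch of that best case (the counting helper here is my own illustration): if the needle is always the second word, the number of checks never changes, no matter how long the list grows.

```python
def comparisons_until_found(needle, haystack):
    """Count how many elements get checked before the needle is found."""
    count = 0
    for word in haystack:
        count += 1
        if word == needle:
            break
    return count

small = ["hello", "world"]
large = ["hello", "world"] + ["filler"] * 1998  # 2000 items

print(comparisons_until_found("world", small))  # 2
print(comparisons_until_found("world", large))  # still 2
```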

### Worst-Case Performance

Let’s refer back to the `find()` code, and imagine we are, for whatever reason, always searching for words that don’t exist in our array. E.g.

```python
words = ["quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]

# Returns: None
print(find("monkey", words))

# Returns: None
print(find("hello", words))

# Returns: None
print(find("world", words))
```

…and so on. Every time we run the algorithm, it has to check every word in our list before it can be sure the word does not exist. This isn’t good: it’s alright while our list is small (say, 20 items or so), but the algorithm takes longer and longer to run the larger our list gets.

This is known as the worst-case performance for our algorithm - the absolute longest it’ll take if the whole world is against us. Because we have to check each element one by one, it takes linear time - i.e. longer and longer for larger and larger lists.

So our worst-case performance for our linear search is `O(n)`.
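To see that worst case concretely (again with an illustrative counting helper of my own), searching for an absent word forces a scan of the entire list, so the work grows in lockstep with the list length:

```python
def comparisons_for_miss(haystack):
    """Count checks when searching for a word that's never in the list."""
    count = 0
    for word in haystack:
        count += 1
        if word == "monkey":  # never found
            break
    return count

words = ["quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]

print(comparisons_for_miss(words))       # 8 - the whole list
print(comparisons_for_miss(words * 10))  # 80 - ten times the work
```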

### Wait A Second… (Average-Case Performance)

But hang on, wasn’t our average-case performance also `O(n)`?

Ah, you caught me. In fact, let’s think about an average use case for our `find()` code:

```python
words = ["quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]

# Returns: 1
print(find("brown", words))

# Returns: 5
print(find("the", words))

# Returns: 0
print(find("quick", words))

# Returns: 4
print(find("over", words))
```

In normal operation, our `find()` algorithm will be used to search for random words in our list. Sometimes they’ll be near the start of our list, sometimes near the end. This means, on average, we’ll only have to check half the items in the list.

If our list is 10 elements long, on average we’ll only have to check 5 of them. If our list is 3000 elements long, we’ll only have to check 1500.

So, you might think this means we should denote our average complexity as `O(½n)` (or `O(n/2)`) - after all, it’ll take half as long as our worst-case performance.

In fact, when we’re denoting algorithm complexity, we’re only trying to describe how the time or memory usage of our algorithm grows as the number of items it has to process gets bigger and bigger. Although our `find()` only has to process half as many elements on average (5 from a list of 10, 1500 from a list of 3000), that average time still grows linearly with the number of items, `n`, in our list. It may grow half as fast, but it still grows in a linear fashion.

In short, we’re unconcerned with exactly how quickly or slowly its performance grows (in the best, worst, or average case scenario), just the nature in which it grows - in this case, linearly.

Thus, we still denote the average-case performance as `O(n)`.
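We can even check the "half the items on average" claim with a small sketch of my own: if the needle is equally likely to be at any position in a list of `n` words, the average number of checks works out to `(n + 1) / 2` - half the worst case, but still growing linearly with `n`.

```python
def average_comparisons(n):
    """Average checks to find a word, over every possible needle position."""
    words = [str(i) for i in range(n)]
    total = 0
    for target in words:
        count = 0
        for word in words:
            count += 1
            if word == target:
                break
        total += count
    return total / n

print(average_comparisons(10))    # 5.5  - about half of 10
print(average_comparisons(3000))  # 1500.5 - about half of 3000
```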

### `find()` Complexity Summary

In short, when we’re denoting an algorithm’s complexity, it’s useful to know its best-case, worst-case, and average performance.

In the case of our linear search `find()` algorithm, that is:

```
Best Case:    O(1)
Average Case: O(n)
Worst Case:   O(n)
```

## Other Time Complexity Examples

Well, we’ve seen an example of linear growth, but how about other complexities? Full analysis of algorithms dives deep into computer science and mathematics, but we can certainly get a general feel for some other scenarios.

### Insertion Sort

Let’s say we want to take our list of words and put them in order, for some reason. Sounds like we need a sorting algorithm! Let’s go for a good old-fashioned insertion sort:

```python
def insertion_sort(heap):
    """Take a heap of unsorted words, and sort them"""
    for i in range(1, len(heap)):
        j = i
        # Walk this element backwards until it's in the right place
        while j > 0 and heap[j-1] > heap[j]:
            k = heap[j]
            heap[j] = heap[j-1]
            heap[j-1] = k
            j = j - 1
    return heap

words = ["quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]

# Returns: ['brown', 'dog', 'fox', 'jumped', 'lazy', 'over', 'quick', 'the']
print(insertion_sort(words))
```

As with our linear search, our `n` refers to the number of elements we’re feeding into `insertion_sort` - in this case, eight words.

When we do an `insertion_sort()`, we access every element in our list at least once, as we have to put it in the correct place. This, by itself, is an `O(n)` operation - each element is accessed once. However, every time we access an element, we then compare and swap it with each already sorted element until we find where our value is meant to be “inserted”.

This means, on average, we access every element (an `O(n)` operation), and for each of those elements, we access most/all other elements (a second `O(n)` operation); that is, we perform `O(n)*O(n)` comparison operations, which brings us to an average time complexity of `O(n^2)`.

That is to say, if it takes (on average) `1 second` to process a list of 1 item, it’ll take `16 seconds` to process a list of 4 items, or `400 seconds` for 20 items, etc.

However, in the best case scenario, if all the elements in our list are already in order, `insertion_sort()` actually only has to check each element once, without having to move them around; this means it has a best case complexity of `O(n)`, linear time.
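We can see both cases with a rough sketch (the instrumented counting version below is my own illustration, using numbers rather than words for simplicity): count the comparisons on an already-sorted list versus a fully reversed one.

```python
def insertion_sort_counting(heap):
    """insertion_sort(), instrumented to count comparisons (illustrative only)."""
    heap = list(heap)  # sort a copy, leave the input alone
    comparisons = 0
    for i in range(1, len(heap)):
        j = i
        while j > 0:
            comparisons += 1
            if heap[j-1] <= heap[j]:
                break  # already in order, stop walking back
            heap[j-1], heap[j] = heap[j], heap[j-1]
            j = j - 1
    return heap, comparisons

already_sorted = list(range(100))
reversed_list = already_sorted[::-1]

# Best case: one comparison per element - linear, O(n)
print(insertion_sort_counting(already_sorted)[1])  # 99

# Worst case: compare against every already-sorted element - O(n^2)
print(insertion_sort_counting(reversed_list)[1])   # 4950
```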

### Merge Sort

As you can see, insertion sort can get very slow, very quickly (except in the magical case of a list that’s already, or nearly already, sorted); so let’s take a look at one of my favourite algorithms, merge sort:

```python
def merge_sort(heap):
    """Take a heap of unsorted words, and sort them"""
    if len(heap) <= 1:
        return heap
    # Split the heap in two and sort each half recursively
    middle = len(heap) // 2
    left = merge_sort(heap[:middle])
    right = merge_sort(heap[middle:])
    # Merge the two sorted halves back together, smallest first
    sortedHeap = []
    while len(left) > 0 and len(right) > 0:
        if left[0] <= right[0]:
            sortedHeap.append(left[0])
            del left[0]
        else:
            sortedHeap.append(right[0])
            del right[0]
    sortedHeap.extend(left)
    sortedHeap.extend(right)
    return sortedHeap

words = ["quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]

# Returns: ['brown', 'dog', 'fox', 'jumped', 'lazy', 'over', 'quick', 'the']
print(merge_sort(words))
```

Once again, as with our `find()` or `insertion_sort()`, our `n` refers to the number of elements we’re feeding into `merge_sort`.

As with `insertion_sort()` and all comparison-based sorts, you must access each element at least once (otherwise how could we check they’re in the right place?); so we already know that our algorithm is going to be `O(n...)` of some kind.

However, unlike `insertion_sort()`, our `merge_sort()` doesn’t need to compare each element against every other element; in fact, it really only compares against one or two other elements. So already we can intuitively tell that it’s likely to be faster than `insertion_sort()`, because it seems to be doing less work.

In fact, it turns out that this intuition is correct; the ‘why’ is unfortunately beyond this primer, and dives into the world of algorithmic analysis, but the short version is that `merge_sort()` does approximately `log(n)` comparisons for each element.

This gives approximately `log(n)` comparisons for each of the `n` elements (itself an `O(n)` operation), for an average complexity of `O(n log n)`.
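To make the difference tangible, here’s a rough sketch of my own that counts comparisons for both approaches on the same shuffled list (the instrumented helpers mirror the two sorts above, but are illustrative rather than part of them). For 1000 items, insertion sort needs on the order of hundreds of thousands of comparisons, while merge sort needs on the order of ten thousand.

```python
import random

def insertion_comparisons(items):
    """Count comparisons made by an insertion sort of items."""
    items = list(items)
    count = 0
    for i in range(1, len(items)):
        j = i
        while j > 0:
            count += 1
            if items[j-1] <= items[j]:
                break
            items[j-1], items[j] = items[j], items[j-1]
            j -= 1
    return count

def merge_comparisons(items):
    """Merge sort items, returning (sorted list, comparison count)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, c1 = merge_comparisons(items[:mid])
    right, c2 = merge_comparisons(items[mid:])
    merged, count = [], c1 + c2
    i = j = 0
    while i < len(left) and j < len(right):
        count += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count

random.seed(42)
data = list(range(1000))
random.shuffle(data)

print(insertion_comparisons(data))  # roughly n^2 / 4
print(merge_comparisons(data)[1])   # roughly n * log2(n)
```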

## Applying This To Your Everyday Code

While it’s unlikely you’ll be doing full-on computational analysis in your day-to-day coding, and many modern languages and libraries abstract away the common pitfalls of old, it’s still useful to have some idea of how your code may be faster or slower when doing things in certain ways, especially if you’re working on arrays of data.

For example, you may consider whether you’re performing an operation on every element of an array, and whether that operation in turn is performing other functions that’ll grow slower with larger arrays. It can be useful to consider, with some quick mental comparisons, how one copy of code may be doing more work than an alternative version.
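As a hypothetical everyday example (both helper functions below are my own sketch): checking a list for duplicates by comparing every element against every other element is `O(n^2)`, while tracking what you’ve already seen in a set - where membership checks are constant time on average - brings the same task down to roughly `O(n)`.

```python
def has_duplicates_slow(items):
    """Compare every element against every later element: O(n^2)."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_fast(items):
    """Track seen items in a set (constant-time lookups): O(n) on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

words = ["quick", "brown", "fox", "the", "lazy", "the"]
print(has_duplicates_slow(words))  # True
print(has_duplicates_fast(words))  # True
```

Both versions give the same answer; the difference only shows up as the list grows, which is exactly the kind of quick mental comparison worth making.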

## As Always, A Graph

Here’s the graph from the Time Complexity article on Wikipedia, which demonstrates how algorithms take longer the more data you feed them. If you think back to our linear time (`O(n)`) algorithm from earlier, you can see that as the number of elements in the data grows, so does the time, in a straight (linear) line.

For something like our `insertion_sort()`, which has a complexity of `O(n^2)`, you can see it grows veeery quickly as the number of elements increases; faster than the comparison `merge_sort()`, with a complexity of `O(n log n)`.