Sorting (from Algorithms to Live By)

This post is split out from my main summary of Algorithms to Live By by Brian Christian and Tom Griffiths. Check out that summary to learn more about the book.

There are massive diseconomies of scale with sorting: as the number of items grows, the cost of sorting grows faster than linearly, and for the simplest algorithms it grows quadratically or worse. Sorting is a huge part of what computers do, and they frequently deal with millions or billions of values. Efficient algorithms are therefore critical.

How to measure algorithms (Big-O notation)

Sorting algorithms are usually measured by their worst-case scenario: at most, how long can the sort take?

The time complexity of algorithms is denoted by the “Big-O” notation:

  • O(1) is constant time. It doesn’t matter how many items you have, the algorithm takes a constant amount of time.
  • O(n) is linear time. A list of length 2n takes twice as much time as a list of n.
  • O(n^c) is polynomial time. The time grows with the polynomial of the input size. A list of length n takes n^c time, where c is a constant.
  • O(c^n) is exponential time. Again, c is a constant. If c is 2, each additional item doubles the work. If c is 3, each additional item triples the work.
  • O(n!) is factorial time. This is the worst of the Big-O notations. The time to complete the algorithm grows at a rate of n factorial. Algorithms that work through all possible permutations or combinations of a set of items tend to run in factorial time. For example, if the travelling salesman problem has just 10 towns, that’s 10! or over 3.5 million permutations.
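
To make these growth rates concrete, here is a small sketch in plain Python (the numbers are operation counts only; real running times depend on constants that Big-O deliberately ignores):

```python
import math

# Rough operation counts for three complexity classes at a few input
# sizes. Illustrative only -- Big-O hides constant factors.
for n in (10, 100, 1000):
    linear = n
    linearithmic = round(n * math.log2(n))
    quadratic = n ** 2
    print(f"n={n:>4}: linear={linear:>6} "
          f"linearithmic={linearithmic:>6} quadratic={quadratic:>8}")
```

At n = 1,000 the quadratic count is already about a hundred times the linearithmic one, and the gap keeps widening.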

Specific algorithms

Bubble Sort (O(n^2))

Imagine you want to sort your bookcase alphabetically. With Bubble Sort, you scan along your shelf for any adjacent pair of books that are out of order and flip them. When you reach the end of the shelf, you loop back and scan again, repeating until a full pass finds nothing to swap.

Under this algorithm, each time you scan the shelf, each book can only move one position. It’s simple, but very inefficient. Sorting 5 bookshelves with this method will take 25 times (5^2) as long as sorting a single bookshelf. However, one advantage of it is that it is very robust against noise, since one random error would only move one item at once.
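
A minimal sketch of the idea in Python (the function name and details are my own, not from the book):

```python
def bubble_sort(books):
    """Sort by repeatedly sweeping the shelf and swapping adjacent
    out-of-order pairs, until a full pass makes no swaps."""
    shelf = list(books)  # work on a copy of the caller's list
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(shelf) - 1):
            if shelf[i] > shelf[i + 1]:
                shelf[i], shelf[i + 1] = shelf[i + 1], shelf[i]
                swapped = True  # a book moved; another pass is needed
    return shelf
```

Because each pass can move a book at most one position toward the front, a badly misplaced book needs many passes, which is where the quadratic cost comes from.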

Insertion Sort (O(n^2))

With Insertion Sort, you take all your books off the shelves and put them back one by one. As you place each book, you scan through the books already on the shelf until you find the right place for it. While this may be even more intuitive than Bubble Sort, it’s not actually much faster.
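
The same shelf analogy as a hedged code sketch (naming is my own):

```python
def insertion_sort(books):
    """Place each book one at a time, scanning the shelf for its spot."""
    shelf = []
    for book in books:
        i = 0
        while i < len(shelf) and shelf[i] < book:
            i += 1  # scan past every book that belongs before this one
        shelf.insert(i, book)  # O(n) scan + insert -> O(n^2) overall
    return shelf
```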

Comparison Counting Sort (O(n^2))

This algorithm compares each item to every other item and works out how many items it is bigger than. That number is then used as the item’s rank. It is like a Round Robin sports tournament (see further below), and is the most robust algorithm to noise.
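
A sketch of the idea, assuming all items are distinct (ties would collide on the same rank):

```python
def comparison_counting_sort(items):
    """Rank each item by counting how many other items it beats;
    that win count is its final position. Assumes distinct items."""
    result = [None] * len(items)
    for item in items:
        rank = sum(other < item for other in items)  # head-to-head wins
        result[rank] = item
    return result
```

Every item plays every other item, so the comparison count is always quadratic, whatever the input order.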

Merge Sort (O(n log n))

With Merge Sort, you break up a big stack into two smaller stacks and sort the smaller stacks. Then you merge the two (sorted) smaller stacks back into a big stack.

This divide-and-conquer approach turns out to be much faster, and the process of merging two sorted stacks is simple and takes just linear time. Merge Sort ends up taking “linearithmic time”, somewhere in between linear and quadratic time. It’s an enormous improvement on quadratic sorting, though — it can be the difference between making 29 passes through your dataset and 300 million. However, its efficiency makes it more vulnerable to statistical noise, as a fluke result can end up moving an item a very long way.
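
A recursive sketch of Merge Sort; the linear-time merge step is the part the book highlights:

```python
def merge_sort(stack):
    """Split the stack in half, sort each half, then merge."""
    if len(stack) <= 1:
        return list(stack)
    mid = len(stack) // 2
    left = merge_sort(stack[:mid])
    right = merge_sort(stack[mid:])
    # Merging two already-sorted stacks takes only linear time:
    # repeatedly take the smaller of the two front items.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```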

Bucket Sort (O(n))

In Bucket Sort, items are grouped into a number of ordered categories or “buckets”, but the items within each bucket are left unsorted. This is fine if you don’t need a fully ordered set, and it means you don’t need to compare items to each other. You can also combine Bucket Sort with other algorithms, e.g. Bucket Sort until your buckets are small enough to use Insertion Sort or Merge Sort. Ideally, your buckets should each contain around the same number of items.

Bucket Sort is faster than linearithmic time. If you want to group n items into m buckets, it will take O(nm) time. But as long as the number of buckets is relatively small compared to the number of items, you can round that to just O(n), linear time.
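
A minimal sketch with hand-picked bucket boundaries (the boundaries and the function shape are my own illustration):

```python
def bucket_sort(items, boundaries):
    """Drop each item into the first bucket whose upper boundary it is
    below; buckets are ordered, but their contents are not sorted."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for item in items:
        for b, limit in enumerate(boundaries):
            if item < limit:
                buckets[b].append(item)
                break
        else:
            buckets[-1].append(item)  # item >= every boundary
    return buckets
```

For example, `bucket_sort([42, 7, 99, 18], [10, 50])` groups the items into `[[7], [42, 18], [99]]`: fully grouped, never fully ordered. With m boundaries the inner scan costs O(m) per item, which is where the O(nm) bound above comes from.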

Benchmarks make sorting easier

When you can assign a measure to something, you can order a set of items without direct head-to-head matchups. It’s the difference between a sports tournament and a marathon.

It is often noted that a benchmark like national GDP … is a crude, imperfect measurement. But the existence of any benchmark at all transforms the question of national status from one demanding at least a linearithmic number of tussles and resolutions into something with a single reference point that ranks all.
Christian and Griffiths, Algorithms to Live By
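
In code, a benchmark reduces ranking to sorting by a key, with no head-to-head matchups at all. The names and figures below are made up purely for illustration:

```python
# A single numeric benchmark per item lets sorted() rank everything
# directly, instead of resolving pairwise "tussles".
gdp = {"Alphaland": 21.4, "Betania": 14.7, "Gammastan": 5.1}
ranking = sorted(gdp, key=gdp.get, reverse=True)
print(ranking)  # highest benchmark first
```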

Sorting in unexpected places

Sporting tournaments are sorting tournaments

Sporting tournaments are actually sorting algorithms, as they are designed to produce an ordering or ranking.

Single Elimination or Bracket Tournaments are like a Merge Sort. They are fast, but not robust: there’s only around a 50% chance of the second-best player placing second. [I think this is before accounting for statistical noise, so the odds may be even worse if you take that into account.]
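
A quick simulation of that claim, under the simplifying assumption that the better player always wins (no noise). The second-best player can only place second when seeded into the opposite half of the bracket from the best:

```python
import random

def second_best_places_second(n_players=16, trials=10_000, seed=0):
    """Estimate how often the 2nd-best player is runner-up in a random
    single-elimination bracket, assuming the better player always wins.
    That happens exactly when players 0 (best) and 1 (2nd best) land in
    opposite halves of the draw, so they can only meet in the final."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        draw = list(range(n_players))
        rng.shuffle(draw)
        half = n_players // 2
        hits += (draw.index(0) < half) != (draw.index(1) < half)
    return hits / trials
```

With 16 players the exact probability is 8/15, roughly 53%, which matches the book’s “around 50%”.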

Round-Robin and Ladder Tournaments are like Comparison Counting Sort and Bubble Sort, respectively — much more robust and fair, but much slower.

Animal hierarchies

Hens literally peck at each other to produce a hierarchy (hence the term “pecking order”). Consistent with how sorting costs grow faster than linearly, whether linearithmic or quadratic, studies show that the number of aggressive acts per hen increases as group size increases.

However, a clear hierarchy reduces violence as hens won’t pick fights with those clearly above them. Debeaking chickens can therefore increase violence as it prevents them from establishing a clear order that lets them know where they stand.
