Bayes's Rule (from Algorithms to Live By)

This post is split out from my main summary of Algorithms to Live By by Brian Christian and Tom Griffiths. Check out that summary to learn more about the book.

Bayes’s Rule is about estimating the chance of success by combining pre-existing beliefs (or priors) with newly observed evidence.
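As a minimal sketch of that combining step, here is Bayes’s Rule over a few discrete hypotheses in Python: multiply each prior by how likely the evidence is under that hypothesis, then normalize so the results sum to 1. The coin example and the `bayes_update` helper are hypothetical illustrations of mine, not from the book.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Return the posterior P(h | evidence) for each hypothesis h.

    prior:      dict mapping hypothesis -> P(h)
    likelihood: dict mapping hypothesis -> P(evidence | h)
    """
    # Bayes's Rule: P(h | e) = P(e | h) * P(h) / P(e),
    # where P(e) is the sum of P(e | h) * P(h) over all hypotheses.
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical example: is a coin fair or two-headed?
# We start at 50/50, then observe a single heads.
prior = {"fair": Fraction(1, 2), "two-headed": Fraction(1, 2)}
likelihood_of_heads = {"fair": Fraction(1, 2), "two-headed": Fraction(1)}
posterior = bayes_update(prior, likelihood_of_heads)
print(posterior)  # {'fair': Fraction(1, 3), 'two-headed': Fraction(2, 3)}
```

One heads shifts the belief toward the two-headed coin without ruling out the fair one, which is the prior-plus-evidence idea in miniature.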

We can make reasonable estimates with just a single datapoint

Pierre-Simon Laplace came up with a formula to estimate the chance of something succeeding based on its history. It’s called Laplace’s Law, and it works even with just a single datapoint:

(w+1)/(n+2)

Where:

w = number of successes
n = number of attempts

For example, if you succeed on your first try, the estimated chance of success is (1+1)/(1+2) = 2/3, which is more reasonable than assuming you’ll always succeed based on a single success. If you succeed in 3 out of 3 attempts, Laplace’s Law raises the estimated chance of success to (3+1)/(3+2) = 4/5.
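The arithmetic above is easy to check in code. Below is a minimal Python sketch; the function names are hypothetical. The second function illustrates a standard fact the post doesn’t spell out: Laplace’s Law is what Bayes’s Rule gives as the average (posterior mean) chance of success when you start from a uniform prior, i.e. every success rate between 0 and 1 is considered equally likely beforehand.

```python
from fractions import Fraction


def laplace_estimate(w, n):
    """Laplace's Law: estimated chance of success after w successes in n attempts."""
    if not 0 <= w <= n:
        raise ValueError("need 0 <= w <= n")
    return Fraction(w + 1, n + 2)


def bayesian_mean_uniform_prior(w, n, grid=10_000):
    """Posterior mean of the success rate p, assuming a uniform prior on p.

    Integrates numerically over a grid of p values (midpoint rule).
    """
    ps = [(i + 0.5) / grid for i in range(grid)]
    # Likelihood of w successes in n attempts (the binomial coefficient cancels out).
    weights = [p**w * (1 - p) ** (n - w) for p in ps]
    return sum(p * wt for p, wt in zip(ps, weights)) / sum(weights)


print(laplace_estimate(1, 1))  # 2/3
print(laplace_estimate(3, 3))  # 4/5
print(round(bayesian_mean_uniform_prior(3, 3), 4))  # 0.8
```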

Different types of distributions

Things in life often follow one of these three distributions:

Normal distribution. The longer something has gone on, the sooner you should expect it to end (e.g. human lifespans).
Power-law distribution. The longer something has gone on, the longer you can expect it to continue (e.g. company lifespans).
Erlang distribution. These have a wing-like shape. How long something has gone on says nothing about how much longer you should expect it to continue (e.g. time between successive calls on a phone network).
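To make these three patterns concrete, here is a small Python sketch that computes the expected time remaining, given that something has already lasted t, for one standard member of each family. The choices are my assumptions, not the book’s: a normal distribution stands in for lifespans, a Pareto distribution for the power law, and an exponential distribution (the shape-1 special case of the Erlang, and the one that is exactly memoryless) for the third case. All parameter values are illustrative.

```python
import math


def remaining_normal(t, mu, sigma):
    """Expected time left, E[T - t | T > t], when T ~ Normal(mu, sigma)."""
    z = (t - mu) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    survival = 0.5 * math.erfc(z / math.sqrt(2))  # P(T > t)
    return mu + sigma * pdf / survival - t


def remaining_exponential(t, rate):
    """Expected time left when T ~ Exponential(rate).

    t is accepted only to match the other functions: the answer is 1/rate
    no matter how long the thing has already lasted (memorylessness).
    """
    return 1 / rate


def remaining_pareto(t, x_min, alpha):
    """Expected time left when T ~ Pareto(x_min, alpha), a power-law distribution."""
    if t < x_min or alpha <= 1:
        raise ValueError("need t >= x_min and alpha > 1")
    # Given T > t, T / t is again Pareto with mean alpha / (alpha - 1).
    return t / (alpha - 1)


for t in (50, 70, 90):
    print(t, remaining_normal(t, 70, 15), remaining_exponential(t, 0.1), remaining_pareto(t, 1, 2))
```

As t increases, the normal case’s remaining time shrinks, the Pareto case’s grows in proportion to t, and the exponential case’s stays fixed at 1/rate.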

Use different rules for different distributions

The rule you should use to form your prediction depends on the distribution you expect:

Distribution	Rule	How to Use It	Example
Normal Distribution	Average Rule	Use the average as your guide.	Assume the average human lifespan is 70 years. The Average Rule would predict that a 90-year-old will live to 94 years: because they’ve already exceeded the average lifespan of 70 years, the prediction is adjusted upward. (By contrast, the Copernican Principle would predict that a 90-year-old man would live to 180.)
Power-law Distribution	Multiplicative Rule	Multiply the duration observed by a constant factor. For an uninformed prior, the constant is 2 (this gives rise to the Copernican Principle).	Say you came across the Berlin Wall 8 years into its life. You know wall lifespans follow a power-law distribution but have no other information. Under the Copernican Principle, your best guess for its lifespan is then 8 years x 2 = 16 years.
Erlang Distribution	Additive Rule	Predict that things go on a constant amount longer.	How long it will take your spouse to leave the house: always predict it will take 5 more minutes.
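The three rules can be sketched as one Bayesian procedure fed different priors. The Python below is a minimal numerical sketch under assumptions of mine: you encounter something at a uniformly random moment in its life (so the chance of seeing it at age t, given a total lifespan T, is 1/T), and the prediction is the median of the resulting posterior. With an uninformed prior proportional to 1/T, that median works out to exactly 2t, which is the Copernican Principle; more generally, a power-law prior proportional to T^-gamma gives t × 2^(1/gamma). The prior parameters (mean 70 and standard deviation 15 for lifespans, an Erlang prior of shape 2 with scale 4) and all function names are hypothetical.

```python
import math


def posterior_median(prior_pdf, t, upper, steps=200_000):
    """Predict a total duration T after observing that something has lasted t so far.

    Assumes we arrive at a uniformly random moment of its life, so P(t | T) = 1/T
    for T >= t. The posterior is then proportional to prior_pdf(T) / T on [t, upper];
    this returns its median. prior_pdf does not need to be normalized.
    """
    dT = (upper - t) / steps
    xs = [t + (i + 0.5) * dT for i in range(steps)]  # midpoints of the grid cells
    weights = [prior_pdf(x) / x for x in xs]
    half = sum(weights) / 2
    running = 0.0
    for x, w in zip(xs, weights):
        running += w
        if running >= half:
            return x
    return upper


def normal_prior(mu, sigma):
    return lambda T: math.exp(-(((T - mu) / sigma) ** 2) / 2)


def power_law_prior(gamma):
    return lambda T: T**-gamma


def erlang_prior(beta):
    # Erlang with shape 2 and scale beta (unnormalized).
    return lambda T: T * math.exp(-T / beta)


# Average Rule: a young observation predicts roughly the mean; an old one, a bit past t.
print(posterior_median(normal_prior(70, 15), 20, upper=220))
print(posterior_median(normal_prior(70, 15), 90, upper=220))
# Multiplicative Rule: uninformed prior 1/T, Berlin Wall seen at 8 years, roughly 16.
print(posterior_median(power_law_prior(1), 8, upper=8_000))
# Additive Rule: t + beta * ln 2, however long you have already waited.
print(posterior_median(erlang_prior(4), 10, upper=210))
```

With this made-up normal prior, the 90-year-old’s prediction lands in the mid-90s rather than at exactly 94, so read the numbers as illustrating the shape of each rule, not reproducing the table. For the shape-2 Erlang prior, the posterior beyond t is a plain exponential, which is why the prediction is always t plus the same constant.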

Predicting Roulette

If your chance of winning at roulette followed a:

Normal distribution, you should keep playing after a run of bad luck and quit after a winning streak.
Power-law distribution, you should keep playing when you’re on a winning streak and quit after a losing one.
Erlang distribution, your chance of winning remains the same, regardless of your win/lose streak.
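Real roulette spins are independent of one another, which puts an actual wheel in the last case above. A quick Python simulation illustrates it. This is a sketch: the European-wheel odds of 18/37 for a red bet, the three-loss streak, and the `win_rate_after_losing_streak` helper are my own illustrative choices. It compares the overall win rate with the win rate on spins that immediately follow a losing streak.

```python
import random


def win_rate_after_losing_streak(spins=200_000, streak=3, seed=0):
    """Simulate red bets on a European wheel (18 red pockets out of 37).

    Returns (overall win rate, win rate on spins that follow `streak` straight losses).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    p_win = 18 / 37
    outcomes = [rng.random() < p_win for _ in range(spins)]
    # Keep only spins whose previous `streak` spins were all losses.
    after_streak = [
        outcomes[i]
        for i in range(streak, spins)
        if not any(outcomes[i - streak:i])
    ]
    overall = sum(outcomes) / spins
    return overall, sum(after_streak) / len(after_streak)


overall, after_losses = win_rate_after_losing_streak()
print(f"overall: {overall:.3f}, after 3 losses: {after_losses:.3f}")
```

Both rates come out close to 18/37 (about 0.486): the wheel has no memory of the streak, so neither quitting nor pressing on changes your odds.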
