This post is split out from my main summary of Algorithms to Live By by Brian Christian and Tom Griffiths. Check out that summary to learn more about the book.
Bayes’s Rule is about estimating the chance of success by combining pre-existing beliefs (or priors) with newly observed evidence.
We can make reasonable estimates with just a single datapoint
Pierre-Simon Laplace came up with a formula to estimate the chance of something succeeding based on its history. It’s called Laplace’s Law, and it works even with just a single datapoint:
(w+1)/(n+2)
Where:
- w = number of successes
- n = number of attempts
For example, if you succeed on your first try, the estimated chance of success is (1+1)/(1+2) = 2/3, which is more reasonable than assuming you’ll always succeed based on a single success. If you succeed in 3 out of 3 attempts, Laplace’s Law raises the estimate to (3+1)/(3+2) = 4/5.
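As a quick check on the arithmetic, here is a minimal Python sketch of the formula. The function name `laplace_estimate` and the input validation are my own additions for illustration, not something from the book.

```python
from fractions import Fraction


def laplace_estimate(successes: int, attempts: int) -> Fraction:
    """Laplace's Law: (w + 1) / (n + 2), returned as an exact fraction."""
    if attempts < 0 or not 0 <= successes <= attempts:
        raise ValueError("need 0 <= successes <= attempts")
    return Fraction(successes + 1, attempts + 2)


print(laplace_estimate(1, 1))  # 2/3
print(laplace_estimate(3, 3))  # 4/5
print(laplace_estimate(0, 0))  # 1/2 (no data yet, so an even chance)
```

Note that the estimate never reaches 0 or 1, no matter how long the streak, which is what makes it usable with very little data.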
Different types of distributions
Things in life often follow one of these three distributions:
- Normal distribution. The longer something has gone on, the sooner we should expect it to end (e.g. human lifespans).
- Power-law distribution. The longer something has gone on, the longer you can expect it to continue (e.g. company lifespans).
- Erlang distribution. These have a wing-like shape. How long something has gone on says nothing about how much longer you should expect it to continue (e.g. time between successive calls on a phone network).
Use different rules for different distributions
The rule you should use to form your prediction depends on the distribution you expect:
Distribution | Rule | How to Use It | Example |
---|---|---|---|
Normal Distribution | Average Rule | Use the average as your guide. | Assume the average human lifespan is 70 years. The Average Rule would predict that a 90-year-old will live to 94 years: because they’ve already exceeded the average lifespan of 70 years, the prediction is adjusted up only slightly. (By contrast, the Copernican Principle would predict that a 90-year-old would live to 180.) |
Power-law Distribution | Multiplicative Rule | Multiply the duration observed so far by a constant factor. For an uninformed prior, the constant is 2 (this gives rise to the Copernican Principle). | Say you came across the Berlin Wall 8 years into its life. You know wall lifespans follow a power-law distribution but have no other information. Under the Copernican Principle, your best guess for its lifespan is then 8 years × 2 = 16 years. |
Erlang Distribution | Additive Rule | Predict that things will go on for a constant amount longer. | When estimating how long it will take your spouse to leave the house, always predict 5 more minutes, however long you have already waited. |
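
To make the three rules concrete, here is a rough Python sketch. The function names and parameters are hypothetical. In particular, the `margin` in `average_rule` is a simplifying assumption: I set it to 4 years only to reproduce the table's example, whereas a real prediction would derive it from the actual distribution.

```python
def multiplicative_rule(elapsed: float, factor: float = 2.0) -> float:
    """Power-law prior: predicted total = elapsed time times a constant.

    factor=2 is the uninformed (Copernican Principle) case.
    """
    return elapsed * factor


def additive_rule(elapsed: float, extra: float) -> float:
    """Erlang prior: always predict a constant amount more."""
    return elapsed + extra


def average_rule(elapsed: float, average: float, margin: float) -> float:
    """Normal prior (simplified): predict the average until it is passed,
    then predict only a little longer.

    `margin` is an illustrative stand-in, not a derived quantity.
    """
    if elapsed < average:
        return average
    return elapsed + margin


print(multiplicative_rule(8))    # 16.0 (Berlin Wall, seen at 8 years)
print(additive_rule(20, 5))      # 25 (20 minutes waited, 5 more predicted)
print(average_rule(90, 70, 4))   # 94 (90-year-old, average lifespan 70)
print(average_rule(30, 70, 4))   # 70 (30-year-old, predict the average)
```

The point of the sketch is how differently the rules treat elapsed time: it scales the power-law prediction, barely moves the normal one, and leaves the expected remaining time under the Erlang rule unchanged.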
Predicting Roulette
If your chance of winning roulette followed a:
- Normal distribution, you should keep playing after a run of bad luck and quit after a winning streak.
- Power-law distribution, you should keep playing when you’re on a winning streak and quit after a losing one.
- Erlang distribution, your chance of winning remains the same, regardless of your win/lose streak.