This summary of Calling Bullshit explains how to deal with all the bullshit we see in the world. Bergstrom and West have also developed a college course on Calling Bullshit, available on YouTube.
Buy Calling Bullshit at: Amazon | Kobo (affiliate links)
Key Takeaways from Calling Bullshit
There is a lot of bullshit out there, probably more than you think. In large part, this is because it takes far more work to refute bullshit than to create it (Brandolini’s principle). The Internet doesn’t help either, as it incentivises clicks and attention rather than accuracy. Bullshit is usually more salacious than the truth.
Bergstrom and West describe some of the most common ways to spot bullshit. You may not be able to understand the “black box” part of a particular study, but you can often still detect bullshit by scrutinising the data used (e.g. is it unbiased?) and the interpretation of the results (is the conclusion fair to draw given the results?) – without touching the black box at all.
The book spends a lot of time on common ways in which people may be misled, with many examples. It discusses selection bias, misleading graphs, the prosecutor’s fallacy and more. It is worth a read for anyone who wants to see past misinformation and get a better sense of the truth.
Detailed Summary of Calling Bullshit
General stuff about bullshit and how it works
Paltering and implicature
- Paltering is saying things that are technically true but which are likely to mislead.
- Implicature (coined by H P Grice) is what a sentence is being used to mean, rather than what it literally means. Implicature is what lets us palter. It is part of the way that we use language – people don’t usually say exactly what they mean because it is inefficient to do so; context usually makes clear what the person means. For example, if I say “I want a coffee” and you say “There’s a diner up the road”, the implication is that I can get coffee at the diner.
This is a common type of bullshit – it is safer and offers a level of plausible deniability. It allows advertisers to make claims without taking responsibility if the claimed benefits do not arise. Lawyers can use implicature to mislead without making literally false claims that could get them disbarred. [I think it also allows what Dan Ariely has described as the “fudge factor”, which is basically how much you can cheat while still feeling good about yourself.]
The Internet enables bullshit
The Internet enables bullshit in several ways:
- Lowering the cost of publishing (for creators of bullshit). The fall in the cost of publishing led to changes in the types of things that were published and how people interacted with them.
- Monetary incentives for clicks, views and sensationalism (for creators of bullshit). This is reinforced by algorithms that promote things that are widely shared or interacted with. For example, the most successful Facebook headlines contain the phrase “will make you” (e.g. “will make you gasp in surprise”, “will make you cry”). They tell you how you’ll feel about the article but not what it’s about – otherwise you wouldn’t need to click on it. Top X lists are also popular, because you can put each item on a separate page, resulting in X clicks rather than just one.
- Incentives for sharing (for propagators of bullshit). People often share things to communicate what sort of person they are and what they value. They don’t care as much about the accuracy of what they send. They have little incentive to share a retraction or correction to a story they shared earlier. Doing so would just make them look bad with no upside.
- Social capital of sharing (for consumers of bullshit). People are more likely to believe something if it comes from someone they know, compared to an unknown source. In this way, a sharer’s social capital can be used to give credibility to someone else’s disinformation.
Brandolini’s principle (bullshit asymmetry)
Alberto Brandolini, an Italian software engineer, stated:
The amount of energy needed to refute bullshit is an order of magnitude bigger than [that needed] to produce it.
A few years earlier, Italian blogger Uriel Fanelli had similarly noted that an idiot can create more bullshit than you could ever hope to refute.
An example of this principle is Andrew Wakefield’s false “vaccines cause autism” claim. Even though the original study has been thoroughly and repeatedly debunked, the claim is still repeated today.
Spotting bullshit
Latour talks about scientific claims that rely on “black boxes” which are difficult for the reader to understand and dissect. In machine learning, sometimes even the person who created the algorithm may not understand the black box.
The authors’ point is that to spot bullshit, you often don’t need to understand the black box. Instead, you can just focus on what goes in and out of the black box. Doing this is within most readers’ abilities. This is probably the single most important point made in Calling Bullshit.
What’s going into the black box?
If the information going into the black box is biased, the results will be biased. The authors talk at length about selection bias and its various forms. Selection bias means the sample being studied is not representative of the population you want to draw conclusions about.
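[To make this concrete – not from the book – here’s a minimal Python sketch with made-up incomes, showing how a biased sample skews an estimate even when the analysis itself is sound:]

```python
import random

random.seed(42)

# Hypothetical population of 10,000 incomes drawn from a skewed
# distribution (all numbers here are made up for illustration).
population = [random.lognormvariate(10.5, 0.6) for _ in range(10_000)]

def mean(xs):
    return sum(xs) / len(xs)

# Unbiased estimate: sample uniformly at random.
random_sample = random.sample(population, 500)

# Biased estimate: only survey high earners (e.g. polling readers
# of a luxury magazine) - the selection mechanism does the damage.
biased_sample = [x for x in population if x > 60_000][:500]

print(f"population mean: {mean(population):10,.0f}")
print(f"random sample:   {mean(random_sample):10,.0f}")  # close to the truth
print(f"biased sample:   {mean(biased_sample):10,.0f}")  # far too high
```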
Example – criminal detector
In 2016, Wu and Zhang created a computer program that they claimed could distinguish criminals from non-criminals based on their facial features.
But the massive flaw was that the training data used official ID photos (e.g. passports, driver’s licences) for criminals and self-selected photos from social media and websites (e.g. LinkedIn, professional firm sites) for non-criminals. People tend to smile more in social media photos than in official IDs, so the program ended up being a smile detector rather than a criminal detector!
(Another flaw was that, even if successful, this program could only ever detect criminals that got convicted. Attractive, white people are less likely to get convicted than unattractive, non-white people, so the program is also likely to reflect this bias.)
The authors also talk about the use of training data in machine learning algorithms. Good training data is hard to get. Training data from the real world will be messy and contain irrelevant noise. And the more variables you use, the more training data you need (the curse of dimensionality).
A computer may “overfit” training data if it treats the noise as relevant and uses it to make classifications. This improves accuracy on the training data, but makes classifications worse on test data (which may not contain the same types of noise).
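[A quick illustration of overfitting, not from the book: fit polynomials of different degrees to noisy data whose true signal is linear. The high-degree fit nails the training data but does worse on fresh test data:]

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy observations of a relationship that is truly linear."""
    x = rng.uniform(-1, 1, n)
    y = 2 * x + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

for degree in (1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # The degree-10 fit chases the noise: tiny training error,
    # worse error on data it hasn't seen.
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```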
Example – wolves vs huskies
Guestrin and his colleagues built an algorithm designed to distinguish wolves from huskies. When the scientists looked at how the computer made its decisions, they found that it mostly focused on the image background!
Wolves were most often shown in backgrounds with snow. So the computer classified huskies as wolves if they were shown in the snow.
Example – portable x-rays
Zech and colleagues used algorithms to detect pneumonia using X-rays. The machine performed well on the training data but not on test data at other hospitals.
It turned out the machine was using the word “PORTABLE” stamped on the X-rays to figure out whether the patient had pneumonia. Portable X-rays are used for the most severely ill patients, who cannot get to the radiology department. So the algorithm looked accurate when tested on data from the same hospital system, but was useless in the real world.
What’s coming out of the black box?
Causality
One of the most common examples of bullshit is people suggesting there is causality when there isn’t.
Correlations can generate hypotheses about causality
Causality does imply correlation, so you can use a correlation to generate a hypothesis about causation. But a correlation itself is not evidence of causation.
You can then test your hypothesis using experiments. Randomised controlled trials are ideal, but not always possible due to ethical or practical concerns.
Correlations are often misrepresented because people care about causation
Correlations do not generate clicks, while causation does. People want to know – particularly in areas of health, exercise, etc – what they should do.
One study (unfortunately no reference was given for this in the book as far as I could tell) looked at the 50 research studies most commonly shared on social media about diet, pollution, exercise, etc. Only 15 of those studies (30%) did a decent job at demonstrating causality. Only 2 of those met the highest standards for doing so. The rest only demonstrated correlations. But in the medical journals, one-third of those studies suggested causation without adequate evidence. In the popular press, nearly half of news articles describing the studies claimed causation without a proper basis for it.
Spurious correlations may exist by chance
Many correlations exist that are entirely spurious. Tyler Vigen collected a large number of datasets about how things change over time, and then used a computer program to compare trends. Many of these will show correlations just by chance. This is called data dredging.
For example, the age of Miss America was tightly correlated with the number of people murdered by steam and hot objects – for a period, anyway (between 1999 and 2008). Being spurious, the correlation disappeared entirely after 2008.
[This is related to the issues of p-hacking and conditional probabilities described below – a tight correlation between two complex trends is unlikely to arise by chance, but that logic breaks down once you compare many different datasets looking for such correlations.]
One way to get spurious correlations in trends over time is to compare simple increasing/decreasing trends with other increasing/decreasing trends. For example, the number of breeding storks has declined over time, and so has the number of newborn babies.
As computers gain access to ever more datasets to process and compare, the risk of spurious correlations increases.
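[A small simulation in the spirit of Vigen’s project – the series counts and threshold are my own choices: generate many unrelated random walks, compare every pair, and near-perfect correlations appear in droves:]

```python
import numpy as np

rng = np.random.default_rng(1)

# 1,000 unrelated "yearly trend" series, each a 10-step random walk.
n_series, n_years = 1_000, 10
walks = np.cumsum(rng.normal(size=(n_series, n_years)), axis=1)

# Compare every pair of series and count near-perfect correlations.
corr = np.corrcoef(walks)
upper = corr[np.triu_indices(n_series, k=1)]  # each pair counted once

print(f"pairs compared:       {upper.size:,}")
print(f"pairs with |r| > 0.9: {(np.abs(upper) > 0.9).sum():,}")
```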
Probabilistic, sufficient and necessary causes
- Probabilistic cause – if A happens, the chances of B happening increase.
- Sufficient cause – if A happens, B must happen.
- Necessary cause – unless A happens, B can’t happen.
The distinction between these three types of causes can be misused, particularly if trying to deny a causal relationship. For example, Mike Pence claimed that smoking doesn’t kill because 2 out of 3 smokers die from non-smoking related causes and 9 out of 10 smokers don’t get lung cancer.
But obviously, the fact that you smoke doesn’t mean you will die from smoking – you might die of something else first, such as a car crash. Smoking greatly increases the chance of dying from a smoking-related illness, but it doesn’t guarantee it. (Pence was also factually incorrect, as around 2 out of 3 smokers do die from smoking-related causes.)
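[A tiny worked example of probabilistic causation – the smoker’s rate comes from Pence’s own “9 out of 10” figure, while the non-smoker baseline is an assumption of mine purely for illustration:]

```python
# Rough relative-risk arithmetic. The 1-in-10 figure is from Pence's
# own claim above; the never-smoker baseline of 0.5% is assumed.
p_lung_cancer_smoker = 0.10
p_lung_cancer_nonsmoker = 0.005

relative_risk = p_lung_cancer_smoker / p_lung_cancer_nonsmoker
print(f"Smokers are ~{relative_risk:.0f}x more likely to get lung cancer.")
# Smoking is a probabilistic cause: it multiplies the risk ~20-fold even
# though most smokers never get lung cancer (so it isn't a sufficient
# cause) and some non-smokers do (so it isn't a necessary cause either).
```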
Consistency is not causality
Just because some results are consistent with your (or the researcher’s) hypothesis doesn’t mean they prove that hypothesis. Other hypotheses may be equally consistent with the results. The “gaydar” study discussed below is a good example.
Interpreting Data
Numbers can suggest precision and a degree of scientific rigour, but this is not always warranted. Numbers should be presented in ways that allow us to make meaningful comparisons; too often, they are instead used to mislead and invite unfair comparisons.
Numbers are particularly prone to mislead by being taken out of context. Any original caveats disappear as people just regurgitate the number without the context or how it was derived.
Example – 50% of scientific articles are never read by anyone
There is an oft-quoted statistic that 50% of scientific articles are never read by anyone.
Arthur Jago hunted down the source of that statistic. He found that the original claim was just that 50% of papers are not cited, rather than not read. Moreover, 50% was the percentage of papers that went uncited after four years. It also only looked at citations in a database that covered a subset of journals, but the papers could have been cited in other forums, such as in books or conferences. And lastly, the original 50% claim included all types of articles, including letters to the editor, book reviews and obituaries. It is not surprising that most book reviews and obituaries go uncited.
One of the most interesting parts of Calling Bullshit was the explanation of the prosecutor’s fallacy and the base rate fallacy. The authors also suggest using Fermi estimates as a quick way to spot bullshit numbers.
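[The book’s treatment is better, but here is a minimal base-rate sketch with made-up numbers – a “99% accurate” test for a rare condition:]

```python
# Base rate fallacy with made-up numbers: a "99% accurate" test for a
# condition that affects 1 in 1,000 people.
prevalence = 0.001           # base rate (assumed for illustration)
sensitivity = 0.99           # P(positive | condition)
false_positive_rate = 0.01   # P(positive | no condition)

p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))
p_condition_given_positive = sensitivity * prevalence / p_positive

print(f"P(condition | positive) = {p_condition_given_positive:.1%}")  # ~9%
# Despite the "99% accurate" test, a positive result is usually a false
# positive, because true cases are so rare. Confusing
# P(positive | no condition) with P(no condition | positive) is the
# essence of the prosecutor's fallacy.
```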
There is also frequent reference to Goodhart’s law, which says (as rephrased more eloquently by Marilyn Strathern):
When a measure becomes a target, it ceases to be a good measure.
Summary statistics
Summary statistics condense a range of values (e.g. heights, incomes) into a single number. This can mislead when the statistic used behaves differently from what the reader assumes (e.g. the mean can differ greatly from the median or mode when the distribution is skewed).
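[A quick illustration with made-up incomes – a single outlier drags the mean far from the median:]

```python
from statistics import mean, median

# Made-up incomes in a small town: 99 modest earners and one billionaire.
incomes = [30_000] * 99 + [1_000_000_000]

print(f"mean income:   {mean(incomes):>13,.0f}")    # ~10 million - misleading
print(f"median income: {median(incomes):>13,.0f}")  # 30,000 - the typical resident
```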
Percentages
Percentages can help us make meaningful comparisons. But:
- they can also be used to make large values look small
- there is a potentially confusing difference between percentages and percentage points (e.g. an increase in inflation from 2% to 3% is a 50% increase, or a 1 percentage point increase)
- they can obscure important changes in net values if the denominator changes as well as the numerator
- they can give strange answers if they summarise a net increase or decrease, but not all the “components” of that percentage went in the same direction.
Example – misleading percentages
Governor Scott Walker claimed in 2011 that 50% of the nation’s job growth had occurred in his state of Wisconsin. But some US states had lost jobs while others had gained them, so the net change nationwide was only about 18,000 jobs. Wisconsin’s net job growth of 9,500 looked like about half of that net increase, but it was a much, much smaller percentage of the total jobs added across the country.
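[The arithmetic, using the book’s net figures plus a hypothetical gross figure of my own, since the summary doesn’t give the real one:]

```python
# Net figures as given in the book; the gross figure is hypothetical,
# since the summary doesn't give the real number of jobs added.
net_us_change = 18_000     # net change across all states
wisconsin_gain = 9_500     # Wisconsin's net gain
gross_us_gains = 200_000   # assumed total added by job-gaining states

print(f"share of net change:  {wisconsin_gain / net_us_change:.0%}")   # ~53%
print(f"share of gross gains: {wisconsin_gain / gross_us_gains:.1%}")  # ~4.8%
```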
Interpreting Graphs and other Visualisations
As a convention, the horizontal axis is used for the independent variable (the one that causes or influences the other variable) and the vertical axis is used for the dependent variable. This suggests causation, even when unwarranted.
Vertical axes
In a bar graph, the vertical axis should extend down to zero because a bar graph emphasises the absolute magnitude of values. This follows from the principle of proportional ink:
When a shaded region is used to represent a numerical value, the size (i.e. area) of that shaded region should be directly proportional to the corresponding value.
In a line graph, the vertical axis does not need to go to zero (and sometimes should not) because it emphasises the change in values and the absolute magnitude is usually irrelevant. Because a line chart uses positions rather than shaded areas to represent quantities, the principle of proportional ink does not apply (unless the line chart is “filled in”).
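[A quick matplotlib sketch, with made-up values, of the same data plotted honestly and misleadingly:]

```python
import matplotlib.pyplot as plt

# Two made-up values that differ by only ~3%.
labels = ["Product A", "Product B"]
values = [102, 105]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Honest: baseline at zero, so bar areas are proportional to the values.
ax1.bar(labels, values)
ax1.set_title("Baseline at zero (honest)")

# Misleading: a truncated axis makes a 3% gap look enormous.
ax2.bar(labels, values)
ax2.set_ylim(100, 106)
ax2.set_title("Truncated axis (misleading)")

plt.tight_layout()
plt.show()
```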
Example: “The only global warming chart you need from now on”
Steven Hayward created a line chart with the vertical axis starting at zero degrees Fahrenheit to show that the average annual global temperature had only increased by a few degrees over a 140-year period.
This was misleading because the absolute temperature is irrelevant – only the change matters. And the zero point was arbitrary anyway (the choice of Fahrenheit over Celsius was completely arbitrary too).
Be very careful if a graph uses two different vertical axes. By changing the scale of those axes, a designer can tell almost any story they want.
Horizontal axes
The key ways to mislead with horizontal axes are:
- picking date (or x-axis) ranges that obscure part of the story
- using uneven or varying scales on the axis
- “binning” or grouping data together (commonly used in bar charts) with uneven bin sizes (Ken Schultz has an example of this with tax rates)
Other visualisations
Some other visualisations that the authors object to are:
- ducks (named after an impractical duck-shaped building in Flanders, New York) – graphs that focus on the graphics at the expense of the data.
- glass slippers – graphs that take one type of data and shoehorn it into a visual form designed for another (e.g. subway map of Rock n Roll, periodic tables of cloud computing). [I think these are gimmicky, but not really designed to be misleading since most people know not to take it too seriously.]
3D data graphics
3D data graphics may legitimately be used if there are pairs of independent variables. They should not be used to represent data with only one independent variable – often they will be ducks or glass slippers.
The use of perspective makes it a lot harder for the reader to assess the relative sizes of the chart elements (violating the principle of proportional ink).
Bullshit articles
The authors explain the Replication Crisis, which seems to be caused by a combination of selection bias and misuse of p-values.
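[A simple simulation of the p-value side of this: run 100 experiments where the true effect is zero, and about 5 will come out “statistically significant” anyway:]

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# 100 experiments in which the true effect is exactly zero.
significant = 0
for _ in range(100):
    control = rng.normal(0, 1, 30)
    treatment = rng.normal(0, 1, 30)   # no real difference
    result = stats.ttest_ind(control, treatment)
    significant += result.pvalue < 0.05

print(f"'significant' results out of 100 null experiments: {significant}")
# Around 5, by construction. If only the significant ones get written up
# and published (selection bias), the literature fills with "effects"
# that won't replicate.
```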
Non-academic articles (e.g. newspapers, pop science, etc) often contribute to a misunderstanding about the significance of a single study for science. A single study, even if published in a prestigious, peer-reviewed journal, can never really be definitive – it just shows some support for a particular hypothesis. Scientists build on each other’s studies, so lots of different studies, each looking at an issue from a different angle, are needed before a scientific consensus can emerge. But the popular press never describe the results of a single study with these sorts of caveats (and sometimes the researchers don’t either).
You can never know that a particular scientific paper is correct. The best you can do is to work out if it’s legitimate. One quick way to do this is find out what journal it’s published in, and check the prestige of that journal in rankings. An extraordinary claim, if credible, is more likely to be found in a prestigious journal. Smaller, more niche claims may be found in lower-ranked journals but could still be credible.
Check for corrections or retractions on the publisher or author’s website before relying heavily on a paper.
Check yourself for confirmation bias. If you already think an article is true, or likely to be true, you may be tempted not to fact-check it.
Other Interesting Points
- A sophisticated bullshitter needs a “theory of mind” – putting themselves in their target’s shoes and understanding what type of bullshit will resonate with them.
- Before the printing press, every book had to be written by hand, so only royalty and clergy had the resources to produce a copy of a book. So there were far fewer books available (e.g. the Bible and other “important” books).
- Andrew Wakefield’s “MMR vaccine causes autism” study was dodgy for the following reasons:
- The sample size was tiny – it only looked at 12 children
- The case histories described in his paper did not match the case histories for children found in medical records or reports from parents (e.g. three of the patients listed as suffering from autism did not have autism at all, five of the children reported as “normal” before receiving the vaccine actually had prior developmental issues)
- He had enormous undisclosed financial conflicts of interest – his work was funded by a lawyer involved in a lawsuit against vaccine manufacturers, and Wakefield’s work featured heavily in that lawsuit. Wakefield received well over GBP400k for his work on the lawsuit. He had also filed patent applications for a diagnostic test and for a competitor to the MMR vaccine that he claimed was “safer”.
- Many careful scientific studies in the following years found no link between vaccines and autism.
- In 2016, the website AWD News published a fake story claiming that the Israeli Defence Minister had threatened Pakistan with a nuclear attack. The Pakistani Defence Minister saw the story, believed it, and responded with a threat of his own on Twitter.
- Propaganda doesn’t necessarily exist to persuade you of a specific view. It may just exist to confuse and disorient, and exhaust your critical thinking abilities, so that you don’t know what to believe.
- Some creators of fake news just want to earn money from clicks – they may not care at all what you believe or vote for.
- Radar guns detect speeding by emitting radio waves that reflect off a moving vehicle and measuring the Doppler shift in those waves (a rough sketch of the formula appears at the end of this list). Radar guns need regular calibration to work correctly, so a standard method for getting out of a speeding ticket is to challenge the officer to produce timely calibration records.
- People posting about their husbands on Facebook tend to post positive things – if you type “my husband is …”, the autocomplete suggests things like “my best friend”, “amazing”, “my everything”. People searching about their husbands on Google tend to ask about problems – the same phrase autocompletes to “mean”, “addicted to porn”, “selfish”.
- Because of selection bias, all insurance companies can truthfully claim that people who switch to their policy save $xxx on average. Different insurers base their premiums on different things and weight risks differently. The only people who bother making a switch will be those who save a substantial amount.
- Bergstrom and West compellingly critique Wang and Kosinski’s alleged “gaydar” study, which its authors argued provided strong support for the prenatal hormone theory (PHT). Bergstrom and West point out:
- The photos used as training data were self-selected photos from internet dating sites.
- There is no evidence that the computer was picking up on differences in face shapes or structures due to prenatal hormone exposure, as opposed to other differences. The differences could have been due to grooming, attire, photo choice, lighting, tattoos, etc. The study didn’t even show a statistically significant difference between face shapes.
- Even if the computer was picking up on differences in face shapes or structures, that’s not proof of the PHT – it’s just consistent with it. The results are also consistent with lots of other theories – e.g. the differences could be genetic, they could be due to hormone exposure outside the womb, or sexual orientation could influence diet and exercise, which in turn affect facial shape and structure.
- Fox reported on $70m each year in SNAP benefits lost to fraud, suggesting that this was a very high figure. But compared to the amount of SNAP benefits actually paid out, that is a very low percentage (around 0.2%), far lower than ordinary retail losses due to theft (around 1-3%). The funny thing is, Fox had gotten the $70m figure wrong and the US Department of Agriculture demanded a correction. The actual figure was around $900m!
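[Not from the book: a rough Python sketch of the Doppler relationship behind the radar-gun point above, with illustrative figures:]

```python
# The reflected wave's frequency shift is roughly delta_f = 2 * v * f0 / c:
# the factor of 2 appears because the wave is shifted once on the way to
# the car and again on reflection. The frequency and shift below are
# illustrative, not from the book.
C = 299_792_458   # speed of light, m/s
F0 = 24.15e9      # a typical K-band police radar frequency, Hz

def speed_from_shift(delta_f_hz):
    """Vehicle speed (m/s) implied by a measured Doppler shift."""
    return delta_f_hz * C / (2 * F0)

v = speed_from_shift(4_830)  # an illustrative measured shift, in Hz
print(f"{v:.1f} m/s (~{v * 3.6:.0f} km/h)")
```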
My Thoughts
Calling Bullshit is a very good book that is worth recommending to others. The first few chapters were more focused on sociology – why people create bullshit, how it spreads, why it’s a problem etc. These chapters were a bit weaker, repetitive and not as well structured (neither author is a sociologist). The book comes into its own when it talks about “black boxes” and starts to explain how to spot bullshit. This makes sense as one author is a data scientist.
The key takeaways for me personally were the prosecutor’s fallacy and the base rate fallacy, because they are so unintuitive. Bergstrom and West were also able to link them to the replication crisis in social science. The issues were explained clearly and convincingly. The book was worth reading for that alone.
The points they made about graphs, and the ways in which graphs can mislead, were also useful.
The other ways they described for recognising bullshit were mostly things that I was somewhat aware of anyway, but it was still good to see it all laid out so clearly in a good structure.
There are also lots of engaging and interesting examples used throughout the book to emphasise the authors’ points. For a quicker read, it would have been useful if the examples had been separated from the text in some way (e.g. put in a coloured box, like in a textbook). It seems the authors chose not to do this so it read more like a normal book.
Buy Calling Bullshit at: Amazon | Kobo <– These are affiliate links, which means I may earn a small commission if you make a purchase. I’d be grateful if you considered supporting the site in this way! 🙂
If you enjoyed this summary of Calling Bullshit, you may also like: