This post sets out some ways to reduce noise, along with common objections to reducing noise.
I have split this out from the main summary of Noise by Kahneman, Sibony, and Sunstein because it’s quite lengthy and that summary was already getting very long.
Buy Noise at: Amazon | Kobo (affiliate links)
Kahneman, Sibony and Sunstein accept that you don’t always want to eliminate noise because:
- the cost of doing so may outweigh the benefits; and
- a method that eliminates noise may compromise important competing values.
But because there are multiple methods to reduce noise, you may be able to reduce (rather than eliminate) noise while preserving those values, by choosing an appropriate method.
- How to Reduce Noise
- Objections to Reducing Noise
- 1. Cost
- 2. Some methods may produce errors
- 3. People being judged may lose a feeling of "being heard" and treated as an individual
- 4. Some methods may freeze in place existing values
- 5. Some methods may allow people to game the system
- 6. A noisy process might be a good deterrent
- 7. People doing the judging don't want to feel like a cog in a machine
- 8. Poor execution of noise reduction methods
How to Reduce Noise
The book Noise sets out two types of methods to reduce noise. The first type aims to reduce cognitive biases. These methods can be used in one-off cases where you are less concerned about “noise” and just want to make a better decision, period. I have put these in a separate note about decision hygiene.
The second type comprises the methods that I think are truly about reducing system noise. Those are:
- Averaging independent judgments
- Using a common scale and anchor or ranking
- Using guidelines or rules
Averaging independent judgments (wisdom of crowds)
As a matter of simple math, averaging independent judgments will reduce noise (but not bias). Averaging x judgments divides noise (measured as the standard deviation of the judgments) by the square root of x, provided the judgments really are independent. For example, averaging 100 judgments reduces noise by 90% (to 10% of its original level), and averaging 400 judgments reduces it by 95% (to 5%).
Prediction markets are an example of this. They have been found to do very well in predicting outcomes.
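To see the square-root rule in action, here is a minimal Python simulation. It assumes each judgment is the true value plus a bias shared by every judge plus independent, normally distributed noise. The true value, bias and noise level are made-up parameters for illustration, not figures from the book.

```python
import random
import statistics

TRUE_VALUE = 100.0  # hypothetical quantity being judged
BIAS = 5.0          # shared error every judge makes (averaging cannot remove it)
NOISE_SD = 20.0     # spread of each judge's independent error


def averaged_judgment(n_judges: int, rng: random.Random) -> float:
    """Average n independent judgments, each = truth + bias + random noise."""
    judgments = [TRUE_VALUE + BIAS + rng.gauss(0.0, NOISE_SD) for _ in range(n_judges)]
    return statistics.fmean(judgments)


def noise_and_bias(n_judges: int, trials: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Estimate the noise (SD) and bias of the averaged judgment over many trials."""
    rng = random.Random(seed)
    averages = [averaged_judgment(n_judges, rng) for _ in range(trials)]
    noise = statistics.stdev(averages)
    bias = statistics.fmean(averages) - TRUE_VALUE
    return noise, bias


for n in (1, 100, 400):
    noise, bias = noise_and_bias(n)
    # Theory: noise shrinks to NOISE_SD / sqrt(n); bias stays at BIAS.
    print(f"{n:>3} judges: noise ~ {noise:5.2f} (theory {NOISE_SD / n ** 0.5:5.2f}), bias ~ {bias:4.2f}")
```

The simulated noise falls roughly in line with the theoretical value as more judgments are averaged, while the bias stays close to 5 no matter how many judges are added.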
If you don’t have access to lots of independent judgments, aggregating judgments from the same person can still improve accuracy. Psychologists Vul and Pashler call this “the crowd within”. The judgments have to be made at different times, so that the person doesn’t simply repeat their earlier judgment. But the improvement is only about one-tenth of what you get from an independent judgment by someone else: we are more similar to ourselves at different times than we are to other people at any time.
Another way to aggregate judgments when you don’t have access to lots of independent judgments is to encourage people to generate a second judgment that is as different as possible from the first, while still being plausible, and then average the two. Herzog and Hertwig call this “dialectical bootstrapping”. They found it improved accuracy more than simply asking for a second judgment.
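One way to see why the crowd within helps less than a second person is a toy error model (an assumption for illustration, not taken from the book): each judgment is the truth plus a person-specific error that is the same every time that person judges, plus an occasion-specific error that is fresh each time. Averaging two of your own judgments only shrinks the occasion part; averaging with someone else shrinks both. The variances below are made up.

```python
def expected_squared_error(person_var: float, occasion_var: float) -> dict[str, float]:
    """Expected squared error (ignoring bias) for three ways of forming a judgment.

    Toy model (assumed): judgment = truth + person-specific error
    (shared across that person's judgments) + occasion-specific error
    (independent each time), with all error components independent.
    """
    single = person_var + occasion_var
    # Same person twice: the person-specific error is shared, so only the
    # occasion-specific part is halved by averaging.
    crowd_within = person_var + occasion_var / 2
    # Two different people: both error components are independent, so all of it is halved.
    two_people = (person_var + occasion_var) / 2
    return {"single": single, "crowd_within": crowd_within, "two_people": two_people}


errors = expected_squared_error(person_var=9.0, occasion_var=1.0)
gain_within = errors["single"] - errors["crowd_within"]  # 0.5
gain_people = errors["single"] - errors["two_people"]    # 5.0
print(errors, gain_within / gain_people)
```

Under this toy model, the gain from the crowd within is one-tenth of the gain from a second person exactly when occasion-to-occasion error makes up one-tenth of the total error variance, as in the made-up numbers above.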
Diversity helps. If you are averaging the results of two different models, averaging the best model with the second-best model may not help much if the two are closely correlated. You may get a more accurate result by averaging the best model with the third-best model, if those two are less correlated. Similarly, say you are adding someone to a team of judges whose judgments you will average. It may be better to pick a judge with skills the rest of the team lacks, even if their track record is slightly worse than that of a judge whose skills resemble those already on the team. Paradoxically, the average of a noisier but more diverse group is usually more accurate than the average of a group whose members all agree.
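The model example can be checked with the standard formula for the variance of an average of two correlated estimates. This is a sketch assuming both models are unbiased; the error sizes and correlations are hypothetical numbers chosen for illustration.

```python
import math


def error_sd_of_average(sd_a: float, sd_b: float, correlation: float) -> float:
    """Standard deviation of the error of (A + B) / 2.

    Assumes A and B are unbiased estimates whose errors have standard
    deviations sd_a and sd_b and the given correlation:
    Var((A + B) / 2) = (sd_a**2 + sd_b**2 + 2 * correlation * sd_a * sd_b) / 4
    """
    variance = (sd_a**2 + sd_b**2 + 2 * correlation * sd_a * sd_b) / 4
    return math.sqrt(variance)


# Hypothetical models: error SDs and correlations are made up for illustration.
best_sd, second_sd, third_sd = 1.0, 1.1, 1.2
corr_best_second = 0.9  # the two top models make similar mistakes
corr_best_third = 0.3   # the third model makes different mistakes

print("best alone:   ", best_sd)
print("best + second:", round(error_sd_of_average(best_sd, second_sd, corr_best_second), 3))
print("best + third: ", round(error_sd_of_average(best_sd, third_sd, corr_best_third), 3))
```

With these made-up numbers, averaging the best model with the closely correlated second-best model is actually slightly worse than the best model alone, while pairing it with the less correlated third-best model cuts the error noticeably.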
The Delphi method and the mini-Delphi
The Delphi method is one way to aggregate diverse views. In its classic form, there are multiple rounds in which participants submit anonymous estimates to a moderator. In each new round, participants give reasons for their estimates and respond to the reasons given by others, still anonymously. The process encourages estimates to converge. You can also force convergence by requiring new estimates to fall within a specific range of the previous round’s estimates.
The mini-Delphi is a modified method that you can use in a single meeting. It’s also called estimate-talk-estimate. First, participants produce separate (and silent) estimates. Next, they explain and justify their estimates. Lastly, participants make new estimates having heard others’ estimates and justifications. You then average the second estimates to obtain a consensus.
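The two procedures can be sketched as a small simulation. This is only a toy model: the book does not describe how people revise their estimates, so the `pull` parameter (each participant moves part of the way toward the previous round’s median) is an assumed stand-in for being persuaded by others’ reasons, and clamping to the previous round’s interquartile range is just one possible reading of “a specific range”. The estimates are hypothetical.

```python
import statistics


def delphi(estimates: list[float], rounds: int, pull: float = 0.5,
           clamp_to_iqr: bool = False) -> list[float]:
    """Simulate Delphi-style rounds of anonymous re-estimation.

    Assumptions (for illustration only): in each round every participant
    moves a fraction `pull` of the way toward the previous round's median.
    If clamp_to_iqr is True, new estimates are forced into the previous
    round's interquartile range.
    """
    current = list(estimates)
    for _ in range(rounds):
        median = statistics.median(current)
        q1, _, q3 = statistics.quantiles(current, n=4)
        revised = []
        for estimate in current:
            new = estimate + pull * (median - estimate)
            if clamp_to_iqr:
                new = min(max(new, q1), q3)
            revised.append(new)
        current = revised
    return current


first_round = [40.0, 55.0, 60.0, 70.0, 120.0]  # hypothetical silent estimates

# Mini-Delphi (estimate-talk-estimate): one revision round, then average.
second_round = delphi(first_round, rounds=1)
print("mini-Delphi consensus:", statistics.fmean(second_round))

# Classic Delphi: several rounds with a weak pull, so the range clamp does real work.
final_round = delphi(first_round, rounds=3, pull=0.2, clamp_to_iqr=True)
print("spread after 3 rounds:", max(final_round) - min(final_round))
```

In the mini-Delphi run, the outlying estimate of 120 still raises the final average, but less than it would have had participants not revised their estimates first.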
Using a common scale and giving a common anchor or ranking
Our ability to compare and rank cases is much better than our ability to place cases on a scale. People tend to disagree less when asked to rank cases than when they are asked to place them on a scale.
Part of the reason is that the labels assigned to scales (e.g. unlikely, real possibility, good) are crude. People interpret the labels differently, even when they share the same underlying judgment. In other words, when people have to place cases on a scale, they have to do two things:
- First, they have to figure out what the labels on the scales mean.
- Second, they have to figure out where to put the case on the scale.
So even if they agree on this second point, decisions may be noisy if they disagree on the first.
The authors of Noise found that, in a punitive damages study, asking people to rank cases rather than convert them into dollar values reduced noise in the judgments from 94% to 49%. This is because, when deciding what dollar value to assign to a case, people choose their initial anchor arbitrarily. They do, however, base later cases on that initial anchor, so each person’s judgments are consistent with one another. People seem to be much more sensitive to relative values than to absolute values.
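The anchoring effect can be illustrated with a deliberately extreme toy example: three hypothetical judges who agree perfectly on the relative severity of five cases but each pick an arbitrary dollar anchor. The severities and anchors are invented. Real judges also disagree about relative severity, which is why ranking reduced noise in the study rather than eliminating it.

```python
import statistics

# Hypothetical "true" relative severity of five cases, shared by all judges.
severity = {"case A": 1.0, "case B": 2.5, "case C": 4.0, "case D": 6.0, "case E": 9.0}

# Each judge picks an arbitrary anchor: a personal dollar multiplier (made-up values).
anchors = {"judge 1": 10_000, "judge 2": 75_000, "judge 3": 400_000}


def dollar_award(judge: str, case: str) -> float:
    """A judge's award: consistent with their own anchor, so relative values are preserved."""
    return anchors[judge] * severity[case]


def rank_cases(judge: str) -> list[str]:
    """Order cases from least to most severe according to this judge's awards."""
    return sorted(severity, key=lambda case: dollar_award(judge, case))


for case in severity:
    awards = [dollar_award(judge, case) for judge in anchors]
    # Coefficient of variation: spread of awards relative to their mean.
    spread = statistics.pstdev(awards) / statistics.fmean(awards)
    print(f"{case}: awards {awards}, coefficient of variation {spread:.2f}")

rankings = {judge: rank_cases(judge) for judge in anchors}
print("all judges give the same ranking:", len(set(map(tuple, rankings.values()))) == 1)
```

The dollar awards for each case vary enormously between judges, yet converting them to rankings removes that disagreement entirely in this idealised setup.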
If you’re designing a scale, try not to have more than 7 points on it. This is because 7 is the maximum number of categories we can generally distinguish on an intensity scale. Beyond this, people start to make errors, though it is possible to train people to make finer distinctions with clear hierarchical categories.
Using guidelines or rules
Guidelines reduce bias as well as noise. They do not eliminate the need for judgment.
Rules do eliminate the need for judgment (within the rule’s domain). The authors of Noise caution that, when using firm rules, you should be alert to the possibility that rules drive discretion underground if the people applying the rules don’t buy into them. For example, the three-strikes law is a rule that forces judges to impose the maximum sentence on someone convicted of a third offence, regardless of any mitigating circumstances. In practice, prosecutors and police often choose not to charge people they would otherwise have charged, so that judges won’t have to impose a disproportionate penalty.
Examples of guidelines include:
- Apgar scores in medicine to assess whether a newborn baby is in distress. The Apgar score decomposes the assessment into several different factors (e.g. is the pulse above or below 100bpm; is the baby breathing freely, with difficulty, or not at all). Each factor is straightforward to assess even for practitioners with modest training. The Apgar score also specifies how to weight the predictors to produce the overall judgment.
- Criminal sentencing. Higher courts will sometimes issue “guideline judgments” for certain offences (e.g. rape, cannabis cultivation). Such judgments provide guidance for what the sentence in a particular case should be, by reference to aggravating and mitigating factors that they expect to find in cases. For example, a Level 1 cannabis cultivation case may involve small quantities grown for personal use, no commerciality and an unsophisticated operation. A Level 2 case may involve larger quantities and some commerciality. A Level 3 case may be a highly sophisticated operation involving large quantities and a lot of money.
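The Apgar example shows what a structured guideline looks like: each factor is scored separately on a simple 0 to 2 scale, and the rule for combining them (an equal-weight sum, giving a total from 0 to 10) is fixed in advance. The sketch below follows the standard five Apgar criteria, but the data structure and string labels are a simplified encoding made up for illustration, not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class NewbornObservation:
    """Simplified observations for the five Apgar criteria (illustrative encoding)."""
    skin_colour: str      # "blue_or_pale", "blue_extremities", "pink"
    pulse_bpm: int        # 0 means no pulse
    reflex_response: str  # "none", "grimace", "cry"
    muscle_tone: str      # "limp", "some_flexion", "active"
    breathing: str        # "absent", "weak_or_irregular", "strong_cry"


def apgar_score(obs: NewbornObservation) -> int:
    """Sum five criteria, each scored 0, 1 or 2 (equal weights), giving 0 to 10."""
    colour = {"blue_or_pale": 0, "blue_extremities": 1, "pink": 2}[obs.skin_colour]
    if obs.pulse_bpm == 0:
        pulse = 0
    elif obs.pulse_bpm < 100:
        pulse = 1
    else:
        pulse = 2
    reflex = {"none": 0, "grimace": 1, "cry": 2}[obs.reflex_response]
    tone = {"limp": 0, "some_flexion": 1, "active": 2}[obs.muscle_tone]
    breathing = {"absent": 0, "weak_or_irregular": 1, "strong_cry": 2}[obs.breathing]
    return colour + pulse + reflex + tone + breathing


baby = NewbornObservation("blue_extremities", 130, "grimace", "active", "strong_cry")
print(apgar_score(baby))  # 1 + 2 + 1 + 2 + 2 = 8
```

Two practitioners who agree on each individual observation will always arrive at the same total, which is how decomposing the judgment reduces noise.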
Objections to Reducing Noise
Kahneman, Sibony and Sunstein set out seven major objections to efforts to reduce or eliminate noise. I suggest an eighth below.
In general, the authors accept that some of these objections may be valid against a particular noise reduction method in a particular case, but argue that the answer isn’t simply to give up trying to reduce noise. Instead, you can try a different method that is more appropriate in the circumstances.
1. Cost
Reducing noise can be expensive. In some cases it may not be worth the trouble, or even feasible. For example, when middle school teachers grade homework, the stakes are generally too low to bother reducing noise.
The authors accept this can be a valid objection but say that you should do a cost/benefit analysis first. Often people don’t even know what noise is costing them, so they should conduct a noise audit to work that out.
[I think some of the decision hygiene suggestions for meetings are costly in that they require senior people to do work and come to a view before the meeting. At least at my work, senior people like to learn about the situation and come to a view (preferably agreement) at the same meeting. So, at least for less important decisions, that is costly for senior people whereas the costs associated with lower-quality decisions tend to be externalised (i.e. not borne by the senior people making them).]
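A noise audit has several judges independently assess the same cases and then measures how much they disagree. The sketch below uses one simple measure, assumed here for illustration: the difference between two judges’ figures as a share of their average, averaged over every pair of judges for each case. The underwriting quotes are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def pairwise_relative_difference(a: float, b: float) -> float:
    """Difference between two judgments as a share of their average."""
    return abs(a - b) / ((a + b) / 2)


def noise_audit(judgments: dict[str, list[float]]) -> dict[str, float]:
    """Average pairwise relative difference between judges, case by case.

    `judgments` maps each case to the values that different judges gave it
    independently (a hypothetical data structure for illustration).
    """
    return {
        case: mean(pairwise_relative_difference(a, b) for a, b in combinations(values, 2))
        for case, values in judgments.items()
    }


# Hypothetical audit: three underwriters independently price the same three cases.
quotes = {
    "case 1": [9_500, 10_500, 10_000],
    "case 2": [8_000, 16_000, 12_000],
    "case 3": [20_000, 21_000, 19_000],
}
per_case = noise_audit(quotes)
print(per_case)
print("median noise across cases:", median(per_case.values()))
```

Once you have a figure like this, you can estimate what the disagreement costs (for example, mispriced policies) and weigh that against the cost of reducing it.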
2. Some methods may produce errors
A noise reduction method may produce unacceptably high error if it’s too blunt or fails to take into account a significant variable. For example, you could reduce noise if everyone always predicted the same thing, but everyone could be wrong.
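The book’s error equation, MSE = bias² + noise², makes this point precise: eliminating noise does nothing about bias. In this sketch the true value and the judgments are hypothetical, and noise is measured as the population standard deviation of the judgments.

```python
from statistics import fmean, pvariance


def error_breakdown(judgments: list[float], truth: float) -> tuple[float, float, float]:
    """Return (mean squared error, bias squared, noise squared) for a set of judgments.

    Uses the decomposition MSE = bias**2 + noise**2, where bias is the average
    error and noise squared is the population variance of the judgments.
    """
    mse = fmean((j - truth) ** 2 for j in judgments)
    bias = fmean(judgments) - truth
    noise_squared = pvariance(judgments)
    return mse, bias**2, noise_squared


truth = 50.0  # hypothetical true value
noisy_but_centred = [40.0, 45.0, 50.0, 55.0, 60.0]
noiseless_but_wrong = [70.0] * 5  # everyone predicts the same thing

print(error_breakdown(noisy_but_centred, truth))    # (50.0, 0.0, 50.0)
print(error_breakdown(noiseless_but_wrong, truth))  # (400.0, 400.0, 0.0)
```

The second group has no noise at all, yet its overall error is eight times larger because every judgment is wrong in the same direction.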
Another example is the three-strikes legislation. The authors say that the central point of the three-strikes legislation is to eliminate noise. [I disagree. I think imposing tougher sentences in general was the key point. The three-strikes law often results in longer sentences than even the harshest judge would give.]
The authors suggest that the best response to a crude noise reduction method is to come up with a less-crude method (or implementation of the method) that takes into account more variables.
3. People being judged may lose a feeling of “being heard” and treated as an individual
Noise can be a by-product of an imperfect process that people embrace because it gives them an individualised hearing and opportunity to influence the exercise of discretion.
[I do wonder just how valuable this feeling of “being heard” is. Would some people being judged prefer a less noisy system? For example, if we gave defendants a choice (ex-ante) of being judged by a human judge vs a model or algorithm, which would they pick?]
4. Some methods may freeze in place existing values
A method to reduce noise may reduce our ability to respond when circumstances change, and “freeze” existing values in place. Flexibility and discretion ensure that, as new beliefs and values emerge, they can change policies and decisions over time.
The authors point out that some noise reduction methods don’t have this problem at all. For example, getting people to use a shared scale or averaging independent judgments won’t freeze existing values. They also point out that rules or guidelines can include an annual or other regular review, so that they are revisited over time.
5. Some methods may allow people to game the system
If people know exactly how a decision will be made, they may be able to work around it in their favour. The authors point to tax law as an example. The tax system is meant to be clear and predictable, so it uses lots of rules. The problem is that clear bright lines (the book uses the term “edges”) let people plan their way around the rules.
The answer to this is to compare the amount of harm caused by people gaming the rules with the amount of harm caused by noise. If the harm caused by gaming is relatively low, it may still be better to use the noise-reduction method.
6. A noisy process might be a good deterrent
If people know they could be subject to a large penalty or a small one, they may steer clear of wrongdoing if they are risk averse. The authors find this unpersuasive. They suggest increasing the penalty and eliminating the noise if you want deterrence.
[I also find this unpersuasive. In a noisy system, a person could just as easily commit a crime in the hopes of getting the lighter penalty. This seems more likely than the reverse as unusually light penalties seem to get more media coverage than unusually heavy ones. Besides, criminal penalties are generally not very successful deterrents for anything other than “rational” crimes anyway.]
7. People doing the judging don’t want to feel like a cog in a machine
The authors point out that when people feel they have reached the right judgment, they get an internal signal of judgment completion, which feels rewarding. Some noise reduction methods (e.g. rules or formulas) may take away this rewarding signal, so people can be very resistant to them.
The authors accept that people may feel demoralised if their ability to make judgments is taken away or greatly circumscribed. It is easy to feel like an interchangeable cog if all you are doing is very simple assessments.
Their suggestions are:
- use the noise reduction methods that still allow for judgment – e.g. structuring complex judgments, aggregating judgments.
- if using rules or strict guidelines, maybe allow a process for people to challenge the rules or guidelines if they don’t think they are working well.
8. Poor execution of noise reduction methods
I have thought of another reason, not recognised by Kahneman, Sibony and Sunstein, why people have problems with things like guidelines or overly structured decision frameworks: in large organisations, guidelines and frameworks are often poorly designed. People may not object to the goal of reducing noise per se; they may just object to a particular method, or to how that method is executed in practice.
For example, my company may have a framework in place for conducting behavioural interviews with examples of how to score different answers. But that framework may have been designed by an HR person, who has little understanding of what the actual role involves and how to hire for it. This is particularly common in technical roles such as programming or law.
Buy Noise at: Amazon | Kobo <– These are affiliate links, which means I may earn a small commission if you buy through these links. I’d be grateful if you considered supporting the site in this way! 🙂