The Truth is Not Enough
So, there are a ton of ways that statistics can be misused, but despite all the newfound attention paid to “fake news”, I want to focus on a way I think that true statistics are misused today: when they’re claimed to be evidence but are, in fact, simply not evidence.
Recall that something is evidence for my theory if (and only if) it’s more likely to happen if my theory is true than if my theory is false. More specifically, the strength of a piece of evidence is directly proportional to how much more likely it is to occur if my theory is true than if my theory is false (the likelihood ratio).
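To make the definition concrete, here is a minimal sketch in Python. The function names and the probabilities in the usage lines are hypothetical, chosen only for illustration; the update rule is the standard odds form of Bayes' rule.

```python
def likelihood_ratio(p_if_true: float, p_if_false: float) -> float:
    """How many times more likely the observation is if the theory is true."""
    if p_if_false <= 0:
        raise ValueError("p_if_false must be positive")
    return p_if_true / p_if_false


def posterior(prior: float, p_if_true: float, p_if_false: float) -> float:
    """Update a prior belief in the theory using Bayes' rule in odds form."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio(p_if_true, p_if_false)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers: a statistic that is equally likely either way
# has a likelihood ratio of 1 and leaves a 50% belief unchanged.
print(posterior(0.5, 0.8, 0.8))

# A statistic four times likelier under the theory moves the belief up.
print(posterior(0.5, 0.8, 0.2))
```

A statistic with a ratio of 1 is the case this post is about: true, but not evidence.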
This applies just as well to statistics as anything else. A statistic is only evidence for a theory if it’s more likely when the theory is true. At first glance, it seems obvious that the mere existence of a statistic doesn’t make it evidence for your theory. However, what I’m arguing here is that authors routinely cite statistics as evidence for their theories even when they are, in fact, not evidence.
Let me give you some examples. I should note that while most of these examples come from the New York Times, they’re hardly a particularly bad abuser of statistics. I chose them because I feel this indicates that the problem I’m describing is far-reaching, affecting even high-quality news sources.
[The data I used in this section is out of date and makes the conclusions wrong. In particular, net-immigration is near 0 nowadays, so the wall's effect would be near 0 on net - though, we should probably look at actual numbers of crossings not just net crossings, as crossing in either direction is a benefit. Anyway, I think this section still provides a salient example of my larger point, just a 2010 example.]
The New York Times published an article titled "Trump to Order Mexican Border Wall and Curtail Immigration", which said:
The Government Accountability Office has estimated that it could cost $6.5 million per mile to build a single-layer fence, and an additional $4.2 million per mile for roads and more fencing, according to congressional officials. Those estimates do not include maintenance of the fence along the nearly 2,000-mile border with Mexico. Representative Nancy Pelosi of California, the Democratic leader, said she thought even Republicans might balk at spending what she said could be $14 billion on a wall.
While it’s likely that all these statistics are true, they aren’t evidence for anything. You might think that it’s evidence that the wall is too expensive to be practical, but it really isn’t evidence for anything of the sort. Imagine you had read that the cost per mile was $650,000 or $65 million. Would that change your view of the article? If you’re like me and a literal order of magnitude difference in cost doesn’t make you believe anything different, then it wasn’t evidence in the first place.
Fine, you say, but most articles give context to their numbers. You could, for instance, look at what else could be bought with $15 billion ("Instead of a wall, here are some things that could be purchased with $15 billion"). That is, however, not much better. Who cares if you could buy 17 million people iPhone 7s? The exact same argument could be made for defunding literally any department. And if the number were 1.7 million or 170 million, would that change anything?
What about comparing it to other government agencies’ budgets? Again, this is useless. Whether the wall would be a worthwhile way to keep undocumented immigrants out depends not on the cost alone, but on the cost-benefit ratio. The question isn’t whether the wall “costs too much”; the question is whether it costs too much per undocumented immigrant.
The answer to this question is much less clear. The wall would probably cost between $12 and $15 billion ("Here's how much Trump's border wall will cost"), with an additional $0.5 billion per year in upkeep ("Trump's immigration tab: $166 billion"). So, over a decade, the wall would cost us about $18.5 billion ($13.5 billion at the midpoint to build, plus $5 billion in upkeep). About 208,000 undocumented immigrants illegally cross over the land border each year [more arrive legally and overstay or arrive by other methods] ("Mexico–United States border"), or roughly 2.08 million over a decade. If the wall stopped all these immigrants, then the cost per immigrant would be about $8,900 over the next decade. Is this cost-effective?
Well, the CBP currently has a budget of $13.9 billion ("FY 2017 U.S. Customs and Border Protection Budget Request: Opening Statement As Prepared"), but only $3.8 billion of that is for catching undocumented immigrants ("The U.S. Already Spends Billions on Border Security"). They caught about 463,000 immigrants crossing the border illegally in 2010 ("THE COSTS AND BENEFITS OF BORDER SECURITY"), yielding a cost of about $8,200 per immigrant.
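The arithmetic in the two paragraphs above can be reproduced with a short Python sketch. The dollar figures and counts are the ones quoted in the text; the variable names are mine, and using the midpoint of the $12 to $15 billion construction range is an assumption that matches the $18.5 billion total.

```python
# Figures quoted in the text (not independently verified here).
WALL_BUILD_LOW = 12e9            # dollars
WALL_BUILD_HIGH = 15e9           # dollars
WALL_UPKEEP_PER_YEAR = 0.5e9     # dollars per year
YEARS = 10
LAND_CROSSINGS_PER_YEAR = 208_000

CBP_ENFORCEMENT_BUDGET = 3.8e9   # dollars per year
CBP_APPREHENSIONS = 463_000      # apprehensions in 2010

# Assumption: take the midpoint of the construction estimates.
build_cost = (WALL_BUILD_LOW + WALL_BUILD_HIGH) / 2
wall_total = build_cost + WALL_UPKEEP_PER_YEAR * YEARS

# Best case for the wall: it stops every land crossing for a decade.
wall_cost = wall_total / (LAND_CROSSINGS_PER_YEAR * YEARS)

# Average cost per apprehension under current enforcement.
cbp_cost = CBP_ENFORCEMENT_BUDGET / CBP_APPREHENSIONS

print(f"Wall, decade total:     ${wall_total / 1e9:.1f} billion")
print(f"Wall, per immigrant:    ${wall_cost:,.0f}")
print(f"CBP, per apprehension:  ${cbp_cost:,.0f}")
```

Both results round to the figures in the text: about $8,900 for the wall and about $8,200 for the CBP.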
You might say that the $8,900 estimate is higher, so the wall is less cost-effective, but that argument is quite weak, since the law of diminishing returns implies that the marginal cost of stopping an additional immigrant is higher than the average cost. You could, however, argue that $8,900 is an underestimate, and that, since the wall isn’t 100% effective, the true cost is higher. Given these opposing arguments and the uncertainty of many of these numbers, I don’t think it’s obvious that the wall is significantly less (or more) effective than conventional border control techniques.
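To see how much the "isn’t 100% effective" caveat matters, here is a rough sensitivity sketch in Python. The $18.5 billion decade cost and the 208,000 annual crossings come from the text; the effectiveness levels are hypothetical values picked for illustration, not estimates.

```python
WALL_DECADE_COST = 18.5e9          # dollars, from the text
CROSSINGS_PER_YEAR = 208_000       # from the text
YEARS = 10


def cost_per_immigrant_stopped(effectiveness: float) -> float:
    """Decade cost divided by the number of crossings actually stopped.

    `effectiveness` is the fraction of land crossings the wall prevents.
    """
    if not 0 < effectiveness <= 1:
        raise ValueError("effectiveness must be in (0, 1]")
    stopped = CROSSINGS_PER_YEAR * YEARS * effectiveness
    return WALL_DECADE_COST / stopped


# Hypothetical effectiveness levels, for illustration only.
for eff in (1.0, 0.75, 0.5, 0.25):
    print(f"{eff:>4.0%} effective: ${cost_per_immigrant_stopped(eff):,.0f} per immigrant")
```

The cost scales inversely with effectiveness, so halving the share of crossings stopped doubles the cost per immigrant.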
Just to be clear, I primarily oppose the wall because I expect it to be reasonably effective. In my opinion, the financial costs of the wall are dwarfed by the human suffering that it would create.
But let’s get back to the topic at hand. The most common numbers quoted about the wall are the costs and the large number of immigrants it wouldn’t stop – mainly those who overstay their visas. Neither of these answers the question of whether the wall is cost-effective. It turns out that merely using true statistics isn’t enough to make good arguments; they must be the right statistics.
Here are some other examples.
- This article says that a gun-control group would spend $25 million during the election ("Gun-Control Groups Push Growing Evidence That Laws Reduce Violence"). What they didn’t do is divide this by total political spending to indicate how influential the group is.
- A third article writes that 8,124 Americans were killed with guns in 2014 ("Compare These Gun Death Rates: The U.S. Is in a Different World"), which is largely irrelevant to whether we should have stricter gun control. To answer this question, what we actually need to know is how many deaths a particular piece of legislation is estimated to prevent compared to the number of people who could no longer own guns. This ratio is what we should be talking about. [This same article later makes a correlation-implies-causation argument, but I digress…] Alternatively, we can do what I’ve done and try to compute the externality per gun.
Why This Matters
A reasonable criticism of this post is that these statistics aren’t the whole article, and that the question isn’t whether the statistics were used ideally, but whether they improved the article. Isn’t it better to use statistics somewhat tangentially than to not use them at all? I disagree with that idea. I believe that these tangential uses of statistics are nearly as bad as using false statistics. In both cases, these numbers
- make people feel the article is more objective than it really is, and therefore make them more certain of its conclusions than they should be
- make people distrust statistics in general, because they start to realize that someone can find a statistic to support anything
The second point bears elaborating on. People are right to generally distrust statistics. Recall that something is evidence for my theory if (and only if) it’s more likely to happen if my theory is true than if my theory is false. Using this definition, statistics are generally not evidence, because I can find statistics consistent with my narrative almost regardless of what my narrative is.
However, this, in my opinion, causes people to throw the baby out with the bathwater. Since a reasonably quantitative person can discern relevant from irrelevant statistics, they should be able to use the former as extremely useful evidence, while discarding the latter as irrelevant. Readers should be more discerning in sorting useful statistics from useless ones, even while writers should be more discerning over what statistics they use in the first place. And this moral doesn’t just apply to statistics. It’s easy to give arguments that sound strong but aren’t actually strong.
Statistics can be extremely useful in shedding light on an issue, and can often cut to the heart of the matter in a falsifiable way – something that’s usually difficult using purely qualitative analysis. But they’re useful in the same way a knife is: invaluable when used correctly, but harmful when misused, whether intentionally or not.