 Compressing Reality

# Bayes’ Theorem: The Root of All Reasoning

Bayes’ theorem in particular (and probability theory in general) offers the optimal way to reason under uncertainty. It is the “root of all reasoning” in the sense that an ideal reasoner would always change their beliefs according to these principles.

There are already lots of tutorials on Bayes’ theorem on the internet. Some are formal encyclopedic articles (Wolfram MathWorld), while others are YouTube videos (“Bayes’ Theorem - Explained Like You’re Five”); some are brief introductions (“An Introduction to Bayes’ Theorem”), while others contain in-depth, real-world applications (“An Intuitive (and Short) Explanation of Bayes’ Theorem”).

So why am I writing this? Because, as far as I’ve found, all the other explanations rely on formulas and abstract examples. In this explanation, you’ll never even see Bayes’ theorem itself, and all our math will be basic arithmetic. Also, I’m going to show you how you can use these principles without numbers to improve how you think critically about how the world works.

## Example 1: Coins

To introduce the concept, let’s start with a simple example. Let’s say that I own two coins. One is a fair coin; the other is a trick coin that has heads on both sides. I randomly choose one coin, flip it, and tell you that it landed heads-up. Assuming (for the sake of the example) that everything I told you is completely true, how likely is it that I chose the trick coin?

There are four steps:

1. Determine the priors.
2. Condition on the theories.
3. Eliminate outcomes based on evidence.
4. Normalize the probabilities, so they add up to 100%.

### Step 1: Priors

Before we do any flipping, there is a 50% chance that I am flipping the normal coin and a 50% chance that I am flipping the trick coin. These probabilities are called the priors: the probability of each theory before looking at the evidence. So, if a rectangle represented all probability space, it’d be split evenly 50-50:

| Theory | Probability |
| --- | --- |
| Fair Coin | 50% |
| Trick Coin | 50% |

### Step 2: Condition

If we condition on (assume) it being the trick coin, then we know that the coin must land heads-up. On the other hand, if we condition on me having the fair coin, then the coin is just as likely to land tails-up as heads-up. So, we can further divide our probabilities:

| Outcome | Probability |
| --- | --- |
| Fair Coin, Heads | 25% |
| Trick Coin, Heads | 50% |
| Fair Coin, Tails | 25% |

### Step 3: Eliminate

Okay, finally, let’s use our new information: the coin landed heads-up. That means the “Fair Coin, Tails” outcome is ruled out, leaving:

| Outcome | Probability |
| --- | --- |
| Fair Coin, Heads | 25% |
| Trick Coin, Heads | 50% |

### Step 4: Normalize

However, we’d like all the probabilities to add up to 100%. To accomplish this we normalize our distribution, which is just a fancy way of saying “multiply it by the right number to make it all add up to 100%.” In our case, the remaining probabilities add up to 75%, so multiplying everything by 1.333 (that is, 100 divided by 75) does this:

| Outcome | Probability |
| --- | --- |
| Fair Coin, Heads | 33% |
| Trick Coin, Heads | 67% |

And now you can see that the probability I flipped the trick coin is 67%, and the probability I flipped the fair coin is 33%.

I won’t go through all the math again, but if you think about it, you should be able to see that if the coin had landed tails-up, the probability of it being the fair coin would be 100%, and the probability of it being the trick coin would be 0%, because it’s impossible to get tails-up with the trick coin.

And that’s pretty much all there is to Bayesian reasoning. Let’s review the steps:

1. List possible theories and their priors (how likely each theory is).
2. Condition on each theory and compute how likely each outcome is.
3. Eliminate the outcomes that didn’t happen.
4. Normalize to make the probabilities add up to 100%.
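
If you’d rather see the four steps as code, here is a minimal Python sketch of the coin example. The function name `bayes_update` and the dictionary layout are my own choices for illustration (not any library’s API), and everything inside is just multiplication and division.

```python
def bayes_update(priors, likelihoods, observed):
    """Run the four steps and return the updated probability of each theory."""
    # Steps 1 and 2: split each theory's prior across its possible outcomes.
    joint = {
        (theory, outcome): priors[theory] * chance
        for theory, outcomes in likelihoods.items()
        for outcome, chance in outcomes.items()
    }
    # Step 3: eliminate every outcome that did not happen.
    remaining = {
        theory: p for (theory, outcome), p in joint.items() if outcome == observed
    }
    # Step 4: normalize so the surviving probabilities add up to 1 (100%).
    total = sum(remaining.values())
    return {theory: p / total for theory, p in remaining.items()}


# Step 1: the priors from the coin example.
priors = {"fair": 0.5, "trick": 0.5}
# Step 2: how likely each outcome is if we assume each theory.
likelihoods = {
    "fair": {"heads": 0.5, "tails": 0.5},
    "trick": {"heads": 1.0, "tails": 0.0},
}

after_heads = bayes_update(priors, likelihoods, "heads")
after_tails = bayes_update(priors, likelihoods, "tails")
print(after_heads)
print(after_tails)
```

Swapping in different priors (say, if you suspected I owned more fair coins than trick coins) only changes the `priors` dictionary; the steps stay the same.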

## Example 2: God

Now that we’ve seen Bayes’ theorem in theory, let’s apply it in practice to a similar problem. Imagine we’re trying to determine whether God exists by finding out whether sick people who are prayed for get better faster.

### Step 1: Priors

How likely is it that God exists? This is (obviously) a subjective question, and illustrates that probability theory doesn’t do everything for you. It allows you to take your beliefs and update them in light of new evidence - it does not tell you what to believe to start with. To make the math easier, I’m going to say there’s a 50-50 chance of God existing, but if you want to start with other probabilities, you should be able to follow along with similar reasoning.

| Theory | Probability |
| --- | --- |
| God Does Not Exist | 50% |
| God Exists | 50% |

### Step 2: Condition

Now, if you’re an atheist, you’d say the probability of a study finding this effect is about 5% (see “p-value”), because it’s possible that, just by random chance, the people who are prayed for recover faster than the people who aren’t prayed for. You’d then say that there’s a 95% chance that they don’t get better faster.

| Outcome | Probability |
| --- | --- |
| God Does Not Exist; Health Improves | 2.5% |
| God Exists | 50% |
| God Does Not Exist; Health Unchanged | 47.5% |

If you’re a theist, you have some wiggle room, as it depends on what exactly you believe. Again, to make the math easier, I’m going to assume that you think it’s a 50-50 chance of a study finding evidence of God answering prayers. If you want to try this with different probabilities, go for it!

| Outcome | Probability |
| --- | --- |
| God Does Not Exist; Health Improves | 2.5% |
| God Exists; Health Improves | 25% |
| God Does Not Exist; Health Unchanged | 47.5% |
| God Exists; Health Unchanged | 25% |

### Step 3: Eliminate

Now, if we do this study and we find that prayer does seem to improve people’s recovery rates, both “Health Unchanged” outcomes are ruled out, and the probabilities become

| Outcome | Probability |
| --- | --- |
| God Does Not Exist; Health Improves | 2.5% |
| God Exists; Health Improves | 25% |

### Step 4: Normalize

Then, we normalize (the remaining probabilities add up to 27.5%, so we divide everything by that) to get

| Outcome | Probability |
| --- | --- |
| God Does Not Exist; Health Improves | 9.1% |
| God Exists; Health Improves | 90.9% |

So, we conclude there is a 9% chance of God not existing and a 91% chance that He does exist. While I won’t go through the math again, if the study had found no effect, the probabilities would be a 66% chance of God not existing and a 34% chance of God existing.

Something to note is that these results don’t seem “fair”. If the study finds prayer is effective, God’s odds of existing jump from 50% all the way up to 91%. If no effect is found, then God’s odds only drop a bit: from 50% to 34%.
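
For anyone who wants to try other starting probabilities, here is a rough Python sketch of this example. It assumes the numbers used above (a 50-50 prior, a 50% chance of the study finding an effect if God exists, and a 5% chance if He doesn’t); the function name `posterior_god_exists` is just something I made up for illustration. It keeps only the outcomes that match what the study found and then normalizes.

```python
def posterior_god_exists(prior, p_effect_if_god, p_effect_if_no_god, effect_found):
    """Updated probability that God exists after seeing the study result."""
    # Step 2: how likely the observed result is under each theory.
    if effect_found:
        chance_if_god = p_effect_if_god
        chance_if_no_god = p_effect_if_no_god
    else:
        chance_if_god = 1 - p_effect_if_god
        chance_if_no_god = 1 - p_effect_if_no_god
    # Step 3: keep only the outcomes consistent with the study result.
    god_and_result = prior * chance_if_god
    no_god_and_result = (1 - prior) * chance_if_no_god
    # Step 4: normalize so the two surviving outcomes add up to 1 (100%).
    return god_and_result / (god_and_result + no_god_and_result)


if_effect = posterior_god_exists(0.5, 0.5, 0.05, effect_found=True)
if_no_effect = posterior_god_exists(0.5, 0.5, 0.05, effect_found=False)
print(f"Effect found:    {if_effect:.1%}")
print(f"No effect found: {if_no_effect:.1%}")
```

Changing the first argument lets you start from a different prior, and changing the second lets a theist plug in their own view of how often God answers prayers.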

There’s a moral to this story: human intuitions about “fair” critical thinking aren’t always right. I’m an atheist, so I don’t think God exists. However, His not answering prayers is not particularly strong evidence supporting this conclusion.

## Generalizing Probabilistic Reasoning

Okay, this is all nifty for math nerds, but why does this matter in real life?

Well, if you deal only with the probabilities 0 and 1, probabilistic reasoning simplifies into first-order logic (see “Logic and Probability”), which is famous for the whole “Socrates is a man; all men are mortal; therefore, Socrates is mortal.”

So, we know that probabilistic reasoning can solve literally every problem traditional logic can solve, and many more. So, to the extent that you think logic is useful, probabilistic reasoning is at least as useful.

Okay, fine. But why bother with probabilistic reasoning if traditional logic is so much simpler?

The difference is rather straightforward: probabilistic reasoning can deal with uncertainty. Indeed, probabilistic reasoning forms the foundation of statistics, which has more-or-less taken over the hard sciences and social sciences alike. So, I’d say its practicality in understanding the world is well verified.

Indeed, I think some basic statistical concepts can also help improve your critical thinking above and beyond the field of statistics itself, but let’s just focus on vanilla probability for now. I think the best way I can help you see how probabilistic reasoning can improve your reasoning is to show you how you can use it to think critically without using explicit numbers.

## Conditioning: Reasoning Without Numbers

Remember, we looked at 4 steps:

1. List possible theories and their priors (how likely each theory is).
2. Condition on each theory and compute how likely each outcome is.
3. Eliminate the outcomes that didn’t happen.
4. Normalize to make the probabilities add up to 100%.

I don’t really have much to say about steps (3) and (4). There is some nerdy interestingness regarding (1) (see “Solomonoff’s theory of inductive inference”), but the main thing to know about choosing your priors is simply Occam’s razor: “Among competing hypotheses, the one with the fewest assumptions should be selected” (see “Occam’s razor”).

Because of this, I want to focus on (2). I think the idea of conditioning is extremely powerful, because it

1. encourages you to distinguish between objective causal relationships and your own values
2. encourages explicitly dealing with your uncertainty by gradually shifting your beliefs in response to evidence rather than choosing a side that’s right
3. allows you to determine not only whether something is a valid argument, but how strong that argument is

### A Realistic Example

Let me give you an example. I think any reasonable person would agree that welfare reduces income inequality if you count welfare as income. However, I think some liberals believe that welfare also provides poor households with improved opportunities to increase their economic standing by (e.g.) going back to college, starting a business, or finding a better job.

The first thing we should note is that the question is not whether liberals are right, but to what extent welfare improves the opportunities of poor households. However, we’re going to ignore this detail for now, because once you go down that rabbit hole, you pretty much have to use statistics.

So, let’s try and figure out whether that’s true without explicit probabilities: by conditioning.

Imagine, first, that the liberals are completely correct. What would we expect then? Well, for instance, we’d expect countries similar to the US but with greater welfare programs to have reduced pre-welfare income inequality, because the welfare gives the poor improved opportunities.

Imagine, now, that the liberals are completely wrong. Then, we’d expect no such difference in pre-welfare income inequality.

Now, unlike in our previous examples, we can’t give specific probabilities. It’s possible that the liberals are correct but that cultural factors eliminate the benefits. It’s equally possible that the liberals are wrong but that we see differences anyway due to cultural factors.

The next step is to check whether European countries actually do have reduced pre-welfare income inequality.

If there turns out to be no difference, this is evidence for the conservative hypothesis; if there is a significant reduction, this is evidence for the liberal hypothesis.

Of course, the evidence isn’t proof; it could be that other social differences between the US and Europe mess up the numbers. However, your degree of belief should change after the answer is revealed.

We’ll look more into this issue in another post, but for now, I’ll just tell you that pre-welfare income inequality is no lower in European countries than in the US (see “Gini in the bottle”). Again, this isn’t proof, but it is evidence. So if you still believe welfare reduces pre-welfare income inequality, you should have a better reason.

On the other hand, I think you could easily imagine either yourself or someone else simply rejoining that “correlation does not imply causation”. The thing is, it’s true: correlation doesn’t imply causation; it does, however, make it more likely; it is evidence of causation.

This brings us to another important point: the definition of evidence. If I have two theories, then something is evidence for my theory if (and only if) it’s more likely to happen if my theory is true than if my theory is false.
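
To make that definition a bit more concrete, here is a small Python sketch. The helper names `is_evidence_for` and `evidence_strength` are made up for illustration, and the inputs are the probabilities from the two examples earlier in this post. The ratio is one simple way to express how strong a piece of evidence is, which is the “weighing” part of the definition.

```python
def is_evidence_for(p_if_true, p_if_false):
    """An observation supports a theory if it is more likely when the theory is true."""
    return p_if_true > p_if_false


def evidence_strength(p_if_true, p_if_false):
    """How many times more likely the observation is if the theory is true."""
    if p_if_false == 0:
        # The observation is impossible if the theory is false.
        return float("inf")
    return p_if_true / p_if_false


# Heads as evidence for the trick coin: certain with it, 50% with the fair coin.
heads_for_trick = evidence_strength(1.0, 0.5)
# A study finding an effect, as evidence that God exists (50% vs 5%).
effect_for_god = evidence_strength(0.5, 0.05)
# A study finding no effect, as evidence that God does not exist (95% vs 50%).
no_effect_for_atheism = evidence_strength(0.95, 0.5)

print(heads_for_trick, effect_for_god, no_effect_for_atheism)
```

A ratio above 1 means the observation counts as evidence for the theory, and the further above 1 it is, the more your beliefs should shift.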

## Fallacies

Finally, probabilistic reasoning provides the main justification for a huge variety of fallacies. To be more precise, most fallacies are just special cases of probabilistic reasoning. Here are some examples:

1. First of all, every logical fallacy follows directly from logic, which is just a special case of probabilistic reasoning.
2. The anecdotal fallacy is using a personal experience instead of compelling evidence. This is a fallacy because you can find anecdotal evidence for almost any theory, whether the theory is true or false. This means that anecdotal evidence doesn’t really “cancel out” any probability space, making it not useful for having correct beliefs.
3. The argument from fallacy is incorrectly reasoning that because an argument for X is false, X must be false. However, you can come up with a bad argument for anything, so (again) this doesn’t eliminate any probability space, meaning it’s not useful for having correct beliefs.
4. The ad hominem fallacy is when you attack your opponent instead of their arguments. Because you can always attack your opponent, regardless of whether their theory is true or not, this also doesn’t eliminate any probability space.

I could go on. The point is, fallacies are just rules-of-thumb; if you learn the underlying probabilistic reasoning, you don’t need to memorize dozens of fallacies. Instead of identifying specific problems in arguments, you’ll be able to use a solid foundation to begin with.
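
Here is a tiny Python sketch of what “doesn’t eliminate any probability space” means in numbers. The function name `update` and the specific probabilities (a 30% prior, a 90% chance of finding a supporting anecdote either way) are hypothetical values I picked purely for illustration.

```python
def update(prior, p_if_true, p_if_false):
    """Eliminate what didn't happen, then normalize, for a single theory."""
    true_part = prior * p_if_true
    false_part = (1 - prior) * p_if_false
    return true_part / (true_part + false_part)


prior = 0.3

# An anecdote you could dig up whether or not the theory is true:
# equally likely either way, so it removes the same share from both sides.
after_anecdote = update(prior, 0.9, 0.9)

# An observation that is much more likely if the theory is true.
after_real_test = update(prior, 0.9, 0.1)

print(after_anecdote, after_real_test)
```

The anecdote leaves the probability where it started, while the second observation moves it a lot. That is the whole difference between weak and strong evidence.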

## Limitations

Probabilistic reasoning isn’t magical, and it has its limitations:

1. Probabilistic reasoning can’t invent your theories for you - that still requires creativity and an in-depth understanding of an issue.
2. Probabilistic reasoning doesn’t tell you what your priors are. How likely you think a theory is before you look at the evidence is purely subjective.
3. Probabilistic reasoning is no substitute for scholarship. It doesn’t give you evidence; it just lets you weigh it. You still have to take the time and effort to become well-informed. That’s what most of this blog tries to accomplish.

All that being said, I hope I’ve shown you the power of probabilistic reasoning. Although most college students know how to tell if something is evidence or not, I am skeptical that we’re ever really taught how to weigh evidence. This, ultimately, is what I hope you can now do.