# Is Bayes’ Theorem really that interesting?

I have trouble understanding the massive importance that is afforded to Bayes’ theorem in undergraduate courses in probability and popular science.

From the purely mathematical point of view, I think it would be uncontroversial to say that Bayes’ theorem does not amount to a particularly sophisticated result. Indeed, the relation
$$P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{P(B\cap A)P(A)}{P(B)P(A)}=\frac{P(B|A)P(A)}{P(B)}$$
is a one-line proof that follows directly from the definition of conditional probability. Thus, I expect that what people find interesting about Bayes’ theorem has to do with its practical applications or implications. However, even there I find the typical examples used to justify its importance a bit artificial.

To illustrate this, the classical application of Bayes’ theorem usually goes something like this: Suppose that

1. 1% of women have breast cancer;
2. 80% of mammograms are positive when breast cancer is present; and
3. 10% of mammograms are positive when breast cancer is not present.

If a woman has a positive mammogram, then what is the probability that she has breast cancer?
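For concreteness, the standard computation can be sketched in a few lines of Python. This is only an illustration of the textbook calculation using the three numbers above; the function and parameter names are my own and are not taken from any library.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(cancer | positive test) via Bayes' theorem.

    All names here are illustrative assumptions:
      prior               = P(cancer)
      sensitivity         = P(positive | cancer)
      false_positive_rate = P(positive | no cancer)
    """
    # Law of total probability: P(positive) over both groups of women.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(cancer | positive) = P(positive | cancer) P(cancer) / P(positive).
    return sensitivity * prior / p_positive


p = posterior(prior=0.01, sensitivity=0.80, false_positive_rate=0.10)
print(f"P(cancer | positive) = {p:.4f}")  # prints 0.0748
```

So under these assumptions the answer is roughly 7.5%.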

I understand that Bayes’ theorem allows us to compute the desired probability from the given information, and that this probability is counterintuitively low. However, I can’t help but feel that the premise of this question is wholly artificial. The only reason we need Bayes’ theorem here is that the full data from which the other probabilities (i.e., 1% have cancer, 80% true positive, etc.) were computed is not provided to us. If we had access to the sample data from which these probabilities were computed, then we could directly find
$$P(\text{cancer}|\text{positive test})=\frac{\text{number of women with cancer and positive test}}{\text{number of women with positive test}}.$$
In mathematical terms, if you know how to compute $$P(B|A)$$, $$P(A)$$, and $$P(B)$$, then this means that you know how to compute $$P(A\cap B)$$ and $$P(B)$$, in which case you already have your answer.
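To make this point concrete, here is a small sketch with a hypothetical cohort of 10,000 women. The counts are invented for illustration so that they match the three stated rates exactly; they are not real data. It checks that counting directly gives the same number as the Bayes formula.

```python
import math

# Hypothetical cohort (assumed counts, chosen to match the stated rates).
n_women = 10_000
n_cancer = 100                    # 1% of 10,000
n_no_cancer = n_women - n_cancer  # 9,900
n_cancer_and_positive = 80        # 80% of the 100 women with cancer
n_no_cancer_and_positive = 990    # 10% of the 9,900 women without cancer

# Direct counting: restrict attention to women with a positive test.
n_positive = n_cancer_and_positive + n_no_cancer_and_positive
direct = n_cancer_and_positive / n_positive

# Bayes' theorem applied to the three summary probabilities.
bayes = (0.80 * 0.01) / (0.80 * 0.01 + 0.10 * 0.99)

print(f"direct count: {direct:.4f}")
print(f"Bayes:        {bayes:.4f}")
print("agree:", math.isclose(direct, bayes))
```

Both routes give 80/1070, about 0.0748, which is the sense in which the theorem adds nothing once the underlying counts are available.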

From the above arguments, it seems to me that Bayes’ theorem is essentially only useful for the following reasons:

1. In an adversarial context, i.e., someone who has access to the data only tells you about $$P(B|A)$$ when $$P(A|B)$$ is actually the quantity that is relevant to your interests, hoping that you will get confused and will not notice.
2. An opportunity to dispel the confusion between $$P(A|B)$$ and $$P(B|A)$$ with concrete examples, and to explain that these are very different when the ratio between $$P(A)$$ and $$P(B)$$ deviates significantly from one.

Am I missing something big about the usefulness of Bayes’ theorem? In light of point 2., especially, I don’t understand why Bayes’ theorem stands out so much compared to, say, the Borel-Kolmogorov paradox, or the “paradox” that $$P[X=x]=0$$ when $$X$$ is a continuous random variable, etc.

What is the probability that there was life on Mars two billion years ago? What does that question mean? It has no answer according to the frequentist interpretation. “The probability of life on Mars two billion years ago is $$0.54$$” is taken to be meaningless because one cannot say it happened in $$54\%$$ of all instances. But the Bayesian, as opposed to frequentist, interpretation of probability works with this sort of thing.