# Getting better at proofs

So, I don’t like proofs.

To me building a proof feels like constructing a steel trap out of arguments to make true what you’re trying to assert.

Oftentimes the proof in the book is something that I get if I study, but hard to come up with on my own. In other words I can’t make steel traps, but I feel fine buying them from others.

How does one acquire the ability to create steel traps with fluency and ease? Are there any particular reference books that you found helped you really get how to construct a proof fluently? Or is it just practice?

I’d like to second one part of Qiaochu Yuan’s answer: the recommendation to read Polya’s book. Unlike many other books I’ve seen (albeit none of the others recommended above), it actually does contain guidance on how to construct a proof “out of nothing”.

And that’s one problem with the “practise, practise, practise” mantra. Practise what? Where are the lists of similar-but-not-quite-identical things to prove to practise on? I can find lists of integrals to compute and lists of linear systems to solve, but it’s hard coming up with lists of things to prove.

Of course, the advice to practise is correct. But just as with anything else in mathematics, there are guidelines to help get you started.

The first thing to realise is that reading others’ proofs is not guaranteed to give you any insight into how the proof was developed. A proof is meant to convince someone of a result, so a proof points to the theorem (or whatever), and knowing how the proof was constructed does not (or at least, should not) lend any extra weight to our confidence in the theorem. Proofs can be written so as to exhibit how they were found, and when teaching we should make sure to present some proofs in this way, but to do it every time would be tedious.

So, what are the guidelines for constructing a proof? You’ll probably get different answers from different mathematicians so these should be construed as being my opinion and not a(n attempt at a) definitive answer.

My recommendation is that you take the statement that you want to prove and apply the following steps to it as often as you can:

1. Expand out unfamiliar terms.
2. Replace generic statements with statements about generic objects.
3. Include implicit information.

Once you’ve done all that, the hope is that the proof will be much clearer.

Here’s an example.

1. Original statement:

The composition of linear transformations is again linear.

2. Replace generic statements:

If $S$ and $T$ are two composable linear transformations then their composition, $S T$, is again linear.

It is important to be precise here. The word “composable” could have been left out, as the statement only makes sense if $S$ and $T$ are composable, but until you are completely familiar with this kind of process, it is better to be overly precise than otherwise. In this case, leaving in the word “composable” reminds us that there is a restriction on the domains and codomains which will be useful later. (However, one has to draw the line somewhere: even the word “composable” is not quite enough since it leaves open the question as to whether it is $S T$ or $T S$!)
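The composability restriction is exactly what a type checker enforces: once domains and codomains are written out, only one order of composition makes sense. A minimal illustrative sketch in Lean 4 (no libraries; the name `comp` is hypothetical):

```lean
-- Composability as type-checking (illustrative sketch).
-- The codomain of `T` must match the domain of `S`, so the
-- composite is "S after T"; writing `T (S u)` would be rejected
-- by the type checker unless W and U happened to coincide.
def comp {U V W : Type} (S : V → W) (T : U → V) : U → W :=
  fun u => S (T u)
```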

3. Include implicit information:

If $S \colon V \to W$ and $T \colon U \to V$ are linear transformations then $S T \colon U \to W$ is again linear.

Here’s where remembering that $S$ and $T$ are composable in the previous step helps keep things clear. As $S$ and $T$ are composable, we only need three vector spaces. Then, since we explicitly have the vector spaces, the fact that $S$ and $T$ are composable is plain, though some may prefer to keep that fact in the statement. Also, some may like to have the fact that $U$, $V$, and $W$ are vector spaces explicitly stated.

4. Expand out definitions:

If $S \colon V \to W$ and $T \colon U \to V$ are such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1 + \mu u_2) = T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$, then $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$ for all $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$.

Note that I have been careful not to repeat myself with the newly introduced symbols. It would be technically alright to reuse $u_1$ and $u_2$ in place of $x_1$ and $x_2$ since these are local declarations (restricted by the phrases “for all …”). However, humans are not good at differentiating between local and global declarations so it is best not to reuse symbols unless the scope is very clear.
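The same scoping point arises in programming: reusing a name both for a function’s parameters (local placeholders) and for call-site values (actual objects) is perfectly legal, but hard on the reader. A small illustrative Python sketch, with all names hypothetical:

```python
def scaled_sum(u1, u2, lam):
    """Linearity-style combination: u1 + lam * u2.

    Here u1 and u2 are local placeholders ("virtual" values),
    scoped to this function only.
    """
    return u1 + lam * u2

# Legal but confusing: the globals u1, u2 are distinct objects
# from the parameters u1, u2, despite sharing their names.
u1, u2 = 3.0, 4.0
# Clearer: fresh names for the actual values at the call site.
x1, x2 = 3.0, 4.0

# Both calls compute the same thing; the second avoids the
# collision between local and global declarations.
assert scaled_sum(u1, u2, 2.0) == scaled_sum(x1, x2, 2.0) == 11.0
```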

5. Replace generic statements:

If $S \colon V \to W$ and $T \colon U \to V$ are such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1 + \mu u_2) = T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$, then whenever $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$, $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$.

Up to now, the rephrasing has not taken into account the fact that there is a conclusion and a hypothesis. This rephrasing modifies a part of the conclusion to turn it from a generic statement “$P(p)$ is true for all $p \in Q$” to a statement about a generic object “whenever $p \in Q$ then $P(p)$ is true”. We do not do this for the similar statements in the hypothesis. This is because these two pieces are treated differently in the proof.

6. Replace generic statements, and reorganise to bring choices to the fore:

Let $S \colon V \to W$ and $T \colon U \to V$ be such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1 + \mu u_2) = T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$. Let $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$. Then $S T(x_1 + \eta x_2) = S T(x_1) + \eta S T(x_2)$.

In this form, the distinction between hypothesis and conclusion is all the clearer. Parts of the hypothesis use the word “Let”, parts of the conclusion use the word “Then”.

With this formulation, the proof essentially writes itself. In all its gory detail:

Proof

Let $S \colon V \to W$ and $T \colon U \to V$ be such that $S(v_1 + \lambda v_2) = S(v_1) + \lambda S(v_2)$ and $T(u_1 + \mu u_2) = T(u_1) + \mu T(u_2)$ for all $v_1, v_2 \in V$, $u_1, u_2 \in U$, and $\lambda, \mu \in \mathbb{R}$. Let $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$. Then:

$$S T(x_1 + \eta x_2) = S\big(T(x_1 + \eta x_2)\big) = S\big(T(x_1) + \eta T(x_2)\big),$$

using the hypothesis on $T$ as $x_1, x_2 \in U$ and $\eta \in \mathbb{R}$. So:

$$S\big(T(x_1) + \eta T(x_2)\big) = S\big(T(x_1)\big) + \eta S\big(T(x_2)\big) = S T(x_1) + \eta S T(x_2),$$

using the hypothesis on $S$ as $T(x_1), T(x_2) \in V$ and $\eta \in \mathbb{R}$. Hence the conclusion is true.

Notes:
1. This could be condensed, but the important thing here is how to find it, not what the final form should be.
2. Notice that I wrote “as $x_1, x_2 \in U$” rather than “with $u_1 = x_1$ and $u_2 = x_2$”. This is partly style, and partly because in the statement of linearity, $u_1$ and $u_2$ are placeholders into which we put $x_1$ and $x_2$. So saying $u_1 = x_1$ is semantically incorrect as it equates a virtual vector with an actual vector. This is a very minor point, though.
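The step-6 formulation translates almost verbatim into a proof assistant. Here is a minimal sketch in Lean 4 with no libraries; since the vector-space operations are passed in as bare functions, only the algebraic shape of the argument is captured, and all names are hypothetical. The hypothesis/conclusion asymmetry from step 5 is visible: the “Let” parts become parameters and the hypotheses `hS`, `hT` are things we use, by rewriting with them in the same two steps as the prose proof.

```lean
-- Sketch: composition of "linear" maps is linear, with addition
-- and scalar multiplication supplied as bare operations.
theorem comp_linear
    {K U V W : Type}
    (addU : U → U → U) (smulU : K → U → U)
    (addV : V → V → V) (smulV : K → V → V)
    (addW : W → W → W) (smulW : K → W → W)
    (S : V → W) (T : U → V)
    -- hypothesis on S: S (v₁ + l • v₂) = S v₁ + l • S v₂
    (hS : ∀ v₁ v₂ l, S (addV v₁ (smulV l v₂)) = addW (S v₁) (smulW l (S v₂)))
    -- hypothesis on T: T (u₁ + m • u₂) = T u₁ + m • T u₂
    (hT : ∀ u₁ u₂ m, T (addU u₁ (smulU m u₂)) = addV (T u₁) (smulV m (T u₂)))
    (x₁ x₂ : U) (h : K) :
    S (T (addU x₁ (smulU h x₂))) = addW (S (T x₁)) (smulW h (S (T x₂))) := by
  -- first use the hypothesis on T, then the hypothesis on S,
  -- exactly mirroring the two displayed equations in the proof above
  rw [hT, hS]
```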

Finally, I would like to disagree with one part of Qiaochu’s answer. I actually like the imagery of a steel trap. A proof is a bit like a trap: we want to capture the theorem in a trap so that it can’t wriggle out. We construct the proof so that there is no possibility of escape. Eventually, yes, we want the proof to be beautiful but when it’s first constructed we just want it to do the job. Only once the theorem is caught can we spend a little time decorating the cage to make it look pretty and set it off to its best advantage. So build the trap because theorems can be dangerous! An escaped theorem can do untold damage, rampaging across the countryside, laying waste like an unchecked viking.

(Okay, not quite finally. The step-by-step proof above was taken from a page I wrote for my students on the nature of proof. The original can be found here.)