# How was the normal distribution derived?

Abraham de Moivre, when he came up with this formula, had to ensure that the points of inflection were exactly one standard deviation away from the center, that the curve was bell-shaped, and that the area under the curve was exactly equal to one.

And somehow he came up with the standard normal distribution, which is as follows:

$$\displaystyle\phi(x) = \frac{1}{\sqrt{2\pi}}e^{-\dfrac{1}{2}x^2}$$

And even cooler, he found the distribution for when the mean was not $0$ and the standard deviation was not $1$, and came up with:

$$\displaystyle f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\dfrac{(x - \mu)^2}{2\sigma^2}}$$

And so what I ask is, how? How did anyone come up with an equation that fits all the aforementioned criteria? Moreover, how do the numbers $\pi$ and $e$ come into this?

Suppose I throw a dart at a dartboard. I aim at the centre of the board $(0,0)$, but I’m not all that good with darts, so the dart lands in a random position $(X,Y)$ which has a joint density function $f:\mathbb R^2\to\mathbb R^+$.

Let’s make two assumptions about the way I play darts.

1. The density is rotationally invariant, so the distribution of where my dart lands only depends on the distance of the dart to the centre.

2. The random variables $X$ and $Y$ are independent: how much I miss left and right makes no difference to the distribution of how much I miss up and down.

So by assumption one and Pythagoras I must be able to express the density as
$$f(x,y) = g(x^2 + y^2).$$

Now as the random variables $X$ and $Y$ are independent and identically distributed I must be able to express
$$f(x,y) \propto f(x,0) f(0,y)$$
Combining these assumptions we get that for every pair $(x,y)$ we have
$$g(x^2 + y^2) \propto g(x^2)g(y^2).$$

This means that $g$ must be an exponential function
$$g(t) = A e^{-Bt}$$
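
One way to fill in that step, as a sketch under two extra assumptions that are not stated above: $g$ is continuous and $g(0) > 0$. The names $f_X$, $f_Y$ (the marginal densities) and $h$ are introduced only for this argument. Independence means $f(x,y) = f_X(x) f_Y(y)$, so
$$f(x,0)\,f(0,y) = f_X(x) f_Y(0)\, f_X(0) f_Y(y) = f(x,y)\, f(0,0),$$
which pins down the constant of proportionality: for all $s, t \ge 0$,
$$g(s+t)\,g(0) = g(s)\,g(t).$$
Writing $h(t) = g(t)/g(0)$ turns this into
$$h(s+t) = h(s)\,h(t), \qquad h(0) = 1.$$
Taking logarithms, $\log h$ is additive and continuous, and the only such functions are linear, so $\log h(t) = -Bt$ for some constant $B$. That gives $g(t) = g(0)\,e^{-Bt}$, with $A = g(0)$.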

So $A$ will be some normalising constant. $B$ reflects the units I’m measuring in: $t$ is a squared distance, so if I measure the distance in cm, $B$ will be 100 times as big as if I measured in mm. $B$ must be positive, so that the exponent $-Bt$ is negative and the density is a decreasing function of distance (I’m not that bad at darts).

To work out $A$ I need to integrate $f(\cdot,\cdot)$ over $\mathbb R^2$. A quick change to polar coordinates gives
$$\iint_{\mathbb R^2} f(x,y)\, dx\,dy = 2\pi\int_0^\infty r\, g(r^2)\, dr = 2\pi A\int_0^\infty r e^{-Br^2}\, dr = \frac{\pi A}{B}.$$

So we should set $A = \frac{B}{\pi}$. It’s convenient to choose $B$ in terms of the standard deviation $\sigma$ of each coordinate, so we set $B = \frac 1{2\sigma^2}$ and $A = \frac{1}{2\pi\sigma^2}$.

So if I set $\tilde f(x) = \frac 1{\sqrt{2\pi}\sigma} e^{-\frac{x^2}{2\sigma^2}}$ then $f(x,y) = \tilde f(x) \tilde f(y)$.
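
The constants can be sanity-checked numerically. The sketch below is only an illustration: it assumes $A = B/\pi$ and $B = 1/(2\sigma^2)$, picks an arbitrary $\sigma = 1.5$, and uses a plain midpoint rule. The helper names (`radial_mass`, `f_tilde`) are made up for this example.

```python
import math

sigma = 1.5                      # arbitrary illustrative value
B = 1.0 / (2.0 * sigma ** 2)     # assumed: B = 1 / (2 sigma^2)
A = B / math.pi                  # assumed: A = B / pi


def f(x, y):
    """Joint density f(x, y) = g(x^2 + y^2) with g(t) = A * exp(-B * t)."""
    return A * math.exp(-B * (x * x + y * y))


def f_tilde(x):
    """One-dimensional density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def radial_mass(r_max=50.0, steps=200_000):
    """Midpoint-rule estimate of 2*pi * integral_0^r_max r * g(r^2) dr."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += r * A * math.exp(-B * r * r)
    return 2.0 * math.pi * total * dr


total_mass = radial_mass()
print("total mass:", total_mass)                          # should be very close to 1
print("f(0.3, -1.2):", f(0.3, -1.2))
print("f_tilde product:", f_tilde(0.3) * f_tilde(-1.2))   # should match the line above
```

If the total mass comes out near 1 and the last two printed values agree, the choice of $A$ and $B$ is consistent with both the normalisation and the factorisation $f(x,y) = \tilde f(x)\tilde f(y)$.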

So the $e$ comes from the fact that I wanted my $X$ and $Y$ coordinates to be independent, and the $\pi$ comes from the fact that I wanted rotational invariance, so I’m integrating over a circle.

The interesting thing happens if I throw two darts.
Suppose I throw my first dart aiming at $(0,0)$, and it lands at $(X_1,Y_1)$. I aim my next dart at the first dart, so this one lands at $(X_2,Y_2)$ with $X_2 = X_1 + X$ and $Y_2 = Y_1 + Y$, where $(X,Y)$ is a fresh error, independent of the first throw.

So the position of the second dart is the sum of the two errors. But my sum is still rotationally invariant and the variables $X_2$ and $Y_2$ are still independent, so $(X_2,Y_2)$ satisfies my two assumptions.

That means that when I add independent normal random variables together I get another normal random variable.
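
A small simulation can illustrate this for one coordinate of the second dart. This is a rough sketch rather than a proof: it assumes both errors are normal with the same $\sigma = 1$, uses a fixed seed and an arbitrary sample size, and only checks two summary numbers.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
sigma = 1.0              # assumed common standard deviation of each throw
n = 200_000              # arbitrary sample size

# x-coordinate of the second dart: first error plus an independent second error
x2 = [rng.gauss(0.0, sigma) + rng.gauss(0.0, sigma) for _ in range(n)]

mean_x2 = sum(x2) / n
var_x2 = sum((v - mean_x2) ** 2 for v in x2) / n

# For a normal variable, about 68.27% of samples fall within one standard
# deviation of the mean; here that standard deviation would be sqrt(2) * sigma.
sd_sum = math.sqrt(2.0) * sigma
within = sum(abs(v) <= sd_sum for v in x2) / n

print("variance of X2:", var_x2)       # expected to be close to 2 * sigma**2
print("fraction within 1 sd:", within)  # expected to be close to 0.6827
```

The variance adding up to $2\sigma^2$ holds for any independent errors; the fraction within one standard deviation staying near 0.68 is what is consistent with the sum being normal again.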

It’s this property that makes the normal distribution so useful: if I take the average of a very long sequence of random variables, I should get something that’s the same shape no matter how long my sequence is, and taking a sequence twice as long is like adding two sequences together.

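PS: the factor of two is easy to lose in this derivation. It comes from $\int_0^\infty r e^{-Br^2}\,dr = \frac{1}{2B}$, which gives total mass $\frac{\pi A}{B}$ and hence $A = \frac{B}{\pi}$ with $B = \frac{1}{2\sigma^2}$.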