I’ve read the proof for why $\int_0^\infty P(X > x)\,dx = E[X]$ for nonnegative random variables (located here) and understand its mechanics, but I’m having trouble understanding the intuition behind this formula or why it should be the case at all. Does anyone have any insight on this? I bet I’m missing something obvious.
For the discrete case, and if $X$ is nonnegative, $E[X] = \sum_{x=0}^\infty x\,P(X = x)$. That means we’re adding up $P(X = 0)$ zero times, $P(X = 1)$ once, $P(X = 2)$ twice, etc. This can be represented in array form, where we’re adding column-by-column:
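Reconstructed from the description above (each $P(X = k)$ appearing $k$ times, stacked in column $k$), the array presumably looks like this:

$$
\begin{array}{ccccc}
P(X=1) & P(X=2) & P(X=3) & P(X=4) & \cdots \\
       & P(X=2) & P(X=3) & P(X=4) & \cdots \\
       &        & P(X=3) & P(X=4) & \cdots \\
       &        &        & P(X=4) & \cdots \\
       &        &        &        & \ddots
\end{array}
$$

Column $k$ holds $k$ copies of $P(X = k)$, so the column sums are exactly the terms $k\,P(X = k)$ of $E[X]$.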
We could also add up these numbers row-by-row, though, and get the same result. The first row has everything but $P(X = 0)$ and so sums to $P(X > 0)$. The second row has everything but $P(X = 0)$ and $P(X = 1)$ and so sums to $P(X > 1)$. In general, the sum of row $x + 1$ is $P(X > x)$, and so adding the numbers row-by-row gives us $\sum_{x=0}^\infty P(X > x)$, which thus must also be equal to $\sum_{x=0}^\infty x\,P(X = x) = E[X]$.
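As a quick sanity check of the row-by-row versus column-by-column argument, here is a minimal Python sketch. The distribution `pmf` and the function names `expectation_by_columns` and `expectation_by_rows` are made up for illustration; exact fractions are used so the two sums can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical pmf on {0, 1, 2, 3}; any nonnegative-integer pmf would do.
pmf = {
    0: Fraction(1, 10),
    1: Fraction(2, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
}


def expectation_by_columns(pmf):
    """Column sums: P(X = x) is counted x times, i.e. sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def expectation_by_rows(pmf):
    """Row sums: row x + 1 adds up to P(X > x); rows past max(pmf) are zero."""
    top = max(pmf)
    return sum(
        sum(p for k, p in pmf.items() if k > x)  # this inner sum is P(X > x)
        for x in range(top)
    )


if __name__ == "__main__":
    print(expectation_by_columns(pmf), expectation_by_rows(pmf))
```

For this particular `pmf`, the rows sum to $9/10$, $7/10$ and $4/10$, and both functions return 2.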
The continuous case is analogous.
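One way to spell out the analogy, assuming for simplicity that $X$ has a density $f$ (an assumption added here, not part of the argument above): write $x = \int_0^x dt$, so that each value $x$ is "counted" once for every $t$ below it, and then swap the order of integration. The swap is allowed because the integrand is nonnegative (Tonelli's theorem).

$$
\begin{aligned}
E[X] &= \int_0^\infty x\,f(x)\,dx
      = \int_0^\infty \left( \int_0^x dt \right) f(x)\,dx \\
     &= \int_0^\infty \left( \int_t^\infty f(x)\,dx \right) dt
      = \int_0^\infty P(X > t)\,dt .
\end{aligned}
$$

Integrating over $x$ first plays the role of summing a row; integrating over $t$ first plays the role of summing a column.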
In general, switching the order of summation (as in the proof the OP links to) can always be interpreted as adding row-by-row vs. column-by-column.