I’m reading about Hoeffding’s covariance identity, whose proof is neatly covered here and, in a similar manner, in this MSE post, but I can’t fully understand the trick/property used there.

Specifically, assume (X_1,Y_1) and (X_2,Y_2) are two independent random vectors with the same distribution. The key point in the proof is to note that we can write

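\mathbb E[(X_1-X_2)(Y_1-Y_2)] as

\mathbb E\left(\iint_{\mathbb R^2} [\mathbb 1_{u\le X_1}-\mathbb 1_{u\le X_2}]\,[\mathbb 1_{v\le Y_1}-\mathbb 1_{v\le Y_2}]\,du\,dv\right).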

Why does this hold?

**Answer**

What underlies the equality \mathbb E(X) = \mathbb E(\int_0^\infty \mathbb 1_{u\le X}\,du) for X\ge 0 is, intuitively, the way one thinks of the Lebesgue integral as coming from partitioning the y-axis, whereas the Riemann integral comes from partitioning the x-axis.
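To spell out the pointwise fact behind that equality, here is a short sketch. The lowercase x, x_1, x_2 are fixed real numbers introduced only for illustration, and the last line assumes the expectations are finite so that one may take \mathbb E of both sides.

```latex
% A nonnegative number is the length of the interval [0, x]:
x = \int_0^\infty \mathbb 1_{u\le x}\,du \qquad (x \ge 0).

% For arbitrary real x_1, x_2 the two indicators cancel except between x_2 and x_1,
% where their difference is +1 (if x_2 < x_1) or -1 (if x_1 < x_2):
x_1 - x_2 = \int_{-\infty}^\infty \bigl[\mathbb 1_{u\le x_1} - \mathbb 1_{u\le x_2}\bigr]\,du .

% Apply this to (X_1, X_2) with variable u and to (Y_1, Y_2) with variable v, then multiply:
(X_1-X_2)(Y_1-Y_2)
  = \iint_{\mathbb R^2} \bigl[\mathbb 1_{u\le X_1}-\mathbb 1_{u\le X_2}\bigr]
                        \bigl[\mathbb 1_{v\le Y_1}-\mathbb 1_{v\le Y_2}\bigr]\,du\,dv .
```

Taking expectations of the last line gives the expression asked about in the question.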

Think of a reasonable function f(x) (say continuous, but that’s not necessary, and nonnegative to be concrete). We think of \int_{-\infty}^\infty f(x)\,dx as the area under the curve y=f(x).

Now write this as an iterated integral and then change the order of integration:

\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^\infty\int_0^{f(x)} 1\,dy\,dx = \int_0^\infty \mu(\{x: f(x)\ge y\})\,dy.

The x cross-section at height y is precisely the set of points x where f(x)\ge y. Here \mu(E) is the (Lebesgue) measure of E\subset\Bbb R.
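As a rough numerical sanity check of this change of order, one can compute the same area both ways on a grid. The choice f(x) = e^{-x^2}, the truncation to [-6, 6], and the grid sizes are assumptions made purely for illustration. The measure of each cross-section is approximated by counting grid cells, so the two numbers agree only up to discretization error.

```python
import math

def f(x):
    # Illustrative nonnegative function; its exact area is sqrt(pi).
    return math.exp(-x * x)

# Midpoint grid on the x-axis; f is negligible outside [-6, 6].
nx = 2400
dx = 12.0 / nx
xs = [-6.0 + (i + 0.5) * dx for i in range(nx)]
fx = [f(x) for x in xs]

# Partition the x-axis: add up vertical strips of height f(x).
area_vertical = sum(fx) * dx

# Partition the y-axis: for each height y, approximate the measure of
# {x : f(x) >= y} by counting grid cells, then add up horizontal slabs.
ny = 500
dy = 1.0 / ny  # f takes values in (0, 1]
area_horizontal = 0.0
for j in range(ny):
    y = (j + 0.5) * dy
    measure = sum(1 for v in fx if v >= y) * dx
    area_horizontal += measure * dy

print(area_vertical, area_horizontal, math.sqrt(math.pi))
```

Both sums count the same small rectangles under the curve, once column by column and once row by row, which is the discrete version of swapping the order of integration.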

**Attribution**

*Source: Link, Question Author: runr, Answer Author: Ted Shifrin*