Intuition behind Conditional Expectation

I’m struggling with the concept of conditional expectation. First of all, if you have a link to any explanation that goes beyond showing that it is a generalization of elementary intuitive concepts, please let me know.

Let me get more specific. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X$ an integrable real random variable defined on $(\Omega, \mathcal{A}, P)$. Let $\mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Then $E[X \mid \mathcal{F}]$ is the a.s. unique random variable $Y$ such that $Y$ is $\mathcal{F}$-measurable and, for any $A \in \mathcal{F}$, $E[X \mathbf{1}_A] = E[Y \mathbf{1}_A]$.

The common interpretation seems to be: “E[X|F] is the expectation of X given the information of F.” I’m finding it hard to get any meaning from this sentence.

  1. In elementary probability theory, expectation is a real number. So the sentence above makes me think of a real number instead of a random variable. This is reinforced by E[X|F] sometimes being called “conditional expected value”. Is there some canonical way of getting real numbers out of E[X|F] that can be interpreted as elementary expected values of something?

  2. In what way does F provide information? Knowing that some event occurred is something I would call information, and in that case I have a clear picture of conditional expectation. To me, F is not a single piece of information, but rather a “complete” set of pieces of information one could possibly acquire in some way.

Maybe you will say there is no real intuition behind this and that E[X|F] is just what the definition says it is. But then, how does one see that a martingale is a model of a fair game? Surely there must be some intuition behind that!

I hope you have got some impression of my misconceptions and can rectify them.

Answer

Maybe this simple example will help. I use it when I teach
conditional expectation.

(1) The first step is to think of E(X) in a new way:
as the best estimate for the value of a random variable X in the absence of any information.
To minimize the squared error
$$E[(X - e)^2] = E[X^2 - 2eX + e^2] = E(X^2) - 2eE(X) + e^2,$$
we differentiate with respect to $e$ to obtain $2e - 2E(X)$, which is zero at $e = E(X)$.

For example, if I throw a fair die and you have to
estimate its value X, according to the analysis above, your best bet is to guess E(X)=3.5.
On specific rolls of the die, this will be an over-estimate or an under-estimate, but in the long run it minimizes the mean square error.
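
If it helps to see this numerically, here is a small Python sketch (just an illustration, assuming a fair six-sided die) that evaluates the mean squared error E[(X − e)²] on a grid of guesses e and confirms that it is smallest at e = E(X) = 3.5:

```python
# Numerical check (fair six-sided die): the mean squared error E[(X - e)^2]
# over a grid of guesses e is smallest at e = E(X) = 3.5.
faces = [1, 2, 3, 4, 5, 6]

def mse(e):
    # Each face has probability 1/6, so the expectation is a plain average.
    return sum((x - e) ** 2 for x in faces) / len(faces)

guesses = [k / 10 for k in range(10, 61)]  # e = 1.0, 1.1, ..., 6.0
best = min(guesses, key=mse)
print(best, mse(best))  # -> 3.5 and the minimal mean squared error
```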

(2) What happens if you do have additional information?
Suppose that I tell you that X is an even number.
How should you modify your estimate to take this new information into account?

The mental process may go something like this: “Hmmm, the possible values were {1,2,3,4,5,6},
but we have eliminated 1, 3, and 5, so the remaining possibilities are {2,4,6}.
Since I have no other information, they should be considered equally likely, and hence the revised expectation is (2+4+6)/3 = 4.”

Similarly, if I were to tell you that X is odd, your revised (conditional) expectation is 3.
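
In code, the two revisions amount to averaging over the faces that remain possible once the information arrives (again just a sketch for the fair-die example):

```python
# Sketch for the fair-die example: the revised (conditional) expectation is
# the average of the faces that remain possible given the information.
faces = [1, 2, 3, 4, 5, 6]

def conditional_mean(keep):
    # keep: predicate selecting the faces consistent with the information.
    remaining = [x for x in faces if keep(x)]
    return sum(remaining) / len(remaining)

print(conditional_mean(lambda x: x % 2 == 0))  # even -> (2+4+6)/3 = 4.0
print(conditional_mean(lambda x: x % 2 == 1))  # odd  -> (1+3+5)/3 = 3.0
```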

(3) Now imagine that I will roll the die and I will tell you the parity of X; that is, I will
tell you whether the die comes up odd or even. You should now see that a single numerical response
cannot cover both cases. You would respond “3” if I tell you “X is odd”, while you would respond “4” if I tell you “X is even”.
A single numerical response is not enough because the particular piece of information that I will give you is itself random.
In fact, your response is necessarily a function of this particular piece of information.
Mathematically, this is reflected in the requirement that E(X | F) must be F-measurable.
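
To make this concrete, here is a small Python sketch (an illustration only, with the parity σ-algebra written out by hand) that builds E(X | F) as a random variable Y on Ω = {1, …, 6}, equal to 4 on the even faces and 3 on the odd ones, and checks the defining property E[X 1_A] = E[Y 1_A] for every A in F:

```python
# Sketch: E(X | F) for a fair die and the parity sigma-algebra F is itself a
# random variable Y, and it satisfies E[X 1_A] = E[Y 1_A] for every A in F.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: 1 / 6 for w in omega}                  # fair die

X = {w: w for w in omega}                         # the face value itself
Y = {w: 4 if w % 2 == 0 else 3 for w in omega}    # 4 on even faces, 3 on odd

# The parity sigma-algebra: empty set, odds, evens, all of Omega.
F = [set(), {1, 3, 5}, {2, 4, 6}, set(omega)]

for A in F:
    lhs = sum(X[w] * prob[w] for w in A)          # E[X 1_A]
    rhs = sum(Y[w] * prob[w] for w in A)          # E[Y 1_A]
    assert abs(lhs - rhs) < 1e-12, (A, lhs, rhs)
print("E[X 1_A] = E[Y 1_A] holds for every A in the parity sigma-algebra.")
```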

I think this covers point 1 in your question, and tells you why a single real number is not sufficient.
Also, concerning point 2, you are correct: the F in E(X | F) is not itself a single piece of information,
but rather describes which specific pieces of (random) information may occur.

Attribution
Source: Link, Question Author: Stefan, Answer Author: Community
