Intuition behind Conditional Expectation

I’m struggling with the concept of conditional expectation. First of all, if you have a link to any explanation that goes beyond showing that it is a generalization of elementary intuitive concepts, please let me know.

Let me get more specific. Let $(\Omega,\mathcal{A},P)$ be a probability space and $X$ an integrable real random variable defined on $(\Omega,\mathcal{A},P)$. Let $\mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Then $E[X|\mathcal{F}]$ is the a.s. unique random variable $Y$ such that $Y$ is $\mathcal{F}$-measurable and for any $A\in\mathcal{F}$, $E\left[X1_A\right]=E\left[Y1_A\right]$.

The common interpretation seems to be: “$E[X|\mathcal{F}]$ is the expectation of $X$ given the information of $\mathcal{F}$.” I’m finding it hard to get any meaning from this sentence.

1. In elementary probability theory, expectation is a real number. So the sentence above makes me think of a real number instead of a random variable. This is reinforced by $E[X|\mathcal{F}]$ sometimes being called “conditional expected value”. Is there some canonical way of getting real numbers out of $E[X|\mathcal{F}]$ that can be interpreted as elementary expected values of something?

2. In what way does $\mathcal{F}$ provide information? Knowing that some event occurred is something I would call information, and I have a clear picture of conditional expectation in that case. To me, $\mathcal{F}$ is not a single piece of information, but rather a “complete” set of pieces of information one could possibly acquire in some way.

Maybe you say there is no real intuition behind this, $E[X|\mathcal{F}]$ is just what the definition says it is. But then, how does one see that a martingale is a model of a fair game? Surely, there must be some intuition behind that!

I hope you have got some impression of my misconceptions and can rectify them.

Maybe this simple example will help. I use it when I teach
conditional expectation.

(1) The first step is to think of ${\mathbb E}(X)$ in a new way:
as the best estimate for the value of a random variable $X$ in the absence of any information.
To minimize the squared error
$${\mathbb E}\left[(X-e)^2\right]={\mathbb E}(X^2)-2e\,{\mathbb E}(X)+e^2,$$
we differentiate with respect to $e$ to obtain $2e-2{\mathbb E}(X)$, which is zero at $e={\mathbb E}(X)$.

For example, if I throw a fair die and you have to
estimate its value $X$, according to the analysis above, your best bet is to guess ${\mathbb E}(X)=3.5$.
On specific rolls of the die, this will be an over-estimate or an under-estimate, but in the long run it minimizes the mean square error.
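This can be checked concretely. The sketch below (the helper name `mse` is mine) computes the exact mean squared error of a guess $e$ over the six equally likely faces; the guess $3.5={\mathbb E}(X)$ attains the minimum, namely the variance $35/12$:

```python
# Exact mean squared error of guessing e for a fair six-sided die.
def mse(e):
    faces = [1, 2, 3, 4, 5, 6]
    return sum((x - e) ** 2 for x in faces) / len(faces)

# e = E(X) = 3.5 attains the minimum, 35/12; any other guess,
# e.g. 3 or 4, does strictly worse.
print(mse(3.5))
print(mse(3.0), mse(4.0))
```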

(2) What happens if you do have additional information?
Suppose that I tell you that $X$ is an even number.
How should you modify your estimate to take this new information into account?

The mental process may go something like this: “Hmmm, the possible values were $\lbrace 1,2,3,4,5,6\rbrace$
but we have eliminated $1,3$ and $5$, so the remaining possibilities are $\lbrace 2,4,6\rbrace$.
Since I have no other information, they should be considered equally likely and hence the revised expectation is $(2+4+6)/3=4$.”

Similarly, if I were to tell you that $X$ is odd, your revised (conditional) expectation is 3.
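Both revised values can be computed mechanically: restrict attention to the faces consistent with the information and average. A small sketch (the function name is mine):

```python
faces = [1, 2, 3, 4, 5, 6]

def cond_exp(event):
    """Elementary conditional expectation for a fair die:
    average X over the faces satisfying the given event."""
    hits = [x for x in faces if event(x)]
    return sum(hits) / len(hits)

print(cond_exp(lambda x: x % 2 == 0))  # E(X | X even) = 4.0
print(cond_exp(lambda x: x % 2 == 1))  # E(X | X odd)  = 3.0
```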

(3) Now imagine that I will roll the die and I will tell you the parity of $X$; that is, I will
tell you whether the die comes up odd or even. You should now see that a single numerical response
cannot cover both cases. You would respond “3” if I tell you “$X$ is odd”, while you would respond “4” if I tell you “$X$ is even”.
A single numerical response is not enough because the particular piece of information that I will give you is itself random.
In fact, your response is necessarily a function of this particular piece of information.
Mathematically, this is reflected in the requirement that ${\mathbb E}(X\ |\ {\cal F})$ must be $\cal F$-measurable.
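On this finite example one can verify the defining property from your question directly: take $\Omega=\{1,\dots,6\}$ with the uniform measure and $\cal F$ generated by the parity of the outcome; the random variable $Y$ equal to $3$ on the odds and $4$ on the evens is $\cal F$-measurable and satisfies $E[X1_A]=E[Y1_A]$ for every $A\in\cal F$. A sketch (all names mine):

```python
# Ω = {1,...,6} with the uniform measure; F generated by parity:
# F = {∅, {1,3,5}, {2,4,6}, Ω}.
omega = [1, 2, 3, 4, 5, 6]
p = {w: 1 / 6 for w in omega}

odd, even = {1, 3, 5}, {2, 4, 6}
F = [set(), odd, even, set(omega)]

def Y(w):
    # Candidate for E(X | F): constant on each atom of F
    # (3 on the odds, 4 on the evens), hence F-measurable.
    return 3.0 if w in odd else 4.0

def integral(f, A):
    """E[f * 1_A] over the finite space."""
    return sum(f(w) * p[w] for w in omega if w in A)

# Defining property: E[X 1_A] = E[Y 1_A] for every A in F.
for A in F:
    assert abs(integral(lambda w: w, A) - integral(Y, A)) < 1e-12
```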

I think this covers point 1 in your question, and tells you why a single real number is not sufficient.
Also concerning point 2, you are correct in saying that the role of $\cal F$ in ${\mathbb E}(X\ |\ {\cal F})$
is not a single piece of information, but rather tells what possible specific pieces of (random) information may occur.