The dual of a norm \|\cdot\| is defined as:

\|z\|_* = \sup \{ z^Tx \text{ } | \text{ } \|x\| \le 1\}

Could anybody give me some intuition for this concept? I know the definition and I use it to solve problems, but I still lack an intuitive understanding of it.

**Answer**

Here’s the way I like to think about it. I’ll start with the finite dimensional space \Bbb{R}^n because it looks like that’s where you are, but I’ll give an analogy for infinite dimensional spaces as well.

The quantity z^Tx represents a *linear functional* on \Bbb{R}^n, that is, a linear function which eats a vector and spits out a real number:

f_z:\Bbb{R}^n\rightarrow\Bbb{R},\quad f_z(x)=z^Tx,\quad \text{such that }\quad f_z(\alpha x+\beta y)=\alpha f_z(x)+\beta f_z(y)\quad \forall \alpha,\beta\in\Bbb{R},x,y\in\Bbb{R}^n

Because of the Riesz Representation Theorem, we know that *any* linear function f:\Bbb{R}^n\rightarrow\Bbb{R} will take the form f=f_z for some z\in\Bbb{R}^n, i.e. f(x) = z^Tx.
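To make the representation concrete in \Bbb{R}^n, here is a minimal sketch: if f is linear, then evaluating it on the standard basis vectors e_i gives the entries z_i = f(e_i). The functional `f` and the helper names `representer` and `dot` are made up for illustration; only the standard library is used.

```python
def dot(z, x):
    """Plain inner product z^T x."""
    return sum(zi * xi for zi, xi in zip(z, x))

def representer(f, n):
    """Recover z with f(x) = z^T x by evaluating f on the standard basis of R^n."""
    z = []
    for i in range(n):
        e_i = [1.0 if j == i else 0.0 for j in range(n)]
        z.append(f(e_i))
    return z

# A hypothetical linear functional on R^3, written without mentioning any z.
def f(x):
    return 2.0 * x[0] - x[1] + 0.5 * x[2]

z = representer(f, 3)
x = [3.0, -4.0, 2.0]

print("z      =", z)
print("f(x)   =", f(x))
print("z^T x  =", dot(z, x))
```

The two printed values agree for every x, which is all the statement f = f_z says in finite dimensions.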

The question is now this: given a linear function(al) f_z(\cdot), how “big” is it? Well, to measure the size of vectors, we look at norms, so the idea is simple: **how big is the number f_z(x)=z^Tx relative to the size (norm) of x?** This is exactly the number

\frac{z^Tx}{\|x\|}

We then say that the dual norm of z is the largest this quantity can possibly be:

\|z\|_* = \sup_{x\neq 0} \frac{z^Tx}{\|x\|}

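In a way, this is a kind of “stretch factor”, but the stretching is measured with respect to \|x\|, which is the way we’re measuring the size of x. With a simple one-line proof, you can show that my way of defining \|z\|_* is the same as yours: the ratio does not change when x is rescaled by a positive constant, so we may replace x by x/\|x\| and take the supremum of z^Tx over \|x\|=1, and allowing \|x\|\le 1 cannot make z^Tx any larger because shrinking x only shrinks z^Tx toward zero.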
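As a rough numerical illustration, the sketch below estimates the supremum of z^Tx/\|x\| by random sampling for three familiar norms and compares it with the standard dual pairs (\ell_1 with \ell_\infty, \ell_2 with itself, \ell_\infty with \ell_1). The vector `z`, the helper names, and the sampling scheme are my own choices for the example; sampling only gives a lower bound, so the hand-picked maximizers are there to show that the bound is actually attained.

```python
import math
import random

def dot(z, x):
    return sum(zi * xi for zi, xi in zip(z, x))

def norm1(x):
    return sum(abs(xi) for xi in x)

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    return max(abs(xi) for xi in x)

def sampled_sup(z, norm, trials=20000, seed=0):
    """Lower bound on sup_{x != 0} z^T x / ||x|| from random Gaussian directions."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in z]
        best = max(best, dot(z, x) / norm(x))
    return best

z = [3.0, -4.0, 1.0]

# (name, primal norm, known dual norm)
pairs = [("l1", norm1, norm_inf), ("l2", norm2, norm2), ("l_inf", norm_inf, norm1)]

# Vectors x at which the ratio z^T x / ||x|| reaches the dual norm of this z.
maximizers = {
    "l1": [0.0, -1.0, 0.0],     # signed basis vector at the largest |z_i|
    "l2": z,                    # x parallel to z
    "l_inf": [1.0, -1.0, 1.0],  # x = sign(z)
}

for name, primal, dual in pairs:
    x_star = maximizers[name]
    attained = dot(z, x_star) / primal(x_star)
    print(name, "sampled:", sampled_sup(z, primal),
          "attained:", attained, "dual norm:", dual(z))
```

The sampled values never exceed the dual norm, and the ratio at each maximizer matches it, which is the “largest stretch factor” picture in numbers.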

This idea extends to infinite dimensional normed spaces such as L^p as well – every normed space has a “dual” space of (continuous/bounded) linear functionals, i.e. mappings which eat vectors (which might actually be functions) and spit out numbers. Each of these functionals has an associated “size”, and that size is given by the dual norm:

\|f\|_* = \sup_{x\neq 0}\frac{|f(x)|}{\|x\|}

To really complete the picture – and to expand on a couple of comments – it helps to also think about the dual norm as a special case of an operator norm. The idea behind a general operator norm is pretty much the same as what I described above, but for a more general linear operator A:X\rightarrow Y where X and Y are *any* normed linear spaces. In the case of linear functionals, X is a vector space like \Bbb{R}^n or L^p etc, and Y is simply the ‘base field’, \Bbb{R} (or more generally \Bbb{C}). The idea is that A eats vectors and spits out other vectors, and to measure the “size” of A we might look again at the ratio of the size of Ax (measured with the Y norm) to the size of x (measured with the X norm):

\frac{\|Ax\|_Y}{\|x\|_X}

The largest of these values over nonzero x\in X is a good value for the size of A, because it tells us a sort of worst-case stretch factor:

\|A\|=\sup_{x\neq 0}\frac{\|Ax\|_Y}{\|x\|_X}

This is very similar to the idea of a singular value – in fact, if we use the Euclidean norm \|\cdot\|_2, the operator norm of a matrix *is* its largest singular value!
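Here is a small sketch of that last claim for a 2×2 matrix, using only the standard library. The matrix `A` is an arbitrary example; the largest singular value is computed as the square root of the largest eigenvalue of A^TA (via the closed form for a symmetric 2×2 matrix), and the operator norm is estimated by scanning unit vectors on a grid of angles. Both helper functions are illustrative and only handle the 2×2 case.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm2(x):
    return math.sqrt(sum(xi * xi for xi in x))

def largest_singular_value_2x2(A):
    """sqrt of the largest eigenvalue of A^T A, via the symmetric 2x2 closed form."""
    (p, q), (r, s) = A
    a = p * p + r * r   # (A^T A)[0][0]
    b = p * q + r * s   # (A^T A)[0][1] == (A^T A)[1][0]
    c = q * q + s * s   # (A^T A)[1][1]
    lam_max = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    return math.sqrt(lam_max)

def sampled_operator_norm(A, steps=10000):
    """Max of ||Ax||_2 over unit vectors x = (cos t, sin t) on a grid of angles."""
    best = 0.0
    for k in range(steps):
        t = math.pi * k / steps  # half a circle suffices: ||A(-x)|| = ||Ax||
        x = [math.cos(t), math.sin(t)]  # ||x||_2 = 1, so the ratio is just ||Ax||_2
        best = max(best, norm2(matvec(A, x)))
    return best

A = [[3.0, 0.0], [4.0, 5.0]]
sigma_max = largest_singular_value_2x2(A)
estimate = sampled_operator_norm(A)

print("largest singular value:", sigma_max)
print("sampled operator norm: ", estimate)
```

For this A, A^TA has eigenvalues 45 and 5, so the worst-case stretch is \sqrt{45}, reached in the direction (1,1)/\sqrt{2}.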

**Attribution** *Source: Link, Question Author: trembik, Answer Author: icurays1*