Why is the gradient the direction of steepest ascent?

Let
$$f(x_1,x_2,\dots, x_n):\mathbb{R}^n \to \mathbb{R}.$$
The gradient is defined as
$$\text{grad}(f) = \frac{\partial f}{\partial x_1}\hat{e}_1 + \cdots + \frac{\partial f}{\partial x_n}\hat{e}_n,$$

which is a vector.
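As a quick illustration, take the hypothetical function $f(x_1,x_2)=x_1^2+3x_1x_2$ (chosen only as an example; it is not from the original post). Applying the definition one component at a time gives

$$\frac{\partial f}{\partial x_1}\hat{e}_1 + \frac{\partial f}{\partial x_2}\hat{e}_2 = (2x_1+3x_2)\,\hat{e}_1 + 3x_1\,\hat{e}_2, \qquad \text{so at } a=(1,2):\quad 8\,\hat{e}_1 + 3\,\hat{e}_2.$$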

Reading this definition suggests that each component of the gradient is the rate of change of my objective function when I move in the direction $\hat{e}_i$.
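That reading can be checked numerically. The sketch below assumes the hypothetical example $f(x_1,x_2)=x_1^2+3x_1x_2$ and estimates each component $\partial f/\partial x_i$ with a central finite difference, i.e. by stepping a small distance `h` along $\pm\hat{e}_i$ only. The helper name `numerical_gradient` and the step size are illustrative choices, not anything from the original post.

```python
# Hypothetical example function f(x1, x2) = x1**2 + 3*x1*x2 (not from the original post).
def f(x):
    x1, x2 = x
    return x1**2 + 3 * x1 * x2

def numerical_gradient(f, a, h=1e-6):
    """Approximate each component df/dx_i at point a with a central difference
    taken along the standard basis vector e_i."""
    grad = []
    for i in range(len(a)):
        forward = list(a)
        backward = list(a)
        forward[i] += h   # step along +e_i
        backward[i] -= h  # step along -e_i
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

a = [1.0, 2.0]
print(numerical_gradient(f, a))  # approximately [8.0, 3.0]
```

Each entry is just "how fast f changes if only $x_i$ moves", which matches the hand-computed partials $2x_1+3x_2$ and $3x_1$ at $(1,2)$.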

But I can’t see why this vector (given by the definition of the gradient) has anything to do with the direction of steepest ascent.

Why does the function increase fastest if I move in the direction of the gradient?

Answer

Each component of the gradient tells you how fast your function is changing along one of the standard basis directions. It’s natural, then, to ask how fast the function is changing in some arbitrary direction. Letting $\vec v$ denote a unit vector, we can project the gradient onto this direction in the natural way, namely via the dot product $\text{grad}( f(a))\cdot \vec v$. This is a fairly common definition of the directional derivative; for differentiable $f$ it agrees with the limit $\lim_{h\to 0}\frac{f(a+h\vec v)-f(a)}{h}$.
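Here is a small numerical check that the dot product really measures the rate of change along $\vec v$. It is a sketch assuming the same hypothetical $f(x_1,x_2)=x_1^2+3x_1x_2$, the point $a=(1,2)$, and an arbitrarily chosen unit vector at 45 degrees. It compares a finite-difference estimate of the rate of change along $\vec v$ against $\text{grad}(f(a))\cdot\vec v$. The names `grad_f`, `directional_derivative_fd`, and `dot` are illustrative helpers.

```python
import math

# Same hypothetical f as an example; exact gradient worked out by hand: (2*x1 + 3*x2, 3*x1).
def f(x):
    x1, x2 = x
    return x1**2 + 3 * x1 * x2

def grad_f(x):
    x1, x2 = x
    return [2 * x1 + 3 * x2, 3 * x1]

def directional_derivative_fd(f, a, v, h=1e-6):
    """Central-difference estimate of the rate of change of f at a along unit vector v."""
    plus = [ai + h * vi for ai, vi in zip(a, v)]
    minus = [ai - h * vi for ai, vi in zip(a, v)]
    return (f(plus) - f(minus)) / (2 * h)

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

a = [1.0, 2.0]
v = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # unit vector at 45 degrees
print(directional_derivative_fd(f, a, v))  # approximately 7.7782
print(dot(grad_f(a), v))                   # 11/sqrt(2), approximately 7.7782
```

Both numbers agree, which is what "project the gradient onto $\vec v$" promises for a differentiable function.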

We can then ask in which direction this quantity is maximal. Recall that $$\text{grad}( f(a))\cdot \vec v = |\text{grad}( f(a))|\,| \vec v|\cos(\theta),$$ where $\theta$ is the angle between $\text{grad}(f(a))$ and $\vec v$.

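Since $\vec v$ is a unit vector, this reduces to $|\text{grad}( f(a))|\cos(\theta)$, which (assuming $\text{grad}(f(a))\neq 0$) is maximal exactly when $\cos(\theta)=1$, that is, when $\vec v$ points in the same direction as $\text{grad}(f(a))$. The maximal rate of increase is then $|\text{grad}(f(a))|$. Likewise, the rate is most negative when $\cos(\theta)=-1$, so $-\text{grad}(f(a))$ is the direction of steepest descent.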
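To see the maximization concretely, the brute-force sketch below (again assuming the hypothetical $f(x_1,x_2)=x_1^2+3x_1x_2$ at $a=(1,2)$, where the gradient is $(8,3)$) tries unit vectors $\vec v=(\cos t,\sin t)$ on a grid of angles and keeps the one with the largest $\text{grad}(f(a))\cdot\vec v$. The grid search and the helper `best_direction` are illustrative only; the argument above shows no search is needed.

```python
import math

# Hypothetical example: f(x1, x2) = x1**2 + 3*x1*x2, whose gradient is (2*x1 + 3*x2, 3*x1).
def grad_f(x):
    x1, x2 = x
    return [2 * x1 + 3 * x2, 3 * x1]

def best_direction(g, samples=3600):
    """Try unit vectors v = (cos t, sin t) on a grid of angles and return the one
    maximizing the directional derivative g . v, together with that maximum."""
    best_v, best_val = None, -math.inf
    for k in range(samples):
        t = 2 * math.pi * k / samples
        v = (math.cos(t), math.sin(t))
        val = g[0] * v[0] + g[1] * v[1]
        if val > best_val:
            best_v, best_val = v, val
    return best_v, best_val

g = grad_f([1.0, 2.0])                      # (8, 3)
norm = math.hypot(*g)                       # |grad f(a)| = sqrt(73)
v_star, max_rate = best_direction(g)
print(v_star, (g[0] / norm, g[1] / norm))   # sampled best direction vs. normalized gradient
print(max_rate, norm)                       # both approximately 8.544
```

The best sampled direction lines up with $\text{grad}(f(a))/|\text{grad}(f(a))|$ up to the grid resolution, and the best rate matches $|\text{grad}(f(a))|=\sqrt{73}$, as the $\cos(\theta)=1$ argument predicts.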

Attribution
Source: Link, Question Author: Jing, Answer Author: AsinglePANCAKE
