$$f(x_1,x_2,\dots, x_n):\mathbb{R}^n \to \mathbb{R}$$

The definition of the gradient is

$$ \frac{\partial f}{\partial x_1}\hat{e}_1 + \cdots + \frac{\partial f}{\partial x_n}\hat{e}_n$$

which is a vector.

Reading this definition, I take each component of the gradient to be the rate of change of my objective function if I move along the direction $\hat{e}_i$.

But I can’t see why this vector (as given by the definition of the gradient) has anything to do with the steepest ascent.

Why do I get the maximal rate of increase if I move along the direction of the gradient?

**Answer**

Each component of the gradient tells you how fast your function is changing along the corresponding standard basis direction. It’s not too far-fetched, then, to wonder how fast the function might be changing along some arbitrary direction. Letting $\vec v$ denote a unit vector, we can project onto this direction in the natural way, namely via the dot product $\text{grad}( f(a))\cdot \vec v$. This is a fairly common definition of the directional derivative of $f$ at the point $a$ in the direction $\vec v$.
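To make the dot-product formula concrete, here is a minimal numerical sketch. The function `f`, the point `a`, and the direction `v` are hypothetical choices made up for illustration (they are not from the question); the sketch just compares the dot product with a finite-difference estimate of the rate of change along `v`.

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + 3.0 * y

def grad_f(x, y):
    # Analytic gradient of the example f: (df/dx, df/dy).
    return (2.0 * x, 3.0)

def directional_derivative(func, a, v, h=1e-6):
    # Central finite difference of func at point a along the unit vector v.
    ax, ay = a
    vx, vy = v
    forward = func(ax + h * vx, ay + h * vy)
    backward = func(ax - h * vx, ay - h * vy)
    return (forward - backward) / (2.0 * h)

a = (1.0, 2.0)                       # assumed evaluation point
v = (math.cos(0.7), math.sin(0.7))   # an arbitrary unit vector

g = grad_f(*a)
dot = g[0] * v[0] + g[1] * v[1]              # grad(f(a)) . v
numeric = directional_derivative(f, a, v)    # measured rate of change along v

print(dot, numeric)
```

The two printed numbers should agree up to floating-point error, which is what the projection formula predicts for a differentiable function.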

We can then ask: in what direction is this quantity maximal? You’ll recall that $$\text{grad}( f(a))\cdot \vec v = |\text{grad}( f(a))|\,|\vec v|\cos(\theta)$$

where $\theta$ is the angle between $\text{grad}(f(a))$ and $\vec v$.

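Since $\vec v$ is a unit vector, this reduces to $|\text{grad}( f(a))|\cos(\theta)$, which is maximal when $\cos(\theta)=1$, that is, when $\vec v$ points in the same direction as $\text{grad}(f(a))$ (assuming the gradient is nonzero). By the same formula, it is minimal when $\cos(\theta)=-1$, so $-\text{grad}(f(a))$ points in the direction of steepest descent.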
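If you want to check the conclusion numerically, the following sketch scans unit vectors around the circle and picks the one with the largest directional derivative. The gradient function is that of the hypothetical example $f(x,y) = x^2 + 3y$ at the assumed point $(1, 2)$, and the helper name `steepest_direction_by_scan` is made up for this illustration.

```python
import math

def grad_f(x, y):
    # Gradient of the hypothetical example f(x, y) = x**2 + 3*y.
    return (2.0 * x, 3.0)

def steepest_direction_by_scan(g, samples=3600):
    # Brute-force search over unit vectors v = (cos t, sin t) for the
    # largest value of the dot product g . v.
    best_theta, best_rate = 0.0, -math.inf
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        rate = g[0] * math.cos(theta) + g[1] * math.sin(theta)
        if rate > best_rate:
            best_theta, best_rate = theta, rate
    return (math.cos(best_theta), math.sin(best_theta)), best_rate

g = grad_f(1.0, 2.0)                    # gradient at the assumed point (1, 2)
norm = math.hypot(*g)                   # |grad f(a)|
unit_g = (g[0] / norm, g[1] / norm)     # gradient direction as a unit vector

best_v, best_rate = steepest_direction_by_scan(g)

print("best direction found:", best_v)
print("gradient direction:  ", unit_g)
print("best rate vs |grad|: ", best_rate, norm)
```

Up to the resolution of the scan, the best direction should coincide with the normalized gradient and the best rate with $|\text{grad}(f(a))|$, matching the $\cos(\theta)=1$ argument.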

**Attribution**

*Source: Link, Question Author: Jing, Answer Author: AsinglePANCAKE*