# How is the derivative truly, literally the “best linear approximation” near a point?

I’ve read many times that the derivative of a function $f(x)$ for a certain $x$ is the best linear approximation of the function for values near $x$.

I always thought it was meant in a hand-waving approximate way, but I’ve recently read that:

> Some people call the derivative the “best linear approximator” because of how accurate this approximation is for $x$ near $0$ (as seen in the picture below). In fact, the derivative actually is the “best” in this sense – you can’t do better.

(from http://davidlowryduda.com/?p=1520, where $0$ is a special case in the context of Taylor series)

This seems to make it clear that the idea of “best linear approximation” is meant in a literal, mathematically rigorous way.

I’m confused because I believe that, for a differentiable function, no matter how small you make the interval $(x-\epsilon, x+\epsilon)$ around $x$, for any $a$ in that interval there will always be a line through $(x, f(x))$ that approximates $f(a)$ either as well as the tangent line with slope $f'(x)$ does (in case the function is actually linear over that interval) or better (namely the line through $(x, f(x))$ and $(a, f(a))$, and any line whose slope lies between that line's slope and $f'(x)$).

What am I missing?

As some people on this site might be aware, I don’t always take downvotes well. So here’s my attempt to provide more context to my answer for whoever decided to downvote.

Note that I will confine my discussion to functions $f: D\subseteq \Bbb R \to \Bbb R$ and to ideas that should be simple enough for anyone who’s taken a course in scalar calculus to understand. Let me know if I haven’t succeeded in some way.

First, it’ll be convenient for us to define a new notation. It’s called “little oh” notation.

Definition: A function $f$ is called little oh of $g$ as $x\to a$, denoted $f\in o(g)$ as $x\to a$, if
$$\lim_{x\to a}\frac{f(x)}{g(x)} = 0.$$

Intuitively, this means that $f$ becomes negligible compared with $g$ as $x\to a$; when both tend to $0$, $f(x)\to 0$ “faster” than $g$ does.

Here are some examples:

• $x\in o(1)$ as $x\to 0$
• $x^2 \in o(x)$ as $x\to 0$
• $x\in o(x^2)$ as $x\to \infty$
• $x-\sin(x)\in o(x)$ as $x\to 0$
• $x-\sin(x)\in o(x^2)$ as $x\to 0$
• $x-\sin(x)\not\in o(x^3)$ as $x\to 0$
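The examples above can be checked numerically by watching the ratio $f(x)/g(x)$ as $x$ approaches the limit point. Here is a minimal Python sketch; the helper names `ratio` and `x_minus_sin` and the sample points are my own choices for illustration, and a table of values only suggests a limit, it doesn't prove one.

```python
import math

def ratio(f, g, x):
    """Return f(x) / g(x); f is in o(g) as x -> a when this tends to 0."""
    return f(x) / g(x)

def x_minus_sin(x):
    return x - math.sin(x)

# Sample points approaching 0 (kept moderate to limit floating-point cancellation).
for x in [1e-1, 1e-2, 1e-3]:
    print(f"x = {x:g}:  "
          f"x^2/x = {ratio(lambda t: t**2, lambda t: t, x):.2e},  "
          f"(x - sin x)/x^2 = {ratio(x_minus_sin, lambda t: t**2, x):.2e},  "
          f"(x - sin x)/x^3 = {ratio(x_minus_sin, lambda t: t**3, x):.5f}")

# x in o(x^2) as x -> infinity: the ratio x / x^2 = 1/x shrinks as x grows.
for x in [1e2, 1e4, 1e6]:
    print(f"x = {x:g}:  x/x^2 = {ratio(lambda t: t, lambda t: t**2, x):.1e}")
```

The first two ratios shrink toward $0$, while $(x-\sin x)/x^3$ settles near $1/6$ rather than $0$, consistent with $x-\sin(x)\not\in o(x^3)$.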

Now what is an affine approximation? (Note: I prefer to call it affine rather than linear — if you’ve taken linear algebra then you’ll know why.) It is simply a function $T(x) = A + Bx$ that approximates the function in question.

Intuitively it should be clear which affine function should best approximate the function $f$ very near $a$. It should be
$$L(x) = f(a) + f'(a)(x-a).$$
Why? Well, consider that any affine function really only carries two pieces of information: its slope and some point on the line. The function $L$ as I’ve defined it has the properties $L(a)=f(a)$ and $L'(a)=f'(a)$. Thus $L$ is the unique line that passes through the point $(a,f(a))$ and has slope $f'(a)$.
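As a concrete check, take $f(x)=x^2$ (an illustrative choice, not one used elsewhere in this answer). Expanding around $a$ gives an exact decomposition into the line $L$ plus a remainder:

$$
\begin{aligned}
f(x) &= a^2 + 2a(x-a) + (x-a)^2,\\
L(x) &= f(a) + f'(a)(x-a) = a^2 + 2a(x-a),\\
\frac{f(x)-L(x)}{x-a} &= x-a \;\longrightarrow\; 0 \qquad (x\to a).
\end{aligned}
$$

For any other line through $(a,a^2)$ with slope $m\ne 2a$, the same quotient is $(2a-m)+(x-a)\to 2a-m\ne 0$, so only the tangent line leaves an error in $o(x-a)$.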

But we can be a little more rigorous. Below I give a lemma and a theorem that tell us that $L(x) = f(a) + f'(a)(x-a)$ is the best affine approximation of the function $f$ at $a$.

Lemma: If a differentiable function $f$ can be written, for all $x$ in some neighborhood of $a$, as
$$f(x) = A + B\cdot(x-a) + R(x-a),$$
where $A, B$ are constants and $R\in o(x-a)$ (that is, $R(x-a)/(x-a)\to 0$ as $x\to a$), then $A=f(a)$ and $B=f'(a)$.

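Proof: First notice that because $f$, $A$, and $B\cdot(x-a)$ are continuous at $x=a$, $R$ must be too. Since $R\in o(x-a)$, we have $R(x-a) = \frac{R(x-a)}{x-a}\cdot(x-a)\to 0\cdot 0 = 0$ as $x\to a$, so continuity gives $R(0)=0$. Then setting $x=a$ we immediately see that $f(a)=A+R(0)=A$.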

Then, rearranging the equation, we get (for all $x\ne a$ in the neighborhood)
$$\frac{f(x)-f(a)}{x-a} = B + \frac{R(x-a)}{x-a}.$$

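Then, taking the limit as $x\to a$, the last term tends to $0$ because $R\in o(x-a)$, so the right-hand side tends to $B$. Hence the limit on the left exists and $B=f'(a)$. $\ \ \ \square$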

Theorem: A function $f$ is differentiable at $a$ iff, for all $x$ in some neighborhood of $a$, $f(x)$ can be written as
$$f(x) = f(a) + B\cdot(x-a) + R(x-a),$$
where $B \in \Bbb R$ and $R\in o(x-a)$.

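Proof: “$\implies$”: If $f$ is differentiable at $a$, then $f'(a) = \lim_{x\to a} \frac{f(x)-f(a)}{x-a}$ exists. This can alternatively be written as
$$f'(a) = \frac{f(x)-f(a)}{x-a} + r(x-a) \qquad (x\ne a),$$
where the “remainder function” $r$ has the property $\lim_{x \to a} r(x-a)=0$ (set $r(0):=0$). Rearranging this equation we get
$$f(x) = f(a) + f'(a)(x-a) - r(x-a)(x-a),$$
which also holds trivially at $x=a$. Let $R(x-a):= -r(x-a)(x-a)$. Then $R\in o(x-a)$, because $\frac{R(x-a)}{x-a} = -r(x-a)\to 0$ as $x\to a$. So
$$f(x) = f(a) + f'(a)(x-a) + R(x-a),$$
as required, with $B=f'(a)$.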
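The “$\implies$” direction can be sanity-checked numerically: compute $R(h) = f(a+h) - f(a) - f'(a)h$ and watch $R(h)/h$ shrink as $h\to 0$. Here is a minimal Python sketch, assuming $f=\exp$ and $a=0$ purely for illustration (so $f'(a)=1$); the helper name `remainder` is hypothetical.

```python
import math

def remainder(f, fprime, a, h):
    """R(h) = f(a + h) - f(a) - f'(a) * h: the error of the tangent line at a + h."""
    return f(a + h) - f(a) - fprime(a) * h

# Illustrative assumption: f = exp at a = 0, so f'(a) = exp(0) = 1.
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    R = remainder(math.exp, math.exp, 0.0, h)
    print(f"h = {h:g}:  R(h) = {R:.3e},  R(h)/h = {R / h:.3e}")
```

For $f=\exp$ we have $R(h)/h = (e^h-1-h)/h \approx h/2$, so the last column should shrink roughly in proportion to $h$, as $R\in o(x-a)$ requires.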

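“$\impliedby$”: Simple rearrangement of this equation yields, for $x\ne a$,
$$\frac{f(x)-f(a)}{x-a} = B + \frac{R(x-a)}{x-a}.$$

The limit of the RHS as $x\to a$ exists and equals $B$, since $R\in o(x-a)$; thus the limit of the LHS also exists. This implies $f$ is differentiable at $a$ (with $f'(a)=B$) by the standard definition of differentiability. $\ \ \ \square$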

Taken together, the above lemma and theorem tell us not only that $L(x) = f(a) + f'(a)(x-a)$ is the only affine function whose remainder $f(x)-L(x)$ tends to $0$ as $x\to a$ faster than $x-a$ itself (this is the sense in which this approximation is the best), but also that we can even define the concept of differentiability by the existence of this best affine approximation.
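This also addresses the objection in the question. For one fixed point $x$ near $a$, the line through $(a,f(a))$ and $(x,f(x))$ certainly beats the tangent at that single point, but its slope $m$ differs from $f'(a)$, so its error divided by $|x-a|$ tends to $|f'(a)-m|>0$ instead of $0$; sufficiently close to $a$, the tangent wins. Here is a minimal Python sketch, with $f=\exp$, $a=0$ and $h_0=0.1$ chosen only for illustration; `line_error` is a hypothetical helper.

```python
import math

def line_error(f, a, slope, x):
    """Absolute error at x of the line through (a, f(a)) with the given slope."""
    return abs(f(x) - (f(a) + slope * (x - a)))

f, a = math.exp, 0.0                     # illustrative choice: f = exp, a = 0
tangent_slope = math.exp(a)              # f'(0) = 1
h0 = 0.1
secant_slope = (f(a + h0) - f(a)) / h0   # line through (a, f(a)) and (a + h0, f(a + h0))

for h in [0.1, 0.01, 0.001, -0.001]:
    x = a + h
    t = line_error(f, a, tangent_slope, x)
    s = line_error(f, a, secant_slope, x)
    # Dividing by |h| exposes the o(x - a) behaviour: t/|h| -> 0, s/|h| -> |f'(a) - m| > 0.
    print(f"h = {h:+g}: tangent {t:.2e} (/|h| = {t / abs(h):.2e}), "
          f"secant {s:.2e} (/|h| = {s / abs(h):.2e}), tangent better: {t < s}")
```

At $h=0.1$ the secant line is exact and wins; at the smaller values of $|h|$ the tangent's error is smaller, and only its error-to-$|h|$ ratio heads to $0$.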