How is the derivative truly, literally the “best linear approximation” near a point?

I’ve read many times that the derivative of a function f(x) for a certain x is the best linear approximation of the function for values near x.

I always thought it was meant in a hand-waving approximate way, but I’ve recently read that:

“Some people call the derivative the ‘best linear approximator’ because of how accurate this approximation is for x near 0 (as seen in the picture below). In fact, the derivative actually is the ‘best’ in this sense – you can’t do better.” (from, where 0 is a special case in the context of Taylor Series).

This seems to make it clear that the idea of “best linear approximation” is meant in a literal, mathematically rigorous way.

I’m confused because I believe that for a differentiable function, no matter how small you make the interval ϵ around x, for any a near x in that interval there will always be a line through (x, f(x)) that approximates f(a) either as well as the tangent line given by f'(x) (if the function is actually linear over that interval) or better (for example, the line through (x, f(x)) that also passes through (a, f(a)), and any line between that line and the tangent at x).

What am I missing?


As some people on this site might be aware, I don’t always take downvotes well. So here’s my attempt to provide more context to my answer for whoever decided to downvote.

Note that I will confine my discussion to functions f: D \subseteq \Bbb R \to \Bbb R and to ideas that should be simple enough for anyone who’s taken a course in scalar calculus to understand. Let me know if I haven’t succeeded in some way.

First, it’ll be convenient for us to define a new notation. It’s called “little oh” notation.

Definition: A function f is called little oh of g as x\to a, denoted f\in o(g) as x\to a, if

\lim_{x\to a} \frac{f(x)}{g(x)} = 0

Intuitively this means that f(x)\to 0 as x\to a “faster” than g does.

Here are some examples:

  • x\in o(1) as x\to 0
  • x^2 \in o(x) as x\to 0
  • x\in o(x^2) as x\to \infty
  • x-\sin(x)\in o(x) as x\to 0
  • x-\sin(x)\in o(x^2) as x\to 0
  • x-\sin(x)\not\in o(x^3) as x\to 0
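
The last three examples can be checked numerically. The following is only a rough sketch, not a proof: it evaluates the ratio f(x)/g(x) from the definition at a few small values of x (the helper name `ratio` and the sample points are my own choices for illustration). The first two ratios shrink toward 0, while the third settles near 1/6, which is why x-\sin(x) is not in o(x^3).

```python
import math

def ratio(f, g, x):
    """Return f(x) / g(x), the quantity whose limit defines little-oh."""
    return f(x) / g(x)

# The function from the last three examples.
f = lambda x: x - math.sin(x)

# Shrink x toward 0 and watch how each ratio behaves.
for x in (1e-1, 1e-2, 1e-3):
    r1 = ratio(f, lambda t: t, x)       # shrinks toward 0
    r2 = ratio(f, lambda t: t**2, x)    # shrinks toward 0
    r3 = ratio(f, lambda t: t**3, x)    # stays near 1/6, does not go to 0
    print(f"x={x:g}  f/x={r1:.3e}  f/x^2={r2:.3e}  f/x^3={r3:.6f}")
```

Much smaller values of x would eventually run into floating-point cancellation in x - sin(x), so the sample points are kept moderate.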

Now what is an affine approximation? (Note: I prefer to call it affine rather than linear — if you’ve taken linear algebra then you’ll know why.) It is simply a function T(x) = A + Bx that approximates the function in question.

Intuitively it should be clear which affine function should best approximate the function f very near a. It should be L(x) = f(a) + f'(a)(x-a). Why? Well consider that any affine function really only carries two pieces of information: slope and some point on the line. The function L as I’ve defined it has the properties L(a)=f(a) and L'(a)=f'(a). Thus L is the unique line which passes through the point (a,f(a)) and has the slope f'(a).

But we can be a little more rigorous. Below I give a lemma and a theorem that tell us that L(x) = f(a) + f'(a)(x-a) is the best affine approximation of the function f at a.

Lemma: If a differentiable function f can be written, for all x in some neighborhood of a, as f(x) = A + B\cdot(x-a) + R(x-a) where A, B are constants and R\in o(x-a), then A=f(a) and B=f'(a).

Proof: First notice that because f, A, and B\cdot(x-a) are continuous at x=a, R must be too. Moreover, R\in o(x-a) forces R(x-a)\to 0 as x\to a, so by continuity R(0)=0. Then setting x=a we immediately see that f(a)=A.

Then, rearranging the equation we get (for all x\ne a)

\frac{f(x)-f(a)}{x-a} = \frac{f(x)-A}{x-a} = \frac{B\cdot (x-a)+R(x-a)}{x-a} = B + \frac{R(x-a)}{x-a}

Then taking the limit as x\to a, the left-hand side tends to f'(a) and \frac{R(x-a)}{x-a}\to 0 because R\in o(x-a), so we see that B=f'(a). \ \ \ \square

Theorem: A function f is differentiable at a iff, for all x in some neighborhood of a, f(x) can be written as
f(x) = f(a) + B\cdot(x-a) + R(x-a) where B \in \Bbb R and R\in o(x-a).

Proof: “\implies”: If f is differentiable at a then f'(a) = \lim_{x\to a} \frac{f(x)-f(a)}{x-a} exists. This can alternatively be written f'(a) = \frac{f(x)-f(a)}{x-a} + r(x-a) where the “remainder function” r has the property \lim_{x \to a} r(x-a)=0. Rearranging this equation we get f(x) = f(a) + f'(a)(x-a) -r(x-a)(x-a). Let R(x-a):= -r(x-a)(x-a). Then R\in o(x-a), since \frac{R(x-a)}{x-a} = -r(x-a) \to 0 as x\to a. So f(x) = f(a) + f'(a)(x-a) + R(x-a) as required.

“\impliedby”: Simple rearrangement of the equation in the theorem yields, for all x\ne a,

B + \frac{R(x-a)}{x-a}= \frac{f(x)-f(a)}{x-a}. The limit as x\to a of the LHS exists (it equals B, because R\in o(x-a)) and thus the limit also exists for the RHS. This implies f is differentiable at a by the standard definition of differentiability. \ \ \ \square
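
As a concrete instance of the theorem, here is a short worked example with f(x) = x^2 (my own choice of function, picked only because the algebra is exact). Expanding around a gives the decomposition directly, and the remainder can be read off:

```latex
\begin{align*}
f(x) = x^2 &= \bigl(a + (x-a)\bigr)^2 \\
           &= \underbrace{a^2}_{f(a)} + \underbrace{2a}_{B}\cdot(x-a) + \underbrace{(x-a)^2}_{R(x-a)}, \\
\frac{R(x-a)}{x-a} &= x-a \;\longrightarrow\; 0 \quad \text{as } x\to a .
\end{align*}
```

So R\in o(x-a), and the coefficient B = 2a agrees with f'(a) = 2a, as the lemma says it must.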

Taken together the above lemma and theorem tell us that not only is L(x) = f(a) + f'(a)(x-a) the only affine function whose remainder tends to 0 as x\to a faster than x-a itself (this is the sense in which this approximation is the best), but also that we can even define the concept of differentiability by the existence of this best affine approximation.
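
To connect this back to the question, here is a small numerical sketch, assuming f = exp and a = 0 purely for illustration (the names `tangent`, `secant`, `rel_error` and the point `a0` are hypothetical helpers, not anything from the question). The secant line through (a, f(a)) and (a0, f(a0)) is exact at a0, so it does beat the tangent at that one point. But its error divided by x-a settles at a nonzero constant as x\to a, while the same ratio for the tangent line goes to 0.

```python
import math

f = math.exp            # example function, chosen only for illustration
a = 0.0
fa, dfa = 1.0, 1.0      # f(a) and f'(a) for exp at a = 0

def tangent(x):
    """The candidate best affine approximation L(x) = f(a) + f'(a)(x - a)."""
    return fa + dfa * (x - a)

# Secant line through (a, f(a)) and (a0, f(a0)): exact at a0 by construction.
a0 = 0.1
secant_slope = (f(a0) - fa) / (a0 - a)

def secant(x):
    return fa + secant_slope * (x - a)

def rel_error(line, x):
    """Error of the line at x, measured relative to the distance x - a."""
    return (f(x) - line(x)) / (x - a)

# Move x toward a and compare the two relative errors.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    x = a + h
    print(f"h={h:g}  tangent: {rel_error(tangent, x):+.6f}  "
          f"secant: {rel_error(secant, x):+.6f}")
```

The secant column starts at essentially 0 for h = 0.1 and then drifts toward f'(a) minus the secant slope, which is not 0. Any fixed line through (a, f(a)) with slope other than f'(a) behaves the same way, which is the precise sense in which no other line does as well near a.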

Source : Link , Question Author : jeremy radcliff , Answer Author : mucciolo