I’ve read many times that the derivative of a function f(x) for a certain x is the best linear approximation of the function for values near x.

I always thought it was meant in a hand-waving approximate way, but I’ve recently read that:

“Some people call the derivative the “best linear approximator” because of how accurate this approximation is for x near 0 (as seen in the picture below). In fact, the derivative actually is the “best” in this sense – you can’t do better.” (from http://davidlowryduda.com/?p=1520, where 0 is a special case in the context of Taylor series)

This seems to make it clear that the idea of “best linear approximation” is meant in a literal, mathematically rigorous way.

I’m confused because I believe that for a differentiable function, no matter how small you make the interval ϵ around x, for any a near x in that interval there is always a line through (x, f(x)) that approximates f(a) at least as well as the tangent line given by f′(x): if the function is actually linear over that interval, the tangent is merely matched, and otherwise the secant line through (x, f(x)) and (a, f(a)), as well as any line between that secant and the tangent at x, gives a strictly better approximation at a.

What am I missing?

**Answer**

As some people on this site might be aware, I don’t always take downvotes well. So here’s my attempt to provide more context to my answer for whoever decided to downvote.

Note that I will confine my discussion to functions f:D⊆R→R and to ideas that should be simple enough for anyone who’s taken a course in scalar calculus to understand. Let me know if I haven’t succeeded in some way.

First, it’ll be convenient for us to define a new notation. It’s called “little oh” notation.

**Definition**: A function f is called little oh of g as x→a, denoted f∈o(g) as x→a, if

\lim_{x\to a} \frac{f(x)}{g(x)} = 0

Intuitively this means that f(x)\to 0 as x\to a “faster” than g does.

Here are some examples:

- x\in o(1) as x\to 0
- x^2 \in o(x) as x\to 0
- x\in o(x^2) as x\to \infty
- x-\sin(x)\in o(x) as x\to 0
- x-\sin(x)\in o(x^2) as x\to 0
- x-\sin(x)\not\in o(x^3) as x\to 0
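These examples can be checked numerically. The sketch below (an illustration, not a proof: it merely samples the ratio f(x)/g(x) at one point near the relevant limit) is consistent with each claim above:

```python
import math

# Sample the ratio f(x)/g(x) near the limit point: a tiny ratio is consistent
# with f ∈ o(g); a ratio bounded away from 0 is consistent with f ∉ o(g).
def ratio(f, g, x):
    return f(x) / g(x)

x = 1e-6  # x -> 0
print(ratio(lambda t: t, lambda t: 1.0, x))                  # x ∈ o(1): tiny
print(ratio(lambda t: t**2, lambda t: t, x))                 # x^2 ∈ o(x): tiny
print(ratio(lambda t: t - math.sin(t), lambda t: t, x))      # ∈ o(x): tiny
print(ratio(lambda t: t - math.sin(t), lambda t: t**2, x))   # ∈ o(x^2): tiny
print(ratio(lambda t: t - math.sin(t), lambda t: t**3, x))   # near 1/6, not 0

big = 1e6  # x -> infinity
print(ratio(lambda t: t, lambda t: t**2, big))               # x ∈ o(x^2): tiny
```

The last "negative" example is visible in the output: (x - sin x)/x³ hovers near 1/6 rather than shrinking, matching the Taylor expansion of sin.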

Now what is an affine approximation? (Note: I prefer to call it affine rather than linear — if you’ve taken linear algebra then you’ll know why.) It is simply a function T(x) = A + Bx that *approximates* the function in question.

Intuitively it should be clear which affine function should best approximate the function f very near a. It should be L(x) = f(a) + f'(a)(x-a). Why? Well consider that any affine function really only carries two pieces of information: slope and some point on the line. The function L as I’ve defined it has the properties L(a)=f(a) and L'(a)=f'(a). Thus L is the unique line which passes through the point (a,f(a)) and has the slope f'(a).
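A small numerical sketch of this (with the arbitrary illustrative choice f = sin, a = 1) shows the characteristic behavior: the error of L shrinks like (x-a)², so the error divided by (x-a) tends to 0.

```python
import math

# The tangent-line (affine) approximation L(x) = f(a) + f'(a)(x - a),
# using the illustrative choice f = sin, a = 1 (so f'(a) = cos(1)).
f, df = math.sin, math.cos
a = 1.0

def L(x):
    return f(a) + df(a) * (x - a)

for h in (0.1, 0.01, 0.001):
    x = a + h
    err = abs(f(x) - L(x))
    print(f"h={h}: |f(x)-L(x)|={err:.2e}, err/h={err / h:.2e}")
# err/h shrinks with h, i.e. the error is o(x - a).
```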

But we can be a little more rigorous. Below I give a lemma and a theorem that tell us that L(x) = f(a) + f'(a)(x-a) is the **best affine approximation** of the function f at a.

**Lemma**: If a differentiable function f can be written, for all x in some neighborhood of a, as f(x) = A + B\cdot(x-a) + R(x-a) where A, B are constants and R\in o(x-a), then A=f(a) and B=f'(a).

**Proof**: First notice that because f, A, and B\cdot(x-a) are continuous at x=a, R must be too. Moreover R\in o(x-a) forces \lim_{x\to a}R(x-a)=0, so by continuity R(0)=0. Then setting x=a we immediately see that f(a)=A.

Then, rearranging the equation we get (for all x\ne a)

\frac{f(x)-f(a)}{x-a} = \frac{f(x)-A}{x-a} = \frac{B\cdot (x-a)+R(x-a)}{x-a} = B + \frac{R(x-a)}{x-a}

Then taking the limit as x\to a we see that B=f'(a). \ \ \ \square

**Theorem**: A function f is differentiable at a iff, for all x in some neighborhood of a, f(x) can be written as

f(x) = f(a) + B\cdot(x-a) + R(x-a) where B \in \Bbb R and R\in o(x-a).

**Proof**: “\implies“: If f is differentiable then f'(a) = \lim_{x\to a} \frac{f(x)-f(a)}{x-a} exists. This can alternatively be written f'(a) = \frac{f(x)-f(a)}{x-a} + r(x-a) where the “remainder function” r has the property \lim_{x \to a} r(x-a)=0. Rearranging this equation we get f(x) = f(a) + f'(a)(x-a) -r(x-a)(x-a). Let R(x-a):= -r(x-a)(x-a). Then clearly R\in o(x-a), since \frac{R(x-a)}{x-a} = -r(x-a) \to 0 as x\to a. So f(x) = f(a) + f'(a)(x-a) + R(x-a) as required.

“\impliedby“: Simple rearrangement of this equation yields

B + \frac{R(x-a)}{x-a}= \frac{f(x)-f(a)}{x-a}. The limit as x\to a of the LHS exists and thus the limit also exists for the RHS. This implies f is differentiable by the standard definition of differentiability. \ \ \ \square

Taken together, the above lemma and theorem tell us not only that L(x) = f(a) + f'(a)(x-a) is the only affine function whose remainder tends to 0 as x\to a **faster** than x-a itself (this is the sense in which this approximation is the *best*), but also that we can even **define the concept of differentiability** by the existence of this best affine approximation.
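This directly addresses the question: no single fixed line other than the tangent passes the little-oh test. A sketch with the arbitrary choice f = exp, a = 0 (so f'(a) = 1): the remainder divided by (x - a) tends to 0 only for the tangent slope, while for any other fixed slope B it tends to the nonzero gap B - f'(a).

```python
import math

# Compare remainders of affine approximations f(a) + B(x - a) at a = 0 for
# f = exp (an arbitrary illustrative choice; f'(0) = 1). Only the tangent
# slope B = 1 makes remainder/(x - a) shrink; B = 1.1 leaves it near -0.1.
f, a = math.exp, 0.0

def remainder_ratio(B, x):
    return (f(x) - (f(a) + B * (x - a))) / (x - a)

for h in (0.1, 0.01, 0.001):
    print(f"h={h}: tangent={remainder_ratio(1.0, h):+.4f}, "
          f"other={remainder_ratio(1.1, h):+.4f}")
# A secant through (a, f(a)) and (x, f(x)) matches f exactly at x, but it is a
# different line for each x; no single fixed line does better than the tangent.
```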

**Attribution**
*Source: Link, Question Author: jeremy radcliff, Answer Author: mucciolo*