I’m embarrassed to say that I have a PhD and hold an assistant professorship, but I get tripped up when reading statistics research. I am in a field of business that is similar to I/O psychology or social psychology. I spend too much time reading applied stats books, but I find that even with all the reading I don’t have a firm grasp of what I’m actually doing. Everything is very ‘seat of the pants.’ (As sad as it seems, I think this is not a unique situation among faculty in the social sciences…) The biggest problem comes when I need to apply a rarely used statistical technique. I can find an article from a mathematical statistics journal with the equations that would solve my problem, but I don’t have the math to convert those into code. I am forever relying on other professors’ R packages and crossing my fingers that they will work (I can’t even verify whether they did). It’s been over 15 years since I took Calculus and Algebra in undergrad, and I think I want to start at the beginning and truly understand probability and statistics.

I am starting with Gelfand’s Algebra and Trigonometry books for a quick refresher of the basics. I know it’s hard to believe, but in an applied research field we rarely have use for sin or cos. I’m even trying to finally learn how to correctly do a proof, using the books from Velleman (“How to Prove It”) and Houston (“How to Think Like a Mathematician”); I’m serious about doing this right and understanding the subject. From there I want to move on to (correctly) learn the Calculus and Linear Algebra I need to tackle probability and statistics. I was thinking of using Strang’s Calculus and Linear Algebra books, but Apostol’s Calculus comes highly recommended as well. After that I am completely at a loss. Further, I don’t know how far to go into Calculus or Linear Algebra before I reach diminishing returns. (Apostol introduces Probability in the second half of Vol. 2. Is it vital that I work through everything preceding it before tackling Probability?)

So my question is: if you had to do it over again with the goal of truly, deeply understanding statistics, where would you start? What books are the modern path to deep understanding? I would like to follow a modern path so that I can understand current research in statistics, including Bayesian approaches. But not in a machine learning context (which seems to be all the rage at the moment), rather a social science / design and analysis of experiments / multilevel modeling context. Perhaps my goal would be the work of Andrew Gelman; his book with Hill showed me how I should be looking at modeling and statistics (simulation, uncertainty estimates everywhere, Bayesian inference, and so on). How should I go about relearning this material with that end goal in mind?

Update 1: Possible texts, starting from scratch with a focus on proofs and deep understanding. Not necessarily one after another.

Relearn the basics:

- How to Prove It by Velleman
- How to Think Like a Mathematician by Houston
- Algebra and Trigonometry by Gelfand (for understanding why and how instead of what)
- Precalculus in a Nutshell by Simmons (for reference)
- Measurement by Lockhart (for inspiration)

Calculus (which one(s), and how deep?):

- Calculus by Strang
- Calculus vol. 1 and Calculus vol. 2 by Apostol
- Calculus by Spivak (solutions)
- Introduction to Calculus and Analysis: Volume I by Courant (and II/1, II/2?)

Linear Algebra (which one(s), and how deep?):

- Intro to Linear Algebra by Strang
- Matrix Algebra Useful for Statistics by Searle
- Matrix Algebra: Theory, Computations, and Applications in Statistics by Gentle

Probability (which one(s)?):

- An Introduction to Probability Theory and Its Applications, Vol. 1 and Vol. 2 by Feller (for intuitive understanding)
- Introduction to Probability Theory by Hoel, Port, Stone
- A Probability Path by Resnick (for measure theoretic / modern approach?)
- Fifty Challenging Problems in Probability by Mosteller

Core Statistics (which one(s)?):

- Probability and Statistics by DeGroot and Schervish
- Statistical Inference by Casella and Berger

Other suggestions? Again, with the goal of understanding and developing (or at least implementing) new methods in hierarchical modelling (generalized and linear).

**Answer**

As someone who started out their career thinking of statistics as a messy discipline, I’d like to share my epiphany regarding the matter. For me, the insight came from Linear Algebra, so I would urge you to push in that direction.

Specifically, once you realize that the sum of squares, $\sum_i X_i^2$, and the sum of products, $\sum_i X_i Y_i$, are both inner products (aka dot products), you realize that nearly all of statistics can be thought of as various operations from linear algebra.

If you sample $n$ values from a population, you have an $n$-dimensional vector. The sample mean is the projection of this vector onto the $n$-dimensional all-ones vector. The standard deviation comes from the projection onto the $(n-1)$-dimensional hyperplane normal to the all-ones vector (finally an intuitive reason for the “$n-1$” in the denominator!). Specifically, for the sample variance $s^2$ of a sample $X$, here is the linear algebra:

First, we work with deviations from the mean. The mean in linear algebra terms is

$$\bar{X} = \frac{\langle X, \mathbf{1}\rangle}{\langle \mathbf{1}, \mathbf{1}\rangle}\,\mathbf{1}$$

where $\langle\cdot,\cdot\rangle$ is the inner product and $\mathbf{1}$ is the $n$-dimensional ones vector. Then the deviation from the mean is

$$x = X - \bar{X}$$

Note that x is constrained to an (n−1)-dimensional subspace. The usual equation for variance is
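To see why, here is a short check that uses only the definitions above (with $\mathbf{1}$ the $n$-dimensional ones vector and $\bar{X}$ the projection of $X$ onto it). The inner product of $x$ with the ones vector vanishes, so $x$ lies in the hyperplane orthogonal to the ones vector:

$$\langle x, \mathbf{1}\rangle = \langle X, \mathbf{1}\rangle - \frac{\langle X, \mathbf{1}\rangle}{\langle \mathbf{1}, \mathbf{1}\rangle}\,\langle \mathbf{1}, \mathbf{1}\rangle = 0$$

That single linear constraint is what removes one dimension.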

$$s^2 = \frac{\sum_i (X_i - \bar{X})^2}{n-1}$$

For us, that’s

$$s^2 = \frac{\langle x, x\rangle}{\langle \mathbf{1}, \mathbf{1}\rangle}$$

which, without going into too much detail (too late), is a normalized squared length of the deviation. The trick there is that the new $\mathbf{1}$ has dimension $n-1$, so that $\langle \mathbf{1}, \mathbf{1}\rangle = n-1$.
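As a rough numerical check of the steps above, here is a minimal sketch in plain Python. The sample values and the names `inner`, `ones`, and `ones_reduced` are made up for illustration; the standard library's `statistics` module is used only to compare against the usual formulas.

```python
import statistics


def inner(v, w):
    """Inner (dot) product of two equal-length sequences."""
    return sum(a * b for a, b in zip(v, w))


X = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample
n = len(X)
ones = [1.0] * n               # the n-dimensional ones vector

# Projection of X onto the ones vector: every entry is the sample mean.
coef = inner(X, ones) / inner(ones, ones)
X_bar = [coef * o for o in ones]

# Deviation vector; it is orthogonal to the ones vector.
x = [a - b for a, b in zip(X, X_bar)]

# The "new" ones vector has dimension n - 1, so <1, 1> = n - 1.
ones_reduced = [1.0] * (n - 1)
s2 = inner(x, x) / inner(ones_reduced, ones_reduced)

print("mean via projection:    ", coef, "vs", statistics.mean(X))
print("variance via inner prod:", s2, "vs", statistics.variance(X))
print("<x, 1> (should be ~0):  ", inner(x, ones))
```

The two printed pairs should agree up to floating-point rounding, and the last line should be very close to zero.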

The other good example is that correlation between two samples is related to the angle between them in that n-dimensional space. To see this, consider that the angle between two vectors v and w is:

$$\theta = \arccos\frac{\langle v, w\rangle}{\|v\|\,\|w\|}$$

where $\|\cdot\|$ is vector length. Compare this to one of the forms for the Pearson correlation, with $v$ and $w$ taken to be the mean-centered samples, and you will see that $r = \cos\theta$.
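A small sketch of that comparison in plain Python follows. It assumes the two vectors are the mean-centered samples; the data and the helper names `inner`, `norm`, and `center` are invented for illustration.

```python
import math


def inner(v, w):
    """Inner (dot) product of two equal-length sequences."""
    return sum(a * b for a, b in zip(v, w))


def norm(v):
    """Vector length, i.e. the square root of <v, v>."""
    return math.sqrt(inner(v, v))


def center(v):
    """Subtract the mean from every entry."""
    m = sum(v) / len(v)
    return [a - m for a in v]


X = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical samples
Y = [2.0, 1.0, 4.0, 3.0, 6.0]

# Cosine of the angle between the centered vectors.
x, y = center(X), center(Y)
cos_theta = inner(x, y) / (norm(x) * norm(y))
theta = math.acos(cos_theta)

# Textbook Pearson r: sample covariance over the product of sample SDs.
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(X, Y)) / (n - 1)
sx = math.sqrt(sum((a - mx) ** 2 for a in X) / (n - 1))
sy = math.sqrt(sum((b - my) ** 2 for b in Y) / (n - 1))
r = cov / (sx * sy)

print("cos(theta):", cos_theta)
print("Pearson r: ", r)
print("angle in degrees:", math.degrees(theta))
```

The $n-1$ factors cancel in the ratio, which is why the textbook formula and the cosine agree.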

There are many other examples, and these have barely been explained here, but I just hope to give an impression of how you can think in these terms.

**Attribution**

*Source: Link, Question Author: user27634, Answer Author: Nigel*