I’ve a confession to make. I’ve been using PDFs and PMFs without actually knowing what they are. My understanding is that density equals area under the curve, but if I look at it that way, then it doesn’t make sense to refer to the “mass” of a random variable in discrete distributions. How can I interpret this? Why do we use “mass” and “density” to describe these functions rather than something else?

P.S. Please feel free to reword the question to make it more understandable if you feel it is logically flawed as asked.

**Answer**

(This answer takes as its starting point the OP’s question in the comments, “Let me understand mass before going to density. Why do we call a point in the discrete distribution as mass? Why can’t we just call it a point?”)

We could certainly call it a point. The utility of the term “probability mass function,” though, is that it tells us something about how the function in the discrete setting relates to the function in the continuous setting because of the associations we already have with “mass” and “density.” And I think to understand why we use these terms in the first place we have to start with what we call the density function. (In fact, I’m not sure we would even be using “probability mass” without the corresponding “probability density” function.)

Let’s say we have some function $f(x)$ that we haven’t named yet, but we know that $\int_a^b f(x)\,dx$ yields the probability that we see an outcome between $a$ and $b$. What should we call $f(x)$? Well, what are its properties? Let’s start with its units. We know that, in general, the units of a definite integral $\int_a^b f(x)\,dx$ are the units of $f(x)$ times the units of $dx$. In our setting, the integral gives a probability, and $dx$ has units of, say, length. So the units of $f(x)$ must be probability per unit length. This means that $f(x)$ must be telling us something about how much probability is concentrated per unit length near $x$; i.e., how *dense* the probability is near $x$. So it makes sense to call $f(x)$ a “probability density function.” (In fact, one way to view $\int_a^b f(x)\,dx$ is that, if $f(x) \geq 0$, $f(x)$ is *always* a density function. From this point of view, height is area density, area is volume density, speed is distance density, etc. One of my colleagues uses an approach like this when he discusses applications of integration in second-semester calculus.)
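To make the units argument concrete, here is a minimal Python sketch. The exponential density with rate 3 and the helper names `density` and `probability` are my own illustrative choices, not anything from the question. The point is that the density value itself can exceed 1, because it is probability *per unit length*; only after multiplying by a length (integrating) do we get an actual probability.

```python
import math

def density(x, rate=3.0):
    """Exponential density (an assumed example); units: probability per unit of x."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def probability(a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of density over [a, b]."""
    dx = (b - a) / n  # width of each slice, in units of x
    # Each term is (probability per unit length) * (length) = probability.
    return sum(density(a + (i + 0.5) * dx) for i in range(n)) * dx

height_at_zero = density(0.0)       # a density value, allowed to exceed 1
p = probability(0.0, 0.5)           # a probability, always between 0 and 1
exact = 1.0 - math.exp(-1.5)        # closed-form value of the same integral

print(f"density at 0: {height_at_zero}")
print(f"P(0 <= X <= 0.5) ~ {p:.6f} (exact {exact:.6f})")
```

The density at 0 is 3, which would be nonsense as a probability but is perfectly fine as a concentration of probability per unit length.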

Now that we’ve named $f(x)$ a density function, what should we call the corresponding function in the discrete setting? It’s not a density function; its units are probability rather than probability per unit length. So what is it? Well, when we say “density” without a qualifier we are normally talking about “mass density,” and when we integrate a density function over an object we obtain the mass of that object. With this in mind, the relationship between the probability function in the continuous setting and the probability function in the discrete setting is exactly that of density to mass. So “probability mass function” is a natural term to apply to the corresponding discrete function.
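The density-to-mass relationship can be sketched in a few lines of Python. The fair die, the rate-1 exponential, and the helper name `bin_mass` are assumptions I picked for illustration. A PMF value is already a probability, so we just add values up; a density has to be integrated over an interval first, and the result of that integration is a lump of probability mass.

```python
import math
from fractions import Fraction

# Discrete case: each PMF value is a probability (a "mass") with no dx involved.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}  # fair six-sided die
p_even = sum(pmf[face] for face in (2, 4, 6))

def bin_mass(k, rate=1.0, width=1.0):
    """Mass of an exponential density on [k*width, (k+1)*width).

    Uses the closed-form integral of rate * exp(-rate * x) over the bin.
    """
    a, b = k * width, (k + 1) * width
    return math.exp(-rate * a) - math.exp(-rate * b)

# Integrating the density over each bin turns it into a PMF on the bin index.
masses = [bin_mass(k) for k in range(50)]

print(f"P(even face) = {p_even}")
print(f"total mass over first 50 bins = {sum(masses):.12f}")
```

Integrating over bins is exactly the step that converts “probability per unit length” into “probability,” just as integrating a mass density over a region of an object gives the mass of that region.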

**Attribution**
*Source: Link, Question Author: 0x0, Answer Author: Mike Spivey*