# What is the difference between “probability density function” and “probability distribution function”?

What's the difference between a probability density function and a probability distribution function?

Distribution Function

1. The term "probability distribution function" (or "probability function") is ambiguous. It may refer to any of:
• Probability density function (PDF)
• Cumulative distribution function (CDF)
• or probability mass function (PMF) (statement from Wikipedia)
2. What is unambiguous:
• Discrete case: Probability Mass Function (PMF)
• Continuous case: Probability Density Function (PDF)
• Both cases: Cumulative distribution function (CDF)
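3. The value at a particular $$x$$ can be read directly from:
• the PMF in the discrete case, where it is the probability $$P(X = x)$$
• the PDF in the continuous case, where it is a probability density rather than a probability, since $$P(X = x) = 0$$ for a continuous variable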
4. The probability of values less than $$x$$, $$P(X < x)$$, or of values within a range from $$a$$ to $$b$$, $$P(a < X < b)$$, can be obtained directly from:
• CDF for both discrete / continuous case
5. "Distribution function" on its own refers to the CDF, also called the cumulative frequency function (see this)
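
As a compact summary of how the three functions relate, here is a sketch of the standard definitions. The symbols $$p$$ (PMF), $$f$$ (PDF) and $$F$$ (CDF) are notation assumed here and do not appear in the list above.

$$
F(x) = P(X \le x) = \sum_{t \le x} p(t) \quad \text{(discrete case)}
$$

$$
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{(continuous case)}
$$

$$
P(a < X \le b) = F(b) - F(a) \quad \text{(both cases)}
$$

For a continuous variable, strict and non-strict inequalities give the same probability because $$P(X = x) = 0$$; for a discrete variable they can differ by $$p(x)$$.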

In terms of Acquisition and Plot Generation Method

1. Collected data appear discrete when:
• The quantity being measured is naturally discrete, such as the numbers obtained from dice rolls or a count of people.
• The measurement is digitized machine data, which has no intermediate values between quantized levels as a result of the digitization process.
• In the latter case, the higher the resolution, the closer the measurement comes to an analog/continuous signal or variable.
2. How to generate a PMF from discrete data:
• Plot a histogram of the data over all values of $$x$$; the $$y$$-axis is the frequency (count) at each $$x$$.
• Scale the $$y$$-axis by dividing by the total number of data points collected (the sample size) $$\longrightarrow$$ the result is the PMF.
3. How to generate a PDF from discrete or continuous data:
• Choose a continuous equation that models the collected data, say the normal distribution.
• Estimate the parameters the equation requires from the collected data. For the normal distribution, these are the mean and the standard deviation.
• Using those parameters, plot the equation over continuous $$x$$ values $$\longrightarrow$$ the result is the PDF.
4. How to generate a CDF:
• In the discrete case, the CDF at each $$x$$ is the sum of the PMF's $$y$$ values at that $$x$$ and at every smaller $$x$$. Repeat this for every $$x$$. The resulting plot is monotonically non-decreasing and reaches $$1$$ at the last $$x$$ $$\longrightarrow$$ this is the discrete CDF.
• In the continuous case, integrate the PDF over $$x$$; the result is a continuous CDF.
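
The steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a definitive recipe: the `heights_cm` sample is made-up data (heights rounded to whole centimetres, so they appear discrete), and the normal distribution is simply the model assumed for the PDF step.

```python
from collections import Counter
from itertools import accumulate
from statistics import NormalDist

# Hypothetical sample: heights rounded to the nearest cm (illustrative data only).
heights_cm = [162, 165, 165, 168, 170, 170, 170, 172, 175, 175, 178, 180]

# PMF: count how often each value occurs, then divide by the sample size.
counts = Counter(heights_cm)
xs = sorted(counts)
pmf = {x: counts[x] / len(heights_cm) for x in xs}

# Discrete CDF: running sum of the PMF over increasing x.
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))

# PDF: assume a normal model, estimate its mean and standard deviation
# from the data, then evaluate the fitted curve on a fine grid of x values.
model = NormalDist.from_samples(heights_cm)
grid = [155 + 0.5 * i for i in range(61)]  # 155.0, 155.5, ..., 185.0
pdf_curve = [(x, model.pdf(x)) for x in grid]

# Continuous CDF of the fitted model (the integral of the PDF up to x).
cdf_curve = [(x, model.cdf(x)) for x in grid]

for x in xs:
    print(f"x={x}  PMF={pmf[x]:.3f}  CDF={cdf[x]:.3f}")
```

The `pdf_curve` and `cdf_curve` lists hold the points one would pass to a plotting library; plotting is left out to keep the sketch dependency-free.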

Why PMF, PDF and CDF?

1. PMF is preferred when
• The probability at each individual $$x$$ value is what we want to study. This makes sense for discrete data, for example when we are interested in the probability of getting a certain number from a dice roll.
2. PDF is preferred when
• We wish to model collected data with a continuous function, using a few parameters such as the mean to estimate the population distribution.
3. CDF is preferred when
• The cumulative probability over a range is the point of interest.
• Especially for continuous data, the CDF makes more sense than the PDF: e.g., the probability that a student's height is less than $$170$$ cm (CDF) is much more informative than the density at exactly $$170$$ cm (PDF), because the probability of any single exact height is zero.
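
To make the height example concrete, here is a small sketch using Python's `statistics.NormalDist`. The mean of 168 cm and standard deviation of 8 cm are assumed values chosen purely for illustration; they are not taken from any data in this answer.

```python
from statistics import NormalDist

# Assumed (hypothetical) model of student heights in cm.
heights = NormalDist(mu=168, sigma=8)

# CDF: probability that a height is below 170 cm.
p_below_170 = heights.cdf(170)

# CDF difference: probability that a height lies between 165 and 175 cm.
p_165_to_175 = heights.cdf(175) - heights.cdf(165)

# PDF: a density (per cm) at 170 cm, not a probability.
density_at_170 = heights.pdf(170)

# The density becomes a probability only once multiplied by an interval width:
# the probability of a 1 cm window around 170 is close to density * 1 cm.
p_near_170 = heights.cdf(170.5) - heights.cdf(169.5)

print(f"P(X < 170)          = {p_below_170:.4f}")
print(f"P(165 < X < 175)    = {p_165_to_175:.4f}")
print(f"density at 170      = {density_at_170:.4f} per cm")
print(f"P(169.5 < X < 170.5) = {p_near_170:.4f}")
```

The last two numbers nearly coincide, which shows how a PDF value should be read: as probability per unit of $$x$$ over a small interval, rather than as the probability of the exact value.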