What's the difference between a probability density function and a probability distribution function?
- "Probability distribution function" (or "probability function") is an ambiguous term. It may refer to any of the following:
- Probability density function (PDF)
- Cumulative distribution function (CDF)
- Probability mass function (PMF) (according to Wikipedia)
- The following terms, however, are unambiguous:
- Discrete case: Probability Mass Function (PMF)
- Continuous case: Probability Density Function (PDF)
- Both cases: Cumulative distribution function (CDF)
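- The value at a specific x can be read directly from:
- The PMF in the discrete case, which gives the probability P(X=x)
- The PDF in the continuous case, which gives a probability density rather than a probability; P(X=x) is 0 for a continuous variable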
- The probability of values below x, P(X<x), or within a range from a to b, P(a<X<b), can be obtained directly from:
- The CDF F, in both the discrete and continuous cases, as F(x) or F(b) - F(a) (in the discrete case F(x) is P(X≤x), so F(b) - F(a) is P(a<X≤b))
- "Distribution function" on its own refers to the CDF, also called the cumulative frequency function (see this)
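
The distinction above can be sketched with Python's standard library. This is a minimal illustration, not a reference implementation: the fair six-sided die and the narrow normal distribution (`sigma=0.1`) are assumed examples chosen here, and `die_cdf` is a hypothetical helper name.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: a fair six-sided die (assumed example).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def die_cdf(x):
    """P(X <= x): sum of the PMF over all faces not exceeding x."""
    return sum(p for face, p in pmf.items() if face <= x)

p_exactly_3 = pmf[3]                  # P(X = 3), read directly from the PMF
p_at_most_4 = die_cdf(4)              # P(X <= 4), read directly from the CDF
p_3_to_5 = die_cdf(5) - die_cdf(2)    # P(2 < X <= 5) = F(5) - F(2)

# Continuous case: a normal distribution with a small standard deviation
# (assumed parameters). The PDF value is a density and can exceed 1.
narrow = NormalDist(mu=0.0, sigma=0.1)
density_at_0 = narrow.pdf(0.0)
p_within = narrow.cdf(0.1) - narrow.cdf(-0.1)   # P(-0.1 < X < 0.1)

print(p_exactly_3, p_at_most_4, p_3_to_5)
print(density_at_0, p_within)
```

The density at 0 is larger than 1 here, which shows that a PDF value cannot be a probability, while the CDF difference stays between 0 and 1.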
In terms of Acquisition and Plot Generation Method
- Collected data are discrete when:
- The quantity measured is naturally discrete, such as the outcome of a dice roll or a count of people.
- The measurement is digitized machine data, which has no intermediate values between quantization levels because of the digitization process.
- In the latter case, the higher the resolution, the closer the measurement is to an analog (continuous) signal or variable.
- To generate a PMF from discrete data:
- Plot a histogram of the data over all x values, with the frequency (count) at each x on the y-axis.
- Scale the y-axis by dividing by the total number of data points (the sample size) ⟶ this is the (empirical) PMF.
- To generate a PDF from discrete or continuous data:
- Choose a continuous function that models the collected data, for example the normal distribution.
- Estimate the parameters of that function from the collected data. For the normal distribution, these are the mean and the standard deviation.
- Using the estimated parameters, plot the function over continuous x values ⟶ this is the PDF.
- To generate a CDF:
- In the discrete case, the CDF at each x is the sum of the PMF values at that x and at all smaller x. Repeating this for every x gives a monotonically increasing plot that reaches 1 at the last x ⟶ this is the discrete CDF.
- In the continuous case, integrate the PDF from -∞ up to x; the result is the continuous CDF.
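
The generation steps above can be sketched as follows, using only the standard library. Both samples (`rolls` and `heights`) are made-up data for illustration, and the normal model is just one possible choice of continuous function; treat this as a sketch under those assumptions.

```python
from collections import Counter
from itertools import accumulate
from statistics import NormalDist

# Hypothetical sample: 20 recorded die rolls (made-up data).
rolls = [1, 3, 3, 6, 2, 5, 4, 3, 6, 1, 2, 2, 5, 4, 6, 3, 1, 4, 5, 3]

# Empirical PMF: histogram counts divided by the sample size.
counts = Counter(rolls)
xs = sorted(counts)
pmf = [counts[x] / len(rolls) for x in xs]

# Discrete CDF: running total of the PMF over increasing x.
cdf = list(accumulate(pmf))

# Fitted PDF: estimate the mean and standard deviation from the data,
# then evaluate the normal curve on a fine grid of x values.
# Hypothetical height measurements in cm (made-up data).
heights = [162.0, 168.5, 171.2, 165.4, 174.8, 169.9, 177.3, 166.1, 172.6, 170.2]
model = NormalDist.from_samples(heights)
step = 0.5
grid = [150 + step * i for i in range(81)]   # 150 cm to 190 cm
pdf_curve = [model.pdf(x) for x in grid]

# Continuous CDF: the integral of the PDF. NormalDist evaluates it directly,
# so no numerical integration is needed for this particular model.
cdf_curve = [model.cdf(x) for x in grid]

for x, p, c in zip(xs, pmf, cdf):
    print(x, p, round(c, 2))
```

Plotting `pmf` against `xs` as bars, and `pdf_curve` and `cdf_curve` against `grid` as lines, gives the three kinds of plot described in the list.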
Why PMF, PDF and CDF?
- The PMF is preferred when:
- The probability at each x value is of interest. This makes sense for discrete data, for example when we want the probability of getting a particular number from a dice roll.
- The PDF is preferred when:
- We want to model the collected data with a continuous function, using a few parameters such as the mean to infer the population distribution.
- The CDF is preferred when:
- The cumulative probability over a range is of interest.
- Especially for continuous data, the CDF is more useful than the PDF. For example, the probability that a student's height is less than 170 cm (CDF) is much more informative than the density at exactly 170 cm (PDF), since the probability of any exact value is 0.
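
The height example can be sketched as below. The normal model and its parameters (`mu=168.0`, `sigma=8.0`) are assumptions made for illustration, not measured values.

```python
from statistics import NormalDist

# Assumed model of student heights in cm (illustrative parameters, not real data).
height = NormalDist(mu=168.0, sigma=8.0)

p_below_170 = height.cdf(170.0)                        # P(X < 170)
p_165_to_175 = height.cdf(175.0) - height.cdf(165.0)   # P(165 < X < 175)
density_at_170 = height.pdf(170.0)                     # density per cm, not P(X = 170)

print(p_below_170, p_165_to_175, density_at_170)

# The probability of an interval around 170 cm shrinks towards zero
# as the interval gets narrower.
interval_probs = []
for half_width in (1.0, 0.1, 0.001):
    p = height.cdf(170.0 + half_width) - height.cdf(170.0 - half_width)
    interval_probs.append(p)
    print(half_width, p)
```

The shrinking interval probabilities show why the CDF answers practical questions about continuous data, while the PDF value at a single point only describes how concentrated the probability is near that point.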