What is the difference between “probability density function” and “probability distribution function”?

What's the difference between a probability density function and a probability distribution function?


Distribution Function

  1. The term "probability distribution function" (or "probability function") is ambiguous. It may refer to:
    • Probability density function (PDF)
    • Cumulative distribution function (CDF)
    • Probability mass function (PMF) (according to Wikipedia)
  2. What is unambiguous is:
    • Discrete case: Probability Mass Function (PMF)
    • Continuous case: Probability Density Function (PDF)
    • Both cases: Cumulative distribution function (CDF)
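  3. Probability at a specific value x, P(X=x):
    • Discrete case: read directly from the PMF.
    • Continuous case: P(X=x) is 0 for every single x. The PDF value f(x) is a density, not a probability; probabilities come from integrating the PDF over an interval.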
  4. Probability of values up to x, P(X≤x), or within a range from a to b, P(a<X≤b) = F(b) - F(a), can be obtained directly from:
    • The CDF F, in both the discrete and continuous cases (in the continuous case, < and ≤ give the same probability)
  5. The term "distribution function" on its own refers to the CDF, also called the cumulative frequency function (see this)
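
To make points 3 and 4 concrete, here is a minimal sketch for a fair six-sided die. The fair die and the helper name `cdf` are assumptions chosen for illustration; exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die (assumed example): each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): sum of the PMF over all faces not exceeding x."""
    return sum(p for face, p in pmf.items() if face <= x)

p_equal_3 = pmf[3]            # P(X = 3), read directly from the PMF
p_at_most_4 = cdf(4)          # P(X <= 4), read directly from the CDF
p_between = cdf(5) - cdf(2)   # P(2 < X <= 5) = F(5) - F(2)

print(p_equal_3, p_at_most_4, p_between)
```

Note that a range probability only needs two CDF lookups, while the PMF would have to be summed over every value in the range.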

In terms of Acquisition and Plot Generation Method

  1. Collected data appear discrete when:
    • The quantity measured is naturally discrete, such as the numbers from dice rolls or a count of people.
    • The measurement is digitized machine data, which has no intermediate values between quantized levels because of quantization during digitization.
    • In the latter case, the higher the resolution, the closer the measurement is to an analog/continuous signal or variable.
  2. How to generate a PMF from discrete data:
    • Plot a histogram of the data over all x values; the y-axis is the frequency (count) at each x.
    • Scale the y-axis by dividing by the total number of data points (the sample size); the result is the (empirical) PMF.
  3. How to generate a PDF from discrete / continuous data:
    • Choose a continuous equation that models the collected data, for example the normal distribution.
    • Estimate the parameters of that equation from the collected data. For the normal distribution, these are the mean and standard deviation.
    • Using those parameters, plot the equation over continuous x values; this curve is the PDF.
  4. How to generate a CDF:
    • In the discrete case, the CDF at x sums the PMF values at x and at every value below x. Repeating this for every x gives a monotonically non-decreasing step plot that reaches 1 at the largest x; this is the discrete CDF.
    • In the continuous case, integrate the PDF from -∞ up to x; the result is the continuous CDF.
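
The steps above can be sketched in a few lines of Python using only the standard library. The `data` values are made up for illustration (heights recorded to the nearest 5 cm, so they look discrete), and the normal distribution is just one possible model, not necessarily the right one for real data.

```python
from collections import Counter
from itertools import accumulate
from statistics import NormalDist

# Hypothetical sample: heights in cm, quantized to 5 cm steps (made-up values).
data = [160, 165, 165, 170, 170, 170, 175, 175, 180, 185]

# Empirical PMF: count each value (the histogram), then divide by the sample size.
counts = Counter(data)
xs = sorted(counts)
pmf = [counts[x] / len(data) for x in xs]

# Discrete CDF: running sum of the PMF over the sorted x values.
cdf = list(accumulate(pmf))

# Fitted PDF: estimate mean and standard deviation from the data,
# then evaluate the normal density at any continuous x.
fitted = NormalDist.from_samples(data)
density_at_mean = fitted.pdf(fitted.mean)

for x, p, c in zip(xs, pmf, cdf):
    print(x, round(p, 2), round(c, 2))
print(fitted.mean, fitted.stdev, density_at_mean)
```

`NormalDist.from_samples` uses the sample mean and sample standard deviation; the continuous CDF of the fitted model is then available as `fitted.cdf(x)` without integrating by hand.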

Why PMF, PDF and CDF?

  1. PMF is preferred when
    • The probability at each x value is the quantity of interest. This makes sense for discrete data - for example, when we are interested in the probability of getting a certain number from a dice roll.
  2. PDF is preferred when
    • We wish to model collected data with a continuous function, using a few parameters such as the mean to infer the population distribution.
  3. CDF is preferred when
    • The cumulative probability over a range is the point of interest.
    • Especially for continuous data, the CDF makes more sense than the PDF - e.g., the probability that a student's height is less than 170 cm (CDF) is much more informative than the density at exactly 170 cm (PDF), since the probability of any single exact height is 0.
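
The height example can be sketched as follows. The model parameters (mean 170 cm, standard deviation 8 cm) are assumptions picked purely for illustration, not estimates from real data.

```python
from statistics import NormalDist

# Assumed model of student heights in cm (hypothetical parameters).
heights = NormalDist(mu=170, sigma=8)

# CDF: an actual probability, P(height <= 170 cm).
p_below_170 = heights.cdf(170)

# PDF: a density at exactly 170 cm. It is not a probability;
# P(height == 170) is 0 for a continuous variable.
density_at_170 = heights.pdf(170)

# Probability over a range comes from two CDF values:
# P(165 < height <= 175) = F(175) - F(165).
p_165_to_175 = heights.cdf(175) - heights.cdf(165)

print(p_below_170, density_at_170, p_165_to_175)
```

Because the assumed mean equals the threshold, `p_below_170` comes out as 0.5, which is directly interpretable. The density value only becomes a probability once it is integrated over an interval, which is what the CDF difference does.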

Source : Link , Question Author : Le Chifre , Answer Author : Rócherz
