What is the difference between “probability density function” and “probability distribution function”?

What's the difference between a probability density function and a probability distribution function?

Answer

Distribution Function

  1. The term "probability distribution function" (or "probability function") is ambiguous. It may refer to any of:
    • Probability density function (PDF)
    • Cumulative distribution function (CDF)
    • Probability mass function (PMF) (according to Wikipedia)
  2. What is unambiguous is:
    • Discrete case: Probability Mass Function (PMF)
    • Continuous case: Probability Density Function (PDF)
    • Both cases: Cumulative distribution function (CDF)
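  3. The value at a particular x can be read directly from:
    • The PMF in the discrete case, which gives the probability P(X=x)
    • The PDF in the continuous case, which gives the probability density at x rather than a probability; P(X=x) itself is 0 for a continuous variable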
  4. The probability of values up to x, P(X≤x), or of values in a range from a to b, P(a<X≤b) = F(b) − F(a), can be obtained directly from:
    • The CDF, F, in both the discrete and continuous cases
  5. "Distribution function" on its own usually refers to the CDF, also called the cumulative frequency function.
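
To make the distinction between the three functions concrete, here is a minimal Python sketch using only the standard library. The fair die and the normal distribution with sigma = 0.1 are illustrative assumptions, not part of the original answer; the names `die_pmf`, `die_cdf` and `narrow` are made up for this example.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: a fair six-sided die (illustrative assumption).
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def die_cdf(x):
    """P(X <= x): sum the PMF over every face that does not exceed x."""
    return sum((p for face, p in die_pmf.items() if face <= x), Fraction(0))


p_equal_3 = die_pmf[3]               # P(X = 3), read straight off the PMF
p_at_most_4 = die_cdf(4)             # P(X <= 4), read off the CDF
p_between = die_cdf(5) - die_cdf(2)  # P(2 < X <= 5) = F(5) - F(2)

# Continuous case: a normal distribution with an assumed small spread.
narrow = NormalDist(mu=0.0, sigma=0.1)
density_at_0 = narrow.pdf(0.0)       # a density, not a probability
p_interval = narrow.cdf(0.1) - narrow.cdf(-0.1)  # P(-0.1 < X <= 0.1)

print(p_equal_3, p_at_most_4, p_between)
print(density_at_0, p_interval)
```

The density at 0 comes out larger than 1 here, which a probability never can; only differences of the CDF (areas under the PDF) are probabilities in the continuous case.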

In terms of Acquisition and Plot Generation Method

  1. Collected data appear discrete when:
    • The quantity measured is naturally discrete, such as the numbers from dice rolls or a count of people.
    • The measurement is digitized machine data, which has no intermediate values between quantization levels because the sampling process quantizes it.
    • In the latter case, the higher the resolution, the closer the measurement comes to an analog (continuous) signal or variable.
  2. To generate a PMF from discrete data:
    • Plot a histogram of the data over all x values; the y-axis is the frequency (count) at each x.
    • Scale the y-axis by dividing by the total number of data points collected (the sample size); the result is the empirical PMF.
  3. To generate a PDF from discrete or continuous data:
    • Choose a continuous equation that models the collected data, for example the normal distribution.
    • Estimate the parameters the equation requires from the collected data. For the normal distribution these are the mean and the standard deviation.
    • Using those parameters, plot the equation over continuous x values; this curve is the PDF.
  4. To generate a CDF:
    • In the discrete case, the CDF at each x is the sum of the PMF values at that x and at all smaller x. Repeating this for every x gives a monotonically non-decreasing step plot that reaches 1 at the largest x; this is the discrete CDF.
    • In the continuous case, integrate the PDF from −∞ up to x; the result is a continuous CDF.
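
The generation steps above can be sketched in Python roughly as follows. Both samples (`rolls` and `heights_cm`) are made-up data for illustration, the normal model is just one possible choice, and plotting is left out so the sketch needs only the standard library.

```python
from collections import Counter
from itertools import accumulate
from statistics import NormalDist, mean, stdev

# Hypothetical discrete sample: 20 recorded dice rolls.
rolls = [1, 3, 3, 6, 2, 5, 3, 4, 6, 1, 2, 3, 5, 4, 4, 6, 3, 2, 5, 3]

# PMF: histogram counts divided by the sample size.
counts = Counter(rolls)
xs = sorted(counts)
pmf = [counts[x] / len(rolls) for x in xs]

# Discrete CDF: running total of the PMF over increasing x.
cdf = list(accumulate(pmf))

# Hypothetical continuous sample: ten heights in cm.
heights_cm = [158.2, 171.5, 165.0, 180.3, 169.8,
              175.1, 162.4, 173.9, 168.7, 177.0]

# PDF: pick a model (normal here) and estimate its parameters from the data.
mu = mean(heights_cm)
sigma = stdev(heights_cm)  # sample standard deviation
model = NormalDist(mu, sigma)

# Evaluate the fitted curve on a grid of x values; these are the points to plot.
grid = [150 + 5 * i for i in range(9)]  # 150, 155, ..., 190
pdf_values = [model.pdf(x) for x in grid]

# Continuous CDF: the integral of the PDF up to x, in closed form for this model.
cdf_values = [model.cdf(x) for x in grid]

for x, p, c in zip(xs, pmf, cdf):
    print(x, p, round(c, 2))
```

Note that the PMF and discrete CDF come straight from the counts, while the PDF and continuous CDF depend entirely on the chosen model and its estimated parameters.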

Why PMF, PDF and CDF?

  1. The PMF is preferred when:
    • The probability at each individual x value is what is being studied. This makes sense for discrete data, for example when we are interested in the probability of rolling a particular number on a die.
  2. The PDF is preferred when:
    • We wish to model the collected data with a continuous function, using a few parameters such as the mean to infer the population distribution.
  3. The CDF is preferred when:
    • The cumulative probability over a range is the point of interest.
    • Especially for continuous data, the CDF is more meaningful than the PDF: for example, the probability that a student's height is less than 170 cm (from the CDF) is far more informative than the value at exactly 170 cm (from the PDF), which is a density rather than a probability.
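
The height example can be checked numerically with a short sketch. The normal model and its parameters (mean 172 cm, standard deviation 7 cm) are assumptions chosen purely for illustration.

```python
from statistics import NormalDist

# Assumed model for student heights in cm; both parameters are made up.
heights = NormalDist(mu=172.0, sigma=7.0)

# From the CDF: an actual probability, P(height <= 170 cm).
p_below_170 = heights.cdf(170.0)

# From the PDF: a density in units of 1/cm, not a probability.
density_170 = heights.pdf(170.0)

# A probability "near 170 cm" has to come from an interval, via the CDF.
p_near_170 = heights.cdf(170.5) - heights.cdf(169.5)

print(p_below_170, density_170, p_near_170)
```

For a narrow interval the probability is approximately the density times the interval width, which is the sense in which the PDF describes how likely values near a point are.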

Attribution
Source: Link, Question Author: Le Chifre, Answer Author: Rócherz
