The definition of the probability density function is presented starting from a histogram, then a probability histogram. Examples with solutions related to probability density are also included.
We use 2560 data values generated using Google Sheets to present the idea of moving from a histogram of frequencies to a histogram of probabilities. The data represent the time (in hours) taken to finish a project and hence are assumed to be continuous.
The Google Sheets data file (1), including the data used here, may be downloaded and used to reproduce the histograms on this page.
The time here is considered as a continuous random variable.
The histogram obtained using Google Sheets is shown below. The classes on the horizontal axis have a width equal to 1 and the frequencies are on the vertical axis.
The sum of all frequencies is equal to the total number of data values which is equal to 2560.
It is clear from the above probability histograms and the trendline that, when we have a large number of data values of a continuous random variable, as the class width becomes smaller the trendline approaches a function that may be used to calculate probabilities as the area between the x-axis and the curve of this function.
This trendline function is called the probability density function.
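The passage from a frequency histogram to a probability histogram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2560 values below are synthetic stand-ins drawn from a normal distribution (not the data in the Google Sheets file), `density_histogram` is a hypothetical helper, and the bar height is assumed to be the frequency divided by (number of values × class width), so that the area of each bar is a probability.

```python
import random

def density_histogram(data, width):
    """Return (left_edges, heights) with height = frequency / (n * width)."""
    n = len(data)
    lo = min(data)
    counts = {}
    for x in data:
        idx = int((x - lo) // width)          # index of the class containing x
        counts[idx] = counts.get(idx, 0) + 1
    order = sorted(counts)
    edges = [lo + i * width for i in order]   # left edge of each non-empty class
    heights = [counts[i] / (n * width) for i in order]
    return edges, heights

random.seed(0)
# Hypothetical stand-in for the 2560 project times (in hours); not the original data.
data = [random.gauss(20, 3) for _ in range(2560)]

for width in (1.0, 0.5, 0.25):
    edges, heights = density_histogram(data, width)
    total_area = sum(h * width for h in heights)   # sum of bar areas
    print(width, len(edges), total_area)
```

Whatever the class width, the bar areas add up to 1 (up to floating-point rounding), which is the property the trendline inherits as the classes become narrower.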
If the function \( f_{X}(x) \) is the probability density function of a random variable \( X \), then the probability that \( X \) is greater than or equal to \( a \) and smaller than or equal to \( b \), written as \( P( a \le X \le b) \), is given by the area between the x-axis, the curve and the vertical lines \( x = a \) and \( x = b \).
\[ \displaystyle P(a \le X \le b) = \text{Area between the curve, the x-axis, and x = a to x = b} \]
The probability density function (PDF) \( f_{X}(x) \) of a continuous random variable \( X \) has the following properties:
1 - \( f_{X}(x) \ge 0 \)
2 - \( \displaystyle P(-\infty \lt X \lt \infty) = 1 \); that is, the total area between the curve of \( f_{X}(x) \) and the x-axis is equal to 1.
Using integrals in calculus, we can write
\[ \displaystyle P(a \le X \le b) = \int_a^b f_{X}(x) \; dx \]
which is the area between the curve of the PDF, the x-axis, and \( x = a \) to \( x = b \).
\[ \displaystyle \int_{-\infty}^{\infty} f_{X}(x) \; dx = 1\]
which is the total area between the x-axis and the curve of the PDF.
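The two integral formulas above can be checked numerically. The sketch below is a minimal illustration, not part of the original page: it assumes a hypothetical density \( f_{X}(x) = 2x \) on \( [0, 1] \) (zero elsewhere) and uses a simple midpoint rule written by hand, with the helper name `integrate` chosen for this example.

```python
def integrate(f, a, b, n=3000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def pdf(x):
    # Hypothetical density used only for illustration: 2x on [0, 1], 0 elsewhere.
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

total_area = integrate(pdf, -1.0, 2.0)   # the density is zero outside [0, 1]
prob = integrate(pdf, 0.5, 1.0)          # P(0.5 <= X <= 1)

print(total_area)
print(prob)
```

For this density the exact values are \( 1 \) for the total area and \( 1 - 0.5^2 = 0.75 \) for \( P(0.5 \le X \le 1) \); the numerical results agree with them up to rounding.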
Example 1
A random variable \( X \) has a uniform probability density function \( f_{X}(x) \) given by
\[ \begin{equation}
f_{X}(x) = \left\{
\begin{array}{l l}
\dfrac{k}{b-a} & \quad a \le x \le b\\
0 & \quad x \lt a \text{ or } x \gt b
\end{array} \right.
\end{equation}\]
where \( k \) is a positive constant.
a - Plot the graph of \( f_{X}(x) \).
b - Use the plot above and the properties of the probability density function to find \(k\).
c - Use integrals in calculus to show that the total area between the x-axis and the curve is equal to \( 1 \).
d - Let \( a = 3 \) and \( b = 8 \), find the probability \( P( 4 \le X \le 7) \).
Solution
a-
The graph of the probability density function \( f_{X}(x) \) is shown below.
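Parts b to d can be cross-checked numerically. The region under the curve is a rectangle of width \( b - a \) and height \( \dfrac{k}{b-a} \), so its area is \( k \), and the second property of the PDF requires \( k = 1 \). The sketch below is a hedged illustration of that reasoning with \( a = 3 \) and \( b = 8 \); the helper names `uniform_pdf` and `integrate` are assumptions made for this example.

```python
def uniform_pdf(x, a, b, k):
    """k / (b - a) on [a, b] and 0 elsewhere."""
    return k / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, n=5000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b, k = 3.0, 8.0, 1.0   # k = 1 follows from the rectangle area argument

# The density is zero outside [a, b], so integrating over [a, b] gives the total area.
total_area = integrate(lambda x: uniform_pdf(x, a, b, k), a, b)
prob_4_7 = integrate(lambda x: uniform_pdf(x, a, b, k), 4.0, 7.0)

print(total_area)
print(prob_4_7)
```

With \( k = 1 \) the total area is \( 1 \), and \( P(4 \le X \le 7) = \dfrac{7 - 4}{8 - 3} = 0.6 \), which the numerical values reproduce up to rounding.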
Example 2
A random variable \( X \) has a probability density function \( f_{X}(x) \) given by
\[ \begin{equation}
f_{X}(x) = \left\{
\begin{array}{l l}
e^{-k x} & \quad x \ge 0\\
0 & \quad x \lt 0
\end{array} \right.
\end{equation}\]
where \( k \) is a positive constant.
a - Find \( k \).
b - Find \( a \) so that \( P( 0 \le X \le a) = 0.9\) .
Solution
a-
The property of the probability density function that \( f_{X}(x) \ge 0 \) is satisfied.
For the second property of the probability density function to be satisfied, the total area between the curve of \( f_{X}(x) \) and the x-axis must be equal to \( 1 \).
The area \( A \) between the x-axis and the curve of \( f_{X}(x) \) is given by the integral
\(\displaystyle \qquad A = \int_{-\infty}^{\infty} f_{X}(x) \; dx \)
Rewrite the above as
\( \displaystyle\qquad A = \int_{-\infty}^{0} f_{X}(x) \; dx + \int_{0}^{\infty} f_{X}(x) \; dx \)
\( f_{X}(x) = 0 \) in the interval \( (-\infty , 0 ) \), hence the above simplifies to
\(\displaystyle \qquad A = \int_{0}^{\infty} f_{X}(x) \; dx \)
The above integral is an improper one and may be written as
\( \displaystyle \qquad A = \lim_{\; b\to\infty} \int_{0}^{b} f_{X}(x) \; dx \)
Substitute \( f_{X}(x) \) by \( e^{-k x} \) and evaluate the integral
\( \displaystyle \qquad A = \lim_{\; b\to\infty} \int_{0}^{b} e^{-k x} \; dx \)
\( \displaystyle \qquad A = \lim_{\; b\to\infty} \left[ -\frac{1}{k}e^{-kx} \right]_0^b \)
\( \displaystyle \qquad A = \lim_{\; b\to\infty} \left[ -\frac{1}{k}e^{-k b} + \frac{1}{k} e^{0} \right] \)
Simplify
\( \displaystyle \qquad A = - \lim_{\; b\to\infty} \frac{1}{k}e^{-k b} + \frac{1}{k} \)
Since \( k \) is positive, \( \displaystyle \lim_{\; b\to\infty} \frac{1}{k}e^{-k b} = 0 \) and hence
\( \displaystyle \qquad A = \dfrac{1}{k} \)
The total area \( A = 1 \), hence the equation
\( \qquad \dfrac{1}{k} = 1 \)
Solve for \( k \) to obtain
\( \qquad k = 1 \)
b-
\( \displaystyle \qquad P( 0 \le X \le a) = \int_0^a f_{X}(x) \; dx \)
Since \( k = 1 \), \( f_{X} (x) = e^{- x} \) for \( x \ge 0 \), hence
\( \displaystyle \qquad P( 0 \le X \le a) = \int_0^a e^{-x} \; dx \)
Evaluate the above integral.
\( \displaystyle \qquad P( 0 \le X \le a) = \left[-e^{-x} \right]_0^a = -e^{-a} + 1 \)
Since \( P( 0 \le X \le a) = 0.9 \), we write the equation
\( \qquad -e^{-a} + 1 = 0.9 \)
Solve for \( a \)
\( \qquad e^{-a} = 0.1 \)
\( \qquad a = - \ln 0.1 \approx 2.3 \)
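The results of Example 2 can be verified numerically. The sketch below is a minimal check, assuming a hand-written midpoint rule (`integrate` is a name chosen for this example) and replacing the infinite upper limit by 50, beyond which \( e^{-x} \) is negligible.

```python
import math

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

k = 1.0   # value found in part a

def pdf(x):
    """e^(-k x) for x >= 0 and 0 for x < 0."""
    return math.exp(-k * x) if x >= 0 else 0.0

a = -math.log(0.1)                    # value found in part b
total_area = integrate(pdf, 0.0, 50.0)   # upper limit 50 stands in for infinity
prob = integrate(pdf, 0.0, a)            # P(0 <= X <= a)

print(a)
print(total_area)
print(prob)
```

The total area comes out very close to \( 1 \) and \( P(0 \le X \le a) \) very close to \( 0.9 \), consistent with \( k = 1 \) and \( a = -\ln 0.1 \approx 2.3 \).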