Problems

Consider the following three data sets A, B and C.
A = {9,10,11,7,13}
B = {10,10,10,10,10}
C = {1,1,10,19,19}
a) Calculate the mean of each data set.
b) Calculate the standard deviation of each data set.
c) Which set has the largest standard deviation?
d) Is it possible to answer question c) without calculations of the standard deviation?

A given data set has a mean μ and a standard deviation σ.
a) What are the new values of the mean and the standard deviation if the same constant k is added to each data value in the given set? Explain.
b) What are the new values of the mean and the standard deviation if each data value of the set is multiplied by the same constant k? Explain.

If the standard deviation of a given data set is equal to zero, what can we say about the data values included in the given data set?

The frequency table of the monthly salaries of 20 people is shown below.
salary (in $)    frequency
3500             5
4000             8
4200             5
4300             2
a) Calculate the mean of the salaries of the 20 people.
b) Calculate the standard deviation of the salaries of the 20 people.

The following table shows the grouped data, in classes, for the heights of 50 people.
height classes (in cm)    frequency
120 - < 130               2
130 - < 140               5
140 - < 150               25
150 - < 160               10
160 - < 170               8
a) Calculate the mean of the heights of the 50 people.
b) Calculate the standard deviation of the heights of the 50 people.


Solutions

a) Mean of data set A = (9+10+11+7+13)/5 = 10
Mean of data set B = (10+10+10+10+10)/5 = 10
Mean of data set C = (1+1+10+19+19)/5 = 10

b) Standard deviation of data set A
= √[ ( (9 - 10)^{2} + (10 - 10)^{2} + (11 - 10)^{2} + (7 - 10)^{2} + (13 - 10)^{2} )/5 ] = 2
Standard deviation of data set B
= √[ ( (10 - 10)^{2} + (10 - 10)^{2} + (10 - 10)^{2} + (10 - 10)^{2} + (10 - 10)^{2} )/5 ] = 0
Standard deviation of data set C
= √[ ( (1 - 10)^{2} + (1 - 10)^{2} + (10 - 10)^{2} + (19 - 10)^{2} + (19 - 10)^{2} )/5 ] = 8.05 (rounded to two decimal places)

c) Data set C has the largest standard deviation.

d) Yes. All three sets have the same mean (10), and the values in set C lie farther from that mean than the values in sets A and B, so set C must have the largest standard deviation.
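
The calculations for sets A, B and C can be checked numerically. The following is a minimal sketch in Python, not part of the original solution; the helper names mean and population_std are chosen here for illustration, and the standard deviation divides by the number of values, as in the formulas above.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def population_std(values):
    """Standard deviation that divides by n (population form)."""
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

data_sets = {
    "A": [9, 10, 11, 7, 13],
    "B": [10, 10, 10, 10, 10],
    "C": [1, 1, 10, 19, 19],
}

for name, values in data_sets.items():
    # Print the set name, its mean, and its standard deviation (2 decimals)
    print(name, mean(values), round(population_std(values), 2))
```

Running it should reproduce the means of 10 and the standard deviations 2, 0 and 8.05 found above.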


We limit the discussion to a data set with 3 values for simplicity, but the conclusions are true for any data set with quantitative data.
Let x, y and z be the data values making a data set.
The mean μ = (x + y + z) / 3
The standard deviation σ = √[ ((x - μ)^{2} + (y - μ)^{2} + (z - μ)^{2})/3 ]
We now add a constant k to each data value and calculate the new mean μ'.
μ' = ((x + k) + (y + k) + (z + k)) / 3 = (x + y + z) / 3 + 3k/3 = μ + k
We now calculate the new standard deviation σ'.
σ' = √[ ((x + k - μ')^{2} + (y + k - μ')^{2} + (z + k - μ')^{2})/3 ]
Note that x + k - μ' = x + k - μ - k = x - μ
also y + k - μ' = y + k - μ - k = y - μ and z + k - μ' = z + k - μ - k = z - μ
Therefore σ' = √[ ((x - μ)^{2} + (y - μ)^{2} + (z - μ)^{2})/3 ] = σ
If we add the same constant k to all data values included in a data set, we obtain a new data set whose mean is the mean of the original data set PLUS k. The standard deviation does not change.
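
As a quick numerical check of this result, here is a small Python sketch using the standard library's statistics module (pstdev divides by n, matching the formula above). The data set and the constant k = 5 are arbitrary choices made for illustration.

```python
import statistics

data = [9, 10, 11, 7, 13]   # example data; any numeric data set would do
k = 5                       # arbitrary constant, assumed for illustration

# Add the same constant k to every data value
shifted = [x + k for x in data]

# The mean moves by k, the standard deviation stays the same
print("means:", statistics.mean(data), statistics.mean(shifted))
print("standard deviations:", statistics.pstdev(data), statistics.pstdev(shifted))
```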

We now multiply all data values by a constant k and calculate the new mean μ' and the new standard deviation σ'.
μ' = (kx + ky + kz) / 3 = kμ
σ' = √[ ((kx - kμ)^{2} + (ky - kμ)^{2} + (kz - kμ)^{2})/3 ] = √[ k^{2} ((x - μ)^{2} + (y - μ)^{2} + (z - μ)^{2})/3 ] = |k| σ
If we multiply all data values included in a data set by a constant k, we obtain a new data set whose mean is the mean of the original data set TIMES k and standard deviation is the standard deviation of the original data set TIMES the absolute value of k.
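
The role of the absolute value is easiest to see with a negative constant. The sketch below is an illustration only: the data set, the values of k and the helper name scaled_stats are assumptions made here. It compares the mean and standard deviation of the scaled data with kμ and |k|σ.

```python
import statistics

data = [9, 10, 11, 7, 13]        # example data, chosen for illustration
mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # divides by n, as in the derivation

def scaled_stats(values, k):
    """Mean and standard deviation after multiplying every value by k."""
    scaled = [k * v for v in values]
    return statistics.mean(scaled), statistics.pstdev(scaled)

for k in (2, -3, 0.5):
    new_mu, new_sigma = scaled_stats(data, k)
    # Compare with the predicted values k*mu and |k|*sigma
    print(k, new_mu, new_sigma, k * mu, abs(k) * sigma)
```

For k = -3 the mean changes sign, while the standard deviation remains non-negative.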


Again, we limit the discussion to a data set with 4 values for simplicity, but the conclusions are true for any data set with quantitative data.
Let x, y, z and w be the data values making a data set with mean μ.
The standard deviation σ = √[ ((x - μ)^{2} + (y - μ)^{2} + (z - μ)^{2} + (w - μ)^{2})/4 ]
Let σ = 0, hence
√[ ((x - μ)^{2} + (y - μ)^{2} + (z - μ)^{2} + (w - μ)^{2})/4 ] = 0
Which gives
(x - μ)^{2} + (y - μ)^{2} + (z - μ)^{2} + (w - μ)^{2} = 0
All terms in the equation are squares and therefore non-negative. A sum of non-negative terms is zero only if every term is zero, so the above equation is equivalent to
(x - μ)^{2} = 0, (y - μ)^{2} = 0, (z - μ)^{2} = 0 and (w - μ)^{2} = 0.
Which gives
x = y = z = w = μ : all data values in the set with σ = 0 are equal.


Let x_{i} be the i-th salary and f_{i} be the corresponding frequency.
a) mean of grouped data = μ = (Σ x_{i}*f_{i}) / Σ f_{i}
= (3500*5 + 4000*8 + 4200*5 + 4300*2) /(5 + 8 + 5 + 2)
= $3955
b) standard deviation of grouped data = √[ (Σ (x_{i} - μ)^{2}*f_{i}) / Σ f_{i} ]
= √[ (5*(3500 - 3955)^{2} + 8*(4000 - 3955)^{2} + 5*(4200 - 3955)^{2} + 2*(4300 - 3955)^{2}) /(20) ]
= 282 (rounded to the nearest unit)
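
The frequency-weighted formulas used above can be sketched in a few lines of Python. This is only an illustrative check; the variable names are chosen here and are not part of the original solution.

```python
import math

salaries = [3500, 4000, 4200, 4300]   # x_i, in dollars
frequencies = [5, 8, 5, 2]            # f_i

n = sum(frequencies)                  # total number of people

# Weighted mean: sum of x_i * f_i divided by sum of f_i
mu = sum(x * f for x, f in zip(salaries, frequencies)) / n

# Weighted standard deviation: squared deviations weighted by f_i
variance = sum(f * (x - mu) ** 2 for x, f in zip(salaries, frequencies)) / n
sigma = math.sqrt(variance)

print(mu, round(sigma))
```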


We first find the midpoints of the given classes.
height classes (in cm)    midpoint    frequency
120 - < 130               125         2
130 - < 140               135         5
140 - < 150               145         25
150 - < 160               155         10
160 - < 170               165         8
Let m_{i} be the midpoint of the i-th class and f_{i} be the corresponding frequency.
a) mean of grouped data = μ = (Σ m_{i}*f_{i}) / Σ f_{i}
= (125*2 + 135*5 + 145*25 + 155*10 + 165*8) /(2+5+25+10+8)
= 148.4
b) standard deviation of grouped data = √[ (Σ (m_{i} - μ)^{2}*f_{i}) / Σ f_{i} ]
= √[ (2*(125 - 148.4)^{2} + 5*(135 - 148.4)^{2} + 25*(145 - 148.4)^{2} + 10*(155 - 148.4)^{2} + 8*(165 - 148.4)^{2}) /(50) ]
= 9.9
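
The same approach works for data grouped in classes, once each class is replaced by its midpoint. The sketch below is illustrative: it assumes each class is stored as a (lower, upper) pair, and the variable names are chosen here. Because midpoints stand in for the actual heights, the results are approximations of the true mean and standard deviation.

```python
import math

# Each class as (lower bound, upper bound) in cm, with its frequency
classes = [(120, 130), (130, 140), (140, 150), (150, 160), (160, 170)]
frequencies = [2, 5, 25, 10, 8]

# Midpoint of each class
midpoints = [(low + high) / 2 for low, high in classes]

n = sum(frequencies)

# Mean and standard deviation computed from midpoints and frequencies
mu = sum(m * f for m, f in zip(midpoints, frequencies)) / n
variance = sum(f * (m - mu) ** 2 for m, f in zip(midpoints, frequencies)) / n
sigma = math.sqrt(variance)

print(round(mu, 1), round(sigma, 1))
```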
