Linear regression and data modeling problems are presented on this page along with detailed solutions. A linear regression calculator and grapher may also be used to verify answers and generate additional practice examples.
If a plot of \( n \) data pairs \( (x, y) \) suggests a linear relationship between \( x \) and \( y \), the least squares method can be used to determine the best-fitting straight line.
The least squares regression line minimizes the sum of the squares \( d_1^2 + d_2^2 + \cdots + d_n^2 \) of the vertical distances \( d_1, d_2, \ldots, d_n \) between the observed data points and the line.
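To make the minimized quantity concrete, the sketch below evaluates the sum of squared vertical distances for two candidate lines. It uses the three points from the first problem on this page; the function name `sum_squared_distances` and both candidate lines are hypothetical choices made only for illustration.

```python
from fractions import Fraction


def sum_squared_distances(points, a, b):
    """Sum of squared vertical distances from each point to the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)


points = [(-2, -1), (1, 1), (3, 2)]

# Two hypothetical candidate lines, chosen only for illustration.
half = Fraction(1, 2)
print(sum_squared_distances(points, half, half))  # y = x/2 + 1/2  -> 1/4
print(sum_squared_distances(points, 1, 0))        # y = x          -> 2
```

Of these two candidates, \( y = x/2 + 1/2 \) fits better because its sum is smaller. The least squares line is the one for which this sum is as small as possible over all choices of \( a \) and \( b \).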
The equation of the least squares regression line is written in slope–intercept form:
\[ y = ax + b \]
where the coefficients \( a \) and \( b \) are given by:
\[ a = \frac{ n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right) \left(\sum_{i=1}^{n} y_i\right) }{ n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 } \]
\[ b = \frac{1}{n} \left( \sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i \right) \]
Consider the set of points \[ \{(-2,-1),(1,1),(3,2)\}. \]
a) Find the least squares regression line.
b) Plot the data points and the regression line on the same set of axes.
Consider the data set \[ \{(-1,0),(0,2),(1,4),(2,5)\}. \]
a) Find the least squares regression line.
b) Plot the data points and the regression line.
The following table shows values of \( x \) and their corresponding values of \( y \).
| x | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| y | 2 | 3 | 5 | 4 | 6 |
a) Find the least squares regression line \( y = ax + b \).
b) Estimate the value of \( y \) when \( x = 10 \).
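One possible way to work through both parts of this problem is sketched below. The helper `fit_line` is a hypothetical name; it applies the sum formulas for \( a \) and \( b \) given earlier, using exact fractions. Note that \( x = 10 \) lies outside the tabulated range of 0 to 4, so the estimate is an extrapolation and assumes the linear trend continues.

```python
from fractions import Fraction


def fit_line(xs, ys):
    """Return (a, b) for y = a*x + b using the sum formulas for the coefficients."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a = Fraction(n * sxy - sx * sy, n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return a, b


xs = [0, 1, 2, 3, 4]
ys = [2, 3, 5, 4, 6]

a, b = fit_line(xs, ys)
estimate = a * 10 + b  # value of the fitted line at x = 10

print(float(a), float(b), float(estimate))  # 0.9 2.2 11.2
```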
The sales of a company (in millions of dollars) for each year are shown below.
| Year | 2005 | 2006 | 2007 | 2008 | 2009 |
|---|---|---|---|---|---|
| Sales | 12 | 19 | 29 | 37 | 45 |
a) Find the least squares regression line.
b) Use the model to estimate the company’s sales in 2012.
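For the points \( \{(-2,-1),(1,1),(3,2)\} \), with \( n = 3 \), tabulate \( xy \) and \( x^2 \) and sum each column:
| x | y | \(xy\) | \(x^2\) |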
|---|---|---|---|
| -2 | -1 | 2 | 4 |
| 1 | 1 | 1 | 1 |
| 3 | 2 | 6 | 9 |
| \(\sum x=2\) | \(\sum y=2\) | \(\sum xy=9\) | \(\sum x^2=14\) |
Using the formulas:
\[ a=\frac{3(9)-2(2)}{3(14)-2^2}=\frac{23}{38}, \qquad b=\frac{1}{3}\left(2-\frac{23}{38}\cdot2\right)=\frac{5}{19} \]
The regression line is:
\[ y=\frac{23}{38}x+\frac{5}{19} \]
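The table and the substitution above can be double-checked with a short script. This is only a sketch: it rebuilds the same four columns, sums them, and uses Python's `fractions` module so that the coefficients come out as exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

points = [(-2, -1), (1, 1), (3, 2)]
n = len(points)

# One row per data point: x, y, x*y, x^2 (the same columns as the table).
rows = [(x, y, x * y, x * x) for x, y in points]
sum_x, sum_y, sum_xy, sum_x2 = (sum(col) for col in zip(*rows))

# Coefficients from the sum formulas.
a = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n

print(sum_x, sum_y, sum_xy, sum_x2)  # 2 2 9 14
print(a, b)                          # 23/38 5/19
```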
For the data set \( \{(-1,0),(0,2),(1,4),(2,5)\} \), with \( n = 4 \):
| x | y | \(xy\) | \(x^2\) |
|---|---|---|---|
| -1 | 0 | 0 | 1 |
| 0 | 2 | 0 | 0 |
| 1 | 4 | 4 | 1 |
| 2 | 5 | 10 | 4 |
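| \(\sum x=2\) | \(\sum y=11\) | \(\sum xy=14\) | \(\sum x^2=6\) |
Using the formulas:
\[ a=\frac{4(14)-2(11)}{4(6)-2^2}=\frac{34}{20}=1.7, \qquad b=\frac{1}{4}\left(11-1.7\cdot2\right)=1.9 \]
The regression line is:
\[ y=1.7x+1.9 \]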
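As an independent check on this data set, the coefficients can also be computed from deviations about the means: \( a = \sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (x_i-\bar{x})^2 \) and \( b = \bar{y} - a\bar{x} \). This form is not used elsewhere on this page; it is an algebraically equivalent rearrangement of the sum formulas, sketched here with illustrative variable names.

```python
from fractions import Fraction

points = [(-1, 0), (0, 2), (1, 4), (2, 5)]
n = len(points)

x_mean = Fraction(sum(x for x, _ in points), n)
y_mean = Fraction(sum(y for _, y in points), n)

# Deviation form of the slope:
# sum of (x - x_mean)(y - y_mean) divided by sum of (x - x_mean)^2.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in points)
s_xx = sum((x - x_mean) ** 2 for x, _ in points)

a = s_xy / s_xx
b = y_mean - a * x_mean

print(a, b)  # 17/10 19/10
```

The result, \( a = 1.7 \) and \( b = 1.9 \), matches what the column sums in the table give when substituted into the sum formulas.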
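For the sales data, let \( t=x-2005 \) represent the number of years after 2005, so that \( t = 0, 1, 2, 3, 4 \). With \( n=5 \), \( \sum t=10 \), \( \sum y=142 \), \( \sum ty=368 \) and \( \sum t^2=30 \):
\[ a=\frac{5(368)-10(142)}{5(30)-10^2}=8.4, \qquad b=\frac{1}{5}(142-8.4\cdot10)=11.6 \]
The regression line is \( y=8.4t+11.6 \). For 2012, \( t=7 \):
\[ y=8.4(7)+11.6=70.4 \]
The estimated sales in 2012 are 70.4 million dollars.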
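The substitution \( t = x - 2005 \) only keeps the sums small; it does not change the fitted values. The sketch below, with a hypothetical helper `fit_line`, fits the line once against \( t \) and once against the raw calendar years, and compares the two estimates for 2012. Exact fractions are used so that the large sums over calendar years introduce no rounding error.

```python
from fractions import Fraction


def fit_line(xs, ys):
    """Return (a, b) for y = a*x + b using the sum formulas for the coefficients."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a = Fraction(n * sxy - sx * sy, n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return a, b


years = [2005, 2006, 2007, 2008, 2009]
sales = [12, 19, 29, 37, 45]  # millions of dollars

# Fit against t = year - 2005, as in the solution above.
t = [year - 2005 for year in years]
a_t, b_t = fit_line(t, sales)

# Fit against the raw calendar years for comparison.
a_y, b_y = fit_line(years, sales)

print(float(a_t), float(b_t))    # 8.4 11.6
print(float(a_t * 7 + b_t))      # 70.4
print(float(a_y * 2012 + b_y))   # 70.4
```

Both fits share the slope 8.4; only the intercept differs, because it is measured from a different origin. As with any extrapolation, the 2012 figure assumes the 2005 to 2009 trend continues unchanged.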