Linear Least Squares Fitting

What is Linear Least Squares Fitting?

Let (x1, y1), (x2, y2), ... , (xN, yN) be experimental data points, as shown in the scatter plot below, and suppose we want to predict the dependent variable y for different values of the independent variable x using a linear model of the form

y = a x + b


Figure 1. Scatter plot of the experimental data points

A widely used procedure is to minimize the sum D of the squares of the vertical distances d1, d2, ... between the mathematical model y = f(x) and the experimental points, as shown in the graph below.

Figure 2. Least squares fitting of a model to the data
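Before carrying out the minimization, it may help to see the quantity being minimized written as a short program. The following is a minimal sketch in Python (the language and the sample data are assumptions made for illustration only); it evaluates the sum D of the squared vertical distances for a given slope a and intercept b.

```python
# Sum of the squared vertical distances between the data points (x_i, y_i)
# and the line y = a*x + b.
def sum_of_squared_distances(a, b, xs, ys):
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

# D depends on the choice of a and b; least squares fitting looks for the
# pair (a, b) that makes D as small as possible.
print(sum_of_squared_distances(2.0, 0.0, xs, ys))  # ≈ 0.11 for the line y = 2x
print(sum_of_squared_distances(1.0, 1.0, xs, ys))  # ≈ 29.71 for the line y = x + 1
```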

Let \( f(x) = a x + b \) be the linear model to be used. Hence the vertical distances \(d_1, d_2, ... \) are given by

\( d_1 = |y_1 - (a x_1 + b)| \) , \( d_2 = |y_2 - (a x_2 + b)| \) , ...

The sum D of the squares of the vertical distances d1, d2, ... may be written as
\[ D = \sum_{i=1}^{N} (y_i - (a x_i + b))^2 \] Since D is a sum of squares, it is a convex function of a and b, and the values of a and b that minimize D are those that make the partial derivatives of D with respect to a and b simultaneously equal to 0. Hence we first calculate the two partial derivatives \[ \dfrac {\partial D}{\partial a} = \sum_{i=1}^{N} - 2 x_i(y_i - a x_i - b) \] \[ \dfrac {\partial D}{\partial b} = \sum_{i=1}^{N} - 2 (y_i - a x_i - b) \] and then solve the resulting system of equations for \( a \) and \( b \):
\begin{cases} \sum_{i=1}^{N} - 2 x_i(y_i - a x_i - b) = 0 \\ \sum_{i=1}^{N} - 2 (y_i - a x_i - b) = 0 \end{cases} Divide both sides of each equation by - 2 and simplify to rewrite the system of equations as
\begin{cases} \sum_{i=1}^{N} x_i(y_i - a x_i - b) = 0 \\ \sum_{i=1}^{N} (y_i - a x_i - b) = 0 \end{cases} This is a system of two equations in the two unknowns a and b. Expand the sums:
\begin{cases} \sum_{i=1}^{N} x_iy_i - a \sum_{i=1}^{N} x_i x_i - \sum_{i=1}^{N} x_i b = 0 \\ \sum_{i=1}^{N} y_i - a\sum_{i=1}^{N} x_i - \sum_{i=1}^{N} b = 0 \end{cases} Note that \( b \) may be factored out of the sums, that \( \sum_{i=1}^{N} x_i x_i = \sum_{i=1}^{N} x_i^2 \) and that \( \sum_{i=1}^{N} b = N b \). We now rewrite the system with the terms containing a and b on the left and all other terms on the right, as follows:
\begin{cases} a \sum_{i=1}^{N} x_i^2 + b \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} x_iy_i \\ a\sum_{i=1}^{N} x_i + b N = \sum_{i=1}^{N} y_i \end{cases} The above system in matrix form is written as \[ \begin{bmatrix} \sum_{i=1}^{N} x_i^2 & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & N \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{N} x_iy_i \\ \sum_{i=1}^{N} y_i \end{bmatrix} \] The above system may be solved using the inverse matrix as follows \[ \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{N} x_i^2 & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & N \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=1}^{N} x_iy_i \\ \sum_{i=1}^{N} y_i \end{bmatrix} \] where we use the inverse of a 2 by 2 matrix to find \[ \begin{bmatrix} \sum_{i=1}^{N} x_i^2 & \sum_{i=1}^{N} x_i \\ \sum_{i=1}^{N} x_i & N \end{bmatrix}^{ -1} = \dfrac{1}{N \sum_{i=1}^{N} x_i^2 - (\sum_{i=1}^{N} x_i)^2} \begin{bmatrix} N & - \sum_{i=1}^{N} x_i \\ - \sum_{i=1}^{N} x_i & \sum_{i=1}^{N} x_i^2 \end{bmatrix} \] Finally \[ \begin{bmatrix} a \\ b \end{bmatrix} = \dfrac{1}{N \sum_{i=1}^{N} x_i^2 - (\sum_{i=1}^{N} x_i)^2} \begin{bmatrix} N & - \sum_{i=1}^{N} x_i \\ - \sum_{i=1}^{N} x_i & \sum_{i=1}^{N} x_i^2 \end{bmatrix} \begin{bmatrix} \sum_{i=1}^{N} x_iy_i \\ \sum_{i=1}^{N} y_i \end{bmatrix} \] where \[ a = \dfrac{N \sum_{i=1}^{N} x_iy_i - (\sum_{i=1}^{N} x_i)(\sum_{i=1}^{N} y_i)} {N \sum_{i=1}^{N} x_i^2 - (\sum_{i=1}^{N} x_i)^2} \]
\[ b = \dfrac{ - (\sum_{i=1}^{N} x_i) (\sum_{i=1}^{N} x_i y_i) + (\sum_{i=1}^{N} x_i^2)( \sum_{i=1}^{N}y_i) } {N \sum_{i=1}^{N} x_i^2 - (\sum_{i=1}^{N} x_i)^2} \]
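As an illustration, the closed-form expressions for a and b translate directly into code. The following is a minimal sketch in Python with NumPy (the language and the function names are assumptions, not part of the article); it computes a and b from the sums, together with a cross-check that solves the 2 by 2 normal-equation system directly.

```python
import numpy as np

def least_squares_line(xs, ys):
    """Fit y = a*x + b using the closed-form formulas derived above."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    N = len(xs)
    Sx, Sy = xs.sum(), ys.sum()
    Sxy, Sxx = (xs * ys).sum(), (xs * xs).sum()
    det = N * Sxx - Sx ** 2               # nonzero as long as the x_i are not all equal
    a = (N * Sxy - Sx * Sy) / det
    b = (Sxx * Sy - Sx * Sxy) / det
    return a, b

def least_squares_line_matrix(xs, ys):
    """Same fit, obtained by solving the 2x2 normal-equation system."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    N = len(xs)
    M = np.array([[(xs * xs).sum(), xs.sum()],
                  [xs.sum(),        N      ]])
    rhs = np.array([(xs * ys).sum(), ys.sum()])
    a, b = np.linalg.solve(M, rhs)        # solves M [a, b]^T = rhs
    return a, b
```

The two functions should agree to rounding error: the first mirrors the closed-form formulas above, while the second mirrors the system written in matrix form.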

Example of Linear Least Squares Fitting Application

Find the linear least squares fit y = a x + b for the experimental data points given by: {(1 , 2) , (3 , 4) , (2 , 6) , (4 , 8) , (5 , 12) , (6 , 13) , (7 , 15)}
Solution
Set up a table with the quantities included in the above formulas for a and b.
\(x_i\)    \(y_i\)    \(x_i y_i\)    \(x_i^2\)
1          2          2              1
3          4          12             9
2          6          12             4
4          8          32             16
5          12         60             25
6          13         78             36
7          15         105            49

The total number of points is N = 7.

\( \sum_{i=1}^{7} x_i \) = 1 + 3 + 2 + 4 + 5 + 6 + 7 = 28

\( \sum_{i=1}^{7} y_i \) = 2 + 4 + 6 + 8 + 12 + 13 + 15 = 60

\( \sum_{i=1}^{7} x_iy_i \) = 2 + 12 + 12 + 32 + 60 + 78 + 105 = 301

\( \sum_{i=1}^{7} x_i^2 \) = 1 + 9 + 4 + 16 + 25 + 36 + 49 = 140

Substitute to obtain

\( a = \dfrac{7 \times 301 - 28 \times 60} {7 \times 140 - 28^2} = 2.17857142857 \)

\( b = \dfrac{ - 28 \times 301 + 140 \times 60 } {7 \times 140 - 28^2} = -0.14285714285 \)
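As a quick check of the arithmetic, the sums from the table and the resulting coefficients can be reproduced in a few lines of code (again a Python sketch, with np.polyfit used only as an independent cross-check).

```python
import numpy as np

xs = [1, 3, 2, 4, 5, 6, 7]
ys = [2, 4, 6, 8, 12, 13, 15]

# The sums computed above: N = 7, sum x = 28, sum y = 60,
# sum xy = 301, sum x^2 = 140.
N = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))
Sxx = sum(x * x for x in xs)

a = (N * Sxy - Sx * Sy) / (N * Sxx - Sx ** 2)    # 427 / 196 ≈ 2.1786
b = (Sxx * Sy - Sx * Sxy) / (N * Sxx - Sx ** 2)  # -28 / 196 ≈ -0.1429
print(a, b)

# Independent cross-check: a degree-1 polynomial fit returns [a, b].
print(np.polyfit(xs, ys, 1))   # approximately [ 2.17857143, -0.14285714]
```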

Linear Least Squares Fitting Calculator

Given experimental points, this calculator computes the coefficients a and b, hence the equation of the line y = a x + b, and the correlation coefficient. It also plots the experimental points and the line y = a x + b, where a and b are given by the formulas above.
Enter the experimental points (x1, y1), (x2, y2), ... , (xN, yN) separated by commas, check the data entered, and then press "Calculate and Plot". If your data is already formatted as points separated by commas, you may copy and paste it into the input text area below.
Hover the mouse cursor over the top right of the graph to use the option of downloading it as a PNG image.
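For readers who want to reproduce the calculator's output offline, here is a minimal sketch using Python with NumPy and Matplotlib (an assumption; the calculator's own implementation is not shown here). It computes a, b and the correlation coefficient, and plots the points together with the fitted line.

```python
import numpy as np
import matplotlib.pyplot as plt

# Points in the same format the calculator expects: (x1, y1), (x2, y2), ...
points = [(1, 2), (3, 4), (2, 6), (4, 8), (5, 12), (6, 13), (7, 15)]
xs, ys = map(np.array, zip(*points))

a, b = np.polyfit(xs, ys, 1)            # slope and intercept of the fitted line
r = np.corrcoef(xs, ys)[0, 1]           # correlation coefficient
print(f"y = {a:.4f} x + {b:.4f},  correlation = {r:.4f}")

# Plot the experimental points and the line y = a x + b.
plt.scatter(xs, ys, label="data points")
x_line = np.linspace(xs.min(), xs.max(), 100)
plt.plot(x_line, a * x_line + b, color="red", label="least squares line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.savefig("least_squares_fit.png")    # or plt.show() for an interactive window
```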

More References and Links