Chapter 11: Regression Analysis: Simple Linear Regression
Residuals and Total Squared Error
A regression line is the best-fitting straight line through a set of data points. Regression analysis is all about predicting values, and what makes a regression line 'best-fitting' is that it has the lowest possible amount of prediction error.
In the context of regression, the amount of prediction error is expressed in terms of residuals.
Residual
A residual is the vertical distance between the regression line and a data point and is denoted by $r$.
Calculating Residuals
To calculate a residual, take a point $(x, y)$ from the data and determine the height of the regression line at $x$. This height is the predicted value of $y$ and is denoted by $\hat{y}$.
Next, subtract the predicted value from the observed value to obtain the residual: $r = y - \hat{y}$.
Calculation of Residuals
Consider a regression line with equation $\hat{y} = a + bx$ and three data points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. The residuals of these three data points are calculated as follows:
- For the first point $(x_1, y_1)$: the predicted value is $\hat{y}_1 = a + bx_1$, so the residual is $r_1 = y_1 - \hat{y}_1$.
- For the second point $(x_2, y_2)$: the predicted value is $\hat{y}_2 = a + bx_2$, so the residual is $r_2 = y_2 - \hat{y}_2$.
- For the last point $(x_3, y_3)$: the predicted value is $\hat{y}_3 = a + bx_3$, so the residual is $r_3 = y_3 - \hat{y}_3$.
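To make the calculation concrete, here is a minimal Python sketch that computes residuals for an assumed regression line $\hat{y} = a + bx$; the intercept, slope, and data points below are illustrative choices, not values taken from the example above.

```python
# Minimal sketch: residuals for an assumed regression line y_hat = a + b*x.
# The intercept, slope, and data points are illustrative assumptions.

a, b = 1.0, 2.0                          # assumed intercept and slope
points = [(1, 3.5), (2, 4.0), (3, 8.0)]  # assumed data points (x, y)

for x, y in points:
    y_hat = a + b * x                    # height of the line at x (predicted value)
    r = y - y_hat                        # residual: observed minus predicted
    print(f"x = {x}: y = {y}, y_hat = {y_hat}, residual r = {r}")
```

Each printed residual is positive when the data point lies above the line and negative when it lies below.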
One of the most commonly used measures to summarize the total amount of prediction error is the Total Squared Error.
Total Squared Error
The Total Squared Error is the sum of the squared residuals and is often abbreviated as TSE. For data points with residuals $r_1, r_2, \ldots, r_n$, it is given by $\text{TSE} = r_1^2 + r_2^2 + \cdots + r_n^2$.
The reason for squaring the residuals before adding them together is to prevent positive and negative residuals from canceling one another. Consequently, the Total Squared Error is never negative, and it equals zero only when every data point lies exactly on the regression line.
Calculation of Total Squared Error
Consider the regression line and the residuals $r_1$, $r_2$, and $r_3$ from the previous example. In this case, the Total Squared Error is $\text{TSE} = r_1^2 + r_2^2 + r_3^2$.
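Continuing the same illustrative setup (an assumed line and assumed data points, not the values from the example), the sketch below sums the squared residuals to obtain the TSE and also prints the raw residual sum to show why squaring is needed.

```python
# Minimal sketch: Total Squared Error (TSE) as the sum of squared residuals.
# The line and data points are the same illustrative assumptions as before.

a, b = 1.0, 2.0
points = [(1, 3.5), (2, 4.0), (3, 8.0)]

residuals = [y - (a + b * x) for x, y in points]  # observed minus predicted

tse = sum(r ** 2 for r in residuals)              # squaring prevents cancellation
print("residuals:", residuals)                    # mix of positive and negative values
print("sum of raw residuals:", sum(residuals))    # small: errors partly cancel
print("TSE:", tse)                                # never negative
```

Here the raw residuals partly cancel, so their plain sum understates the prediction error, while the TSE reflects the total amount of error.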