
Analysis of Residuals (Regression Analysis) (GWU EMSE-271)

Residuals are examined to verify that a fitted model conforms to its underlying assumptions.

"If the model fit to the data were correct, the residuals would approximate the random errors that make the relationship between the explanatory variables and the response variable a statistical relationship. Therefore, if the residuals appear to behave randomly, it suggests that the model fits the data well. On the other hand, if non-random structure is evident in the residuals, it is a clear sign that the model fits the data poorly." - [|Wikipedia]

"For a univariate distribution, the distinction between errors and residuals is just the difference between deviations from the //population// mean versus the //sample// mean." - [|Wikipedia]

"The residuals from a fitted model" (regression) "are the differences between the responses observed at each combination of values of the explanatory variables and the corresponding prediction of the response computed using the regression function." - [|Wikipedia]
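The definition above can be sketched numerically. This is a minimal illustration with made-up data (not from the course notes): fit a line by ordinary least squares and form each residual as the observed response minus the prediction from the fitted function.

```python
import numpy as np

# Assumed toy data for illustration: a linear relationship plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Design matrix with an intercept column; solve for the OLS coefficients.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual = observed response minus prediction from the fitted function.
residuals = y - X @ beta_hat

# With an intercept in the model, OLS residuals sum to (numerically) zero.
print(abs(residuals.sum()) < 1e-8)
```

A plot of these residuals against `x` (or against the fitted values) is the usual first graphical check: random scatter suggests an adequate model, visible structure suggests a poor fit.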

"In regression analysis, the distinction between //errors// and //residuals// is subtle and important, and leads to the concept of" studentized residuals. - [|Wikipedia]

==> Importance in Regression <==
**See Also:**
 * [|Graphical analysis of residuals] - Wikipedia
 * [|Quantitative analysis of residuals] - Wikipedia
 * Heteroscedasticity checked by observing residuals for non-randomness
 * Studentized Residuals in checking for outliers
 * Model Adequacy and (Statistical) Model Validation

"Given a function that relates the independent variable to the dependent variable – say, a line – the deviations of observations from this function are the errors. If one runs a regression on some data, then the deviations of observations from the //fitted// function are the residuals.

"However, because of the behavior of the process of regression, the //distributions// of residuals at different data points (of the input variable) may vary //even if// the errors themselves are identically distributed. Concretely, in a linear regression where the errors are identically distributed, the variability of residuals of inputs in the middle of the domain will be //higher// than the variability of residuals at the ends of the domain: linear regressions fit endpoints better than the middle. This is also reflected in the influence functions of various data points on the regression coefficients: endpoints have more influence.

"Thus to compare residuals at different inputs, one needs to adjust the residuals by the expected variability of //residuals,// which is called studentizing. This is particularly important in the case of detecting outliers: a large residual may be expected in the middle of the domain, but considered an outlier at the end of the domain." - [|Wikipedia]
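The adjustment described above can be sketched with the hat matrix. In this illustrative example (assumed data and the standard internal-studentization formula, not taken from the course notes), the leverage h_i is the i-th diagonal of H = X(XᵀX)⁻¹Xᵀ, the residual variance is σ²(1 − h_i), and the studentized residual is r_i = e_i / (s·√(1 − h_i)):

```python
import numpy as np

# Assumed toy data: a straight-line relationship plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-5.0, 5.0, 15)
y = 1.0 - 2.0 * x + rng.normal(size=x.size)

X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X'X)^-1 X'; its diagonal entries are the leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

n, p = X.shape
s = np.sqrt(e @ e / (n - p))        # residual standard error
r = e / (s * np.sqrt(1.0 - h))      # internally studentized residuals

# Leverage is highest at the endpoints and lowest in the middle, so raw
# residuals vary MORE in the middle of the domain than at the ends --
# the asymmetry that studentizing corrects for.
print(h[0] > h[x.size // 2])
```

Dividing by √(1 − h_i) puts residuals from the middle and the endpoints on a common scale, so a fixed cutoff (e.g. |r_i| > 2 or 3) can be applied uniformly when screening for outliers.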


**Sources:**
 * Statistical model validation. (2009, April 14). In //Wikipedia, The Free Encyclopedia//. Retrieved 13:51, December 8, 2009, from []
 * Errors and residuals in statistics. (2009, September 24). In //Wikipedia, The Free Encyclopedia//. Retrieved 13:51, December 8, 2009, from []
 * EMSE 271, Fall 2009