
Regression Analysis (Regression | Multivariate Methods) (GWU EMSE-271)


The factors in a regression analysis, and the steps for performing one, can be inferred from Lattin, Chapter 3. Van Dorp's Session 10 may present a more orderly recommended sequence of steps.

Both cover almost the same factors (see table - TBD). However, in my review of the two, I found the prediction credibility interval only in Van Dorp, and the validation issues (overfitting, cross-validation, and the jackknife) only in Lattin. Van Dorp may have covered these verbally or in the spreadsheet examples (not yet re-reviewed).


**Factors in Regression Analysis** (as compared with other multivariate methods): TBR.


**Steps in a Regression Analysis**

While a deviation from normality of the residuals certainly does not invalidate a least squares regression analysis, the normality assumption is required to use the t-tests and F-tests we learned during the Statistical Review (EMSE 271, Fall 2009, Slide 209).

Checking for problems may not belong with the preliminary Residual Analysis, or at least not before the model is evaluated, perhaps because some things are easier to check once the regression output is available. The specialty checks can then be left until later, if the model is good enough to need more complete statistical validation.
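To check the normality assumption numerically as well as by eye, here is a minimal Python sketch (standard library only) of the Anderson-Darling test applied to least squares residuals. The helper names are hypothetical, the sample data are invented, and the p-value uses the D'Agostino-Stephens approximation for the small-sample-adjusted statistic; I am assuming, not confirming, that this matches the p-value Minitab reports.

```python
import math
import random
import statistics


def simple_ols_residuals(x, y):
    """Residuals e_i = y_i - (b0 + b1*x_i) from a one-predictor least squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def anderson_darling_normal(data):
    """Anderson-Darling test of normality, with mean and sd estimated from the data.

    Returns (A2, adjusted A2, approximate p-value). The p-value uses the
    D'Agostino-Stephens piecewise approximation for the adjusted statistic.
    """
    n = len(data)
    mean, sd = statistics.fmean(data), statistics.stdev(data)
    z = sorted((v - mean) / sd for v in data)
    # log Phi(z) and log(1 - Phi(z)) via erfc, which stays accurate in the tails
    log_cdf = [math.log(0.5 * math.erfc(-zi / math.sqrt(2))) for zi in z]
    log_sf = [math.log(0.5 * math.erfc(zi / math.sqrt(2))) for zi in z]
    s = sum((2 * i + 1) * (log_cdf[i] + log_sf[n - 1 - i]) for i in range(n))
    a2 = -n - s / n
    a = a2 * (1 + 0.75 / n + 2.25 / n**2)  # small-sample adjustment
    if a >= 0.6:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a**2)
    elif a >= 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a**2)
    elif a >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a**2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a**2)
    return a2, a, min(max(p, 0.0), 1.0)


# Usage: residuals from y = 2 + 3x + e, with e drawn from a normal distribution
rng = random.Random(42)
x = list(range(1, 41))
y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]
a2, a2_adj, p = anderson_darling_normal(simple_ols_residuals(x, y))
print(f"A2 = {a2:.3f}, adjusted = {a2_adj:.3f}, p ~ {p:.3f}")
```

A small p-value is evidence against normality of the residuals, which would cast doubt on the t-test and F-test results.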
|| **Step** || **Approach** || **Comment** ||
|| **Data Exploration** ||  ||  ||
|| - Descriptive statistics || Begin any multivariate method with data exploration || Use frequencies and distributions, measures of the middle, measures of variation, and measures of association (more: Wikipedia) ||
|| - Check the dependent variable for normality || Histogram (perhaps transform the dependent variable) || Try to normalize if necessary. Check alternate distributions using a Normal Probability Plot. ||
|| - Examine the correlation matrix || Highlight the higher values || Excel's conditional formatting feature works nicely for this. ||
|| **Analyze** ||  ||  ||
|| - State assumptions ||  ||  ||
|| - Run a regression || Regression calculations || Excel formulas or the Data Analysis tool (Analysis ToolPak), Minitab, etc. ||
|| - Do a preliminary residual analysis || Look at the graphics to see whether the residuals are normal and free of "bad" patterns || Minitab provides a convenient set of four or five residual plots. ||
|| -- Normality of residuals || Minitab probability plot (mean = 0, standard error of the residuals), Anderson-Darling test || The larger the p-value, the weaker the evidence against the theoretical distribution. EMSE 271, Fall 2009, Slide 106 ||
|| **Evaluate / Adapt Model** ||  || Choose which independent variables to use, if not done already. ||
|| - Determine goodness of fit || Coefficient of determination (//R-squared//) ||  ||
|| - Check model significance || Statistical tests || F-test for the overall regression model ||
|| - Check significance of coefficients || Statistical tests || t-test for each regression coefficient ||
|| **Evaluate model** ||  || Adjusted R-squared (?); drop coefficients that are not significant (?); circle back (?) ||
|| **Check for problems such as** ||  || Heteroscedasticity can be observed in the preliminary residual analysis. ||
|| - multicollinearity || Variance Inflation Factor (VIF) || VIF is a Minitab feature. The Condition Index (Lattin) can also be computed. ||
|| - heteroscedasticity || Observe the residual plot || See the Heteroscedasticity link for a picture of a common example. [More needed? TBR] ||
|| - autocorrelation (if time series data) || Durbin-Watson statistic ||  ||
|| - influential observations || Outliers, studentized residuals, DFITS || Notes: 1. Look for leverage points (extreme in the x-direction) and/or outliers. 2. More residual analysis. 3. Minitab does a DFITS calculation. Challenge: all of these only identify observations that should be checked. ||
|| **Model Validation** || Overfitting, cross-validation, jackknife ||  ||
|| - Identify alternate models || Fewer or different parameters || In the Leslie Salt example (Slide 222), an interaction effect was identified, "another" variable was added (with the assumptions rechecked), and the change was tested for significance. ||
|| - Compare models || F-test || Ratio of the change in R-squared to the unexplained variation, each divided by its degrees of freedom (Slide 226) ||
|| **Forecast** (if intended) || ŷ computed from the regression equation, BUT see the rows below || At least in the Leslie Salt example. Lattin page 70; EMSE 271, Fall 2009, Slides 227-233 ||
|| - Standard error of forecast (SEF) || Use the SEF to account for both model error and sampling error || Lattin page 69; EMSE 271, Fall 2009, Slide 229 ||
|| - Prediction interval ||  || Aside: may be interpreted as a credibility interval (Slide 230) ||
|| - Transform forecast values ||  || Needed if the dependent variable was transformed before the analysis. Lattin page 70; EMSE 271, Fall 2009, Slide 223 ["Retransforming the data affects the distribution of the error term."] ||
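For the Data Exploration rows of the table, a minimal Python sketch (assuming NumPy is available; `describe` and `high_correlations` are hypothetical helper names, and the data are invented) computes per-column descriptive statistics and lists the variable pairs whose correlation magnitude meets a threshold. This is the programmatic analogue of highlighting higher values with Excel's conditional formatting. The 0.7 default threshold is an arbitrary choice, not a course rule.

```python
import numpy as np


def describe(data, names):
    """Per-column descriptive statistics: middle, spread, and range."""
    data = np.asarray(data, dtype=float)
    return {
        name: {
            "mean": float(col.mean()),
            "median": float(np.median(col)),
            "std": float(col.std(ddof=1)),  # sample standard deviation
            "min": float(col.min()),
            "max": float(col.max()),
        }
        for name, col in zip(names, data.T)
    }


def high_correlations(data, names, threshold=0.7):
    """Pairs (a, b, r) whose Pearson |r| >= threshold, like highlighting a matrix."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    k = len(names)
    return [
        (names[i], names[j], float(r[i, j]))
        for i in range(k)
        for j in range(i + 1, k)
        if abs(r[i, j]) >= threshold
    ]


# Usage: x2 is nearly a multiple of x1; x3 is unrelated to both
x1 = np.arange(10.0)
x3 = np.tile([1.0, -1.0], 5)
x2 = 2 * x1 + 0.1 * x3
data = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]
print(describe(data, names)["x1"])
for a, b, r in high_correlations(data, names):
    print(f"{a} vs {b}: r = {r:.3f}")
```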
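The Evaluate / Adapt Model rows (goodness of fit, the overall F-test, and a t-test for each coefficient) can be sketched in a few lines. This is a minimal illustration assuming NumPy and SciPy are installed; `ols_summary` is a hypothetical name and the data are made up. It is meant to compute the same quantities that Excel's regression output or Minitab reports, not to mimic their layout.

```python
import numpy as np
from scipy import stats


def ols_summary(X, y):
    """Least squares fit with intercept: R^2, adjusted R^2, overall F-test, t-tests."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape                                # k = number of predictors
    X1 = np.column_stack([np.ones(n), X])         # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sse = float(resid @ resid)                    # error sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares
    df_model, df_resid = k, n - k - 1
    mse = sse / df_resid
    r2 = 1 - sse / sst
    adj_r2 = 1 - mse / (sst / (n - 1))
    f_stat = ((sst - sse) / df_model) / mse       # H0: all slopes are zero
    f_p = float(stats.f.sf(f_stat, df_model, df_resid))
    se = np.sqrt(np.diag(mse * np.linalg.inv(X1.T @ X1)))
    t_stat = beta / se                            # H0: this coefficient is zero
    t_p = 2 * stats.t.sf(np.abs(t_stat), df_resid)
    return {"beta": beta, "se": se, "t": t_stat, "t_p": t_p, "r2": r2,
            "adj_r2": adj_r2, "F": f_stat, "F_p": f_p, "df": (df_model, df_resid)}


# Usage: y depends on x1 but not on x2
i = np.arange(20.0)
x1, x2 = i, i % 5
y = 1 + 2 * x1 + 0.5 * np.sin(i)
res = ols_summary(np.column_stack([x1, x2]), y)
print(np.round(res["beta"], 3), round(res["r2"], 4), res["F_p"])
```

Adjusted R-squared penalizes each added predictor, which is why it is the more useful number when deciding whether to drop non-significant coefficients.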
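For the multicollinearity row, the sketch below computes the Variance Inflation Factor for each predictor as 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others, plus condition indices. It assumes NumPy is available and the helper names are hypothetical. The condition index here uses the eigenvalues of the predictors' correlation matrix, which is one common variant; I am assuming, not confirming, that it matches Lattin's definition. Frequently quoted rules of thumb (VIF above 10, condition index above 30) are general conventions, not thresholds from the course.

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), R_j^2 from regressing x_j on the other predictors."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        out[j] = 1 / (1 - r2)
    return out


def condition_indices(X):
    """sqrt(lambda_max / lambda_j) for eigenvalues of the predictors' correlation matrix."""
    eig = np.linalg.eigvalsh(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    return np.sqrt(eig.max() / eig)


# Usage: x2 nearly duplicates x1; x3 is roughly unrelated to both
i = np.arange(30.0)
X = np.column_stack([i, i + 0.5 * np.sin(i), np.cos(i)])
print(np.round(vif(X), 1))
print(np.round(condition_indices(X), 1))
```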
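The Durbin-Watson statistic for the autocorrelation row is simple to compute directly from residuals kept in time order. A minimal plain-Python sketch (`durbin_watson` is a hypothetical helper name): values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. A formal test still needs the tabulated lower and upper bounds (dL, dU), which this sketch does not include.

```python
def durbin_watson(residuals):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2, residuals in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Usage: residual patterns that illustrate the range of the statistic (0 to 4)
print(durbin_watson([1.0, 0.9, 0.8, 0.7, 0.6, 0.5]))  # smooth drift: near 0
print(durbin_watson([1.0, -1.0] * 5))                 # sign flips: near 4
```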
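For the influential-observations row, a minimal sketch (assuming NumPy; `influence_measures` is a hypothetical name and the data are invented) computes leverage (the hat-matrix diagonal), internally and externally studentized residuals, and DFITS. DFITS here uses the externally studentized residual, t_i * sqrt(h_i / (1 - h_i)). The flagging cutoffs in the usage lines (leverage above 2p/n, |DFITS| above 2*sqrt(p/n)) are commonly cited rules of thumb, not course thresholds, and as the table notes, they only identify observations to check.

```python
import numpy as np


def influence_measures(X, y):
    """Leverage h_i, internally/externally studentized residuals, and DFITS."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    p = X1.shape[1]                                   # parameters incl. intercept
    XtX_inv = np.linalg.inv(X1.T @ X1)
    h = np.einsum("ij,jk,ik->i", X1, XtX_inv, X1)     # diagonal of the hat matrix
    beta = XtX_inv @ X1.T @ y
    e = y - X1 @ beta
    sse = e @ e
    s2 = sse / (n - p)
    r_internal = e / np.sqrt(s2 * (1 - h))
    s2_deleted = (sse - e**2 / (1 - h)) / (n - p - 1)  # s^2 with case i left out
    t_external = e / np.sqrt(s2_deleted * (1 - h))
    dfits = t_external * np.sqrt(h / (1 - h))
    return h, r_internal, t_external, dfits


# Usage: one point far out in x that also fits badly
x = np.append(np.arange(20.0), 60.0)
y = 1 + 2 * x + 0.5 * np.sin(x)
y[-1] = 0.0
h, r, t, dfits = influence_measures(x, y)
n, p = len(y), 2
print("high leverage:", np.where(h > 2 * p / n)[0])
print("large DFITS:", np.where(np.abs(dfits) > 2 * np.sqrt(p / n))[0])
```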
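For the Model Validation row, the sketch below shows k-fold cross-validation with contiguous (unshuffled) folds and leave-one-out refitting. It assumes NumPy, the function names are hypothetical, and I am assuming that "jackknife" here means leave-one-out refitting of the regression, which is a common usage; Lattin's treatment may differ in detail. The sum of squared leave-one-out errors is the PRESS statistic.

```python
import numpy as np


def _fit(X1, y):
    """Least squares coefficients for a design matrix that already has an intercept."""
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def kfold_cv_mse(X, y, k=5):
    """Mean squared prediction error over k contiguous folds (no shuffling)."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float).reshape(len(y), -1)])
    sq_errors = []
    for test_idx in np.array_split(np.arange(len(y)), k):
        train = np.setdiff1d(np.arange(len(y)), test_idx)
        beta = _fit(X1[train], y[train])
        sq_errors.extend((y[test_idx] - X1[test_idx] @ beta) ** 2)
    return float(np.mean(sq_errors))


def jackknife_residuals(X, y):
    """Leave-one-out prediction errors y_i - yhat_(i), each from a refit without case i."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float).reshape(len(y), -1)])
    out = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        out[i] = y[i] - X1[i] @ _fit(X1[keep], y[keep])
    return out


# Usage: invented data from a straight line plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 3 + 0.8 * x + rng.normal(0, 1, size=30)
press = float(np.sum(jackknife_residuals(x, y) ** 2))
print("5-fold CV MSE:", round(kfold_cv_mse(x, y, k=5), 3))
print("PRESS:", round(press, 3))
```

Comparing the cross-validated error with the in-sample error (SSE / n) gives a rough check for overfitting: a large gap suggests the model is fitting noise.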
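The Compare models row describes an F-test built from R-squared values and degrees of freedom. A minimal sketch of that partial F-test, F = [(R2_full - R2_reduced) / q] / [(1 - R2_full) / (n - p_full)], where q is the number of added parameters and p_full counts the intercept, assuming NumPy and SciPy and that the models are nested (the full model contains every predictor in the reduced one). The function names are hypothetical, and the interaction setup only loosely imitates the idea of the Leslie Salt example; the data are invented, not the Leslie Salt data.

```python
import numpy as np
from scipy import stats


def r_squared(X, y):
    """R^2 of a least squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float).reshape(len(y), -1)])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centered = y - y.mean()
    return 1 - (resid @ resid) / (centered @ centered)


def partial_f_test(X_reduced, X_full, y):
    """Partial F-test for nested models; returns (F, p-value, q)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    p_full = np.asarray(X_full).reshape(n, -1).shape[1] + 1        # + intercept
    q = p_full - (np.asarray(X_reduced).reshape(n, -1).shape[1] + 1)
    r2_f, r2_r = r_squared(X_full, y), r_squared(X_reduced, y)
    f_stat = ((r2_f - r2_r) / q) / ((1 - r2_f) / (n - p_full))
    return f_stat, float(stats.f.sf(f_stat, q, n - p_full)), q


# Usage: does adding an x1*x2 interaction significantly improve the fit?
x1 = np.arange(30.0)
x2 = np.tile([0.0, 1.0], 15)
y = 1 + 0.5 * x1 + 2 * x2 + 1.5 * x1 * x2 + 0.3 * np.sin(x1)
reduced = np.column_stack([x1, x2])
full = np.column_stack([x1, x2, x1 * x2])
f_stat, p_value, q = partial_f_test(reduced, full, y)
print(f"F = {f_stat:.1f} on ({q}, {len(y) - 4}) df, p = {p_value:.2g}")
```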
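For the Forecast rows, the standard error of forecast combines the error of the new observation itself with the sampling error in the estimated coefficients: SEF = s * sqrt(1 + x0'(X'X)^-1 x0), where x0 includes a leading 1 for the intercept. The prediction interval is then ŷ ± t(n - p) * SEF. A minimal sketch, assuming NumPy and SciPy; `forecast_interval` is a hypothetical name and the data are invented.

```python
import numpy as np
from scipy import stats


def forecast_interval(X, y, x_new, level=0.95):
    """Point forecast, standard error of forecast (SEF), and prediction interval.

    SEF = s * sqrt(1 + x0' (X'X)^-1 x0): the 1 is the error of the new
    observation; the quadratic form is the sampling error of the fitted line.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X1 = np.column_stack([np.ones(n), np.asarray(X, dtype=float).reshape(n, -1)])
    p = X1.shape[1]
    XtX_inv = np.linalg.inv(X1.T @ X1)
    beta = XtX_inv @ X1.T @ y
    resid = y - X1 @ beta
    s = np.sqrt(resid @ resid / (n - p))              # standard error of the regression
    x0 = np.concatenate([[1.0], np.atleast_1d(np.asarray(x_new, dtype=float))])
    y_hat = float(x0 @ beta)
    sef = float(s * np.sqrt(1 + x0 @ XtX_inv @ x0))
    t_crit = stats.t.ppf(0.5 + level / 2, n - p)
    return y_hat, sef, (y_hat - t_crit * sef, y_hat + t_crit * sef)


# Usage: forecast beyond the observed range of x
x = np.arange(1.0, 13.0)
y = 10 + 1.5 * x + np.array([0.4, -0.3, 0.1, -0.6, 0.5, 0.2, -0.4, 0.3, -0.1, 0.6, -0.5, -0.2])
y_hat, sef, (lo, hi) = forecast_interval(x, y, 15.0)
print(f"forecast {y_hat:.2f}, SEF {sef:.3f}, 95% PI [{lo:.2f}, {hi:.2f}]")
```

For a single predictor, the quadratic form reduces to 1/n + (x0 - x̄)^2 / Sxx, so the interval widens as the forecast point moves away from the mean of the observed x values.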
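For the Transform forecast values row: if ln(y) was modeled, simply exponentiating the forecast estimates the median of y rather than its mean, because the retransformation changes the error distribution. The sketch below compares the naive retransform with two common corrections, the lognormal adjustment exp(ŷ + s²/2) and Duan's smearing estimate; these corrections are my additions and may not be the approach in Lattin page 70. It assumes NumPy; `log_model_forecasts` is a hypothetical name and the data are invented.

```python
import numpy as np


def log_model_forecasts(x, y, x_new):
    """Fit ln(y) = b0 + b1*x by least squares and retransform forecasts at x_new.

    Returns the naive exp(yhat), the lognormal-corrected exp(yhat + s^2/2),
    and Duan's smearing estimate exp(yhat) * mean(exp(e_i)).
    """
    x = np.asarray(x, dtype=float)
    log_y = np.log(np.asarray(y, dtype=float))
    b1, b0 = np.polyfit(x, log_y, 1)
    resid = log_y - (b0 + b1 * x)
    s2 = resid @ resid / (len(x) - 2)               # residual variance on the log scale
    log_hat = b0 + b1 * np.asarray(x_new, dtype=float)
    naive = np.exp(log_hat)                         # median of y, if log errors are symmetric
    lognormal = np.exp(log_hat + s2 / 2)            # mean of y, if log errors are normal
    smearing = naive * np.mean(np.exp(resid))       # mean of y, without assuming normality
    return naive, lognormal, smearing


# Usage: invented data generated on the log scale
x = np.arange(1.0, 21.0)
y = np.exp(1 + 0.1 * x + 0.3 * np.sin(3 * x))
naive, lognormal, smearing = log_model_forecasts(x, y, [5.0, 25.0])
print(np.round(naive, 3), np.round(lognormal, 3), np.round(smearing, 3))
```

Prediction interval endpoints computed on the log scale can be exponentiated directly, since a monotonic transform preserves quantiles; it is the point forecast of the mean that needs the correction.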


**Sources:**
 * //Analyzing Multivariate Data//, by James Lattin, J. Douglas Carroll, and Paul E. Green
 * EMSE 271, Fall 2009