Principal Components Analysis (PCA) (Multivariate Methods) (GWU EMSE-271)

Help: Minitab | Only Correlation matrix

"Principal components analysis is a method that can be used to reduce the dimensionality of multivariate data." - Lattin

"Principal component analysis is a method for re-expressing multivariate data. It allows the researcher to reorient the data so that the first few dimensions account for as much of the available information as possible. The researcher must decide the number of dimensions to use, trading off simplicity for completeness." - EMSE 271, Fall 2009, Slide 235

"Principal component analysis is usually conducted by using the sample correlation matrix, which can be obtained by calculating the sample covariance matrix of standardized data.

"Principal component analysis may also be conducted on the sample covariance matrix without standardization of the data. However, in this case the underlying variances of the variables may greatly affect the linear combination of the original data columns that achieves maximum variance.

"When using the sample covariance matrix for PCA using non-standardized data one should be able to motivate why the scale information of the original variables needs to be retained." - EMSE 271, Fall 2009, Slide 263
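
The relationship in the quote above can be checked numerically. The following is a minimal sketch in Python with NumPy (my choice of tool, not something used in the course); the data values and the helper names `standardize` and `first_component_share` are made up for illustration. It shows that the correlation matrix equals the covariance matrix of the standardized data, and that on unstandardized data the large-variance column takes over the first component.

```python
import numpy as np

# Hypothetical data (not from the course): 6 observations of 2 variables
# measured on very different scales.
X = np.array([
    [1.0, 200.0],
    [2.0, 180.0],
    [3.0, 260.0],
    [4.0, 240.0],
    [5.0, 310.0],
    [6.0, 290.0],
])

def standardize(data):
    """Center each column and divide by its sample standard deviation."""
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

def first_component_share(matrix):
    """Fraction of total variance carried by the largest eigenvalue."""
    eigenvalues = np.linalg.eigvalsh(matrix)  # symmetric input, ascending order
    return eigenvalues[-1] / eigenvalues.sum()

Z = standardize(X)
cov_of_standardized = np.cov(Z, rowvar=False)  # covariance of standardized data
corr = np.corrcoef(X, rowvar=False)            # correlation of the raw data

share_cov = first_component_share(np.cov(X, rowvar=False))
share_corr = first_component_share(corr)

print("Matrices agree:", np.allclose(cov_of_standardized, corr))
print("First-component share, covariance PCA: ", share_cov)
print("First-component share, correlation PCA:", share_corr)
```

With these made-up numbers the covariance-based first component carries a larger share than the correlation-based one, simply because the second column has the larger variance.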

[Note: Interesting set of steps on the (new? 12/15) Wikipedia Principal Component Analysis page. (Check against current wiki entries later - TBD)]
**Steps in a Principal Component Analysis** (When raw data is provided; if only the correlation matrix is available, the first part changes.)
|| **Step** || **Approach** || **Comment** ||
|| **Exploratory Data Analysis** and PCA ||  || VD did not talk about this, but seemed to do it in the sample problem. ||
|| - View Correlation Matrix || Excel Conditional Formatting || Thresholds will vary with the problem. (Seeing if one component might be enough.) ||
|| - Perform check for highly correlated variables || Bartlett's Sphericity Test || Not sufficient!! The actual test was found later in the lectures, but it seems appropriate here to me. The only place I have found to calculate it is in the Excel spreadsheet solutions (i.e., government.xls, druguse.xls). ||
|| - Balance with need for dimension reduction || Observation || Bartlett's is not enough (EMSE 271, Fall 2009, Slide 275) ||
|| - Scale Data ||  || If needed or it looks like the right thing to do. ||
|| **Principal Component Analysis** ||  || Can do PCA without raw data; only the correlation matrix is needed. It can be computed as the covariance matrix of standardized data, so the underlying variances don't skew the results (Slide 275). If non-standardized data are used, need to explain why. ||
|| - State Assumptions ||  ||  ||
|| - Run PCA Minitab || Minitab || Note: Minitab PCA function Stat > Multivariate Methods > Principal Components requires raw data. ||
|| - Or Excel PCA Correlation Matrix Analysis ||  ||  ||
|| - Significance || Explained Variance || Do you check a p-value? Or is explained variance enough? Is there an explained variance number? ||
|| **- Dimension Reduction - Retaining Components** ||  ||  ||
|| -- Review Loading Matrices || Conditional Formatting ||  ||
|| -- Scree Plot || Through Elbow ||  ||
|| -- Kaiser's Rule || Eigenvalue greater than 1 ||  ||
|| -- Horn's Procedure ||  ||  ||
|| -- Explained Variance || Some Threshold ||  ||
|| **Assess Validity** ||  || Don't remember this being covered in lecture for PCA. ||
|| - Jackknife Validation ||  ||  ||
|| - Bootstrap Validation ||  ||  ||
|| **Interpretation** ||  ||  ||
|| - Review Loading Matrices || Excel || Probably only the first two, highlighted using Excel conditional formatting. [Construct the loadings by multiplying the eigenvector by the square root of the eigenvalue.] [Are the eigenvalues the Z loadings? Appears so from the government.xls solution.] ||
|| - Patterns of Association || Loading Plot || Minitab or Excel. Compares two components. ||
||  || Scatter Plots ||  ||
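
Several of the steps above can be run starting from nothing but a correlation matrix. The following is a rough sketch in Python with NumPy, not the course's Minitab or Excel workflow. The matrix `R`, the sample size `n`, the 80% threshold, and the function names `bartlett_sphericity` and `pca_from_correlation` are all assumptions made for illustration. Bartlett's statistic uses the standard form -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and the loadings follow the rule noted in the table (eigenvector times the square root of the eigenvalue).

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's sphericity statistic and its degrees of freedom.

    H0: the population correlation matrix is the identity.
    Compare the statistic with a chi-square distribution on df degrees of freedom.
    """
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return statistic, df

def pca_from_correlation(R):
    """Return eigenvalues (descending), eigenvectors (as columns), and loadings."""
    eigenvalues, eigenvectors = np.linalg.eigh(R)  # symmetric input, ascending order
    order = np.argsort(eigenvalues)[::-1]          # reorder to descending
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Column j of the loadings = eigenvector j * sqrt(eigenvalue j).
    # Eigenvector signs are arbitrary, so loading signs may flip between tools.
    loadings = eigenvectors * np.sqrt(eigenvalues)
    return eigenvalues, eigenvectors, loadings

# Hypothetical correlation matrix and sample size (not from government.xls or druguse.xls).
R = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
n = 50

statistic, df = bartlett_sphericity(R, n)
eigenvalues, eigenvectors, loadings = pca_from_correlation(R)

explained = eigenvalues / eigenvalues.sum()   # proportion of variance per component
cumulative = np.cumsum(explained)

kaiser_keep = int(np.sum(eigenvalues > 1.0))                 # Kaiser's rule
threshold_keep = int(np.searchsorted(cumulative, 0.80) + 1)  # assumed 80% threshold

print("Bartlett statistic:", statistic, "df:", df)
print("Eigenvalues:", eigenvalues)
print("Cumulative explained variance:", cumulative)
print("Components kept (Kaiser):", kaiser_keep)
print("Components kept (80% threshold):", threshold_keep)
print("Loadings:\n", loadings)
```

For this made-up matrix the two retention rules disagree (Kaiser keeps one component, the 80% threshold keeps two), which is why the table lists several rules rather than one. Plotting `eigenvalues` against component number would give the scree plot.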


**Sources:**
 * Analyzing Multivariate Data, by James Lattin, J. Douglas Carroll, and Paul E. Green, (c) 2003, Chapter 4
 * EMSE 271, Fall 2009