Principal Component Analysis

Use the Geochem Analysis > Principal Component Analysis option (CHPCOMP GX) to perform a principal component analysis.

Principal Component Analysis dialog options

Channels to include

"All ASSAY channels" – all channels that are members of the ASSAY class will be reported.

"Displayed ASSAY channels" – only displayed channels that are members of the ASSAY class will be reported.

"Select ASSAY channels from list" – select ASSAY class channels from a two-panel selection list.

Script Parameter: CHPCOMP.ASSAYS: One of "ASSAY", "DISPLAYED_ASSAY" or "LIST"

Maximum # of components for display

The maximum number of principal components for which channels are created and results are written to "princomp.log". By default, up to 10 components are displayed. If left blank, all components are displayed.

Script Parameter: CHPCOMP.NMAX

Eigenvalue cutoff for varimax

Components with eigenvalues less than this value are rejected, and the loadings of the remaining components are re-computed using Kaiser’s varimax scheme. By default, components with eigenvalues less than 1.0 are rejected.

Script Parameter: CHPCOMP.CUTOFF

Varimax transformation?

Select yes to perform a varimax transformation/rotation on the principal component data.

Script Parameter: CHPCOMP.V_TRANSFORMATION

Save scores as channels?

If yes, save the "score" values as channels. One channel is created for each principal component, up to the maximum specified above. Score channels SC1, SC2, etc. are created for the initial calculation, and VSC1, VSC2, etc. for the varimax analysis.

Script Parameter: CHPCOMP.SCORES: 0:"No", 1: "Yes" (Default is "Yes")

Normalize scores?

If yes, transform the score values so that they lie between 0 and 100. If the score values range from A (minimum) to B (maximum), a value X is transformed using the following formula:

X’ = (X-A)*100/(B-A)
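As a small illustration of the formula above, the following Python sketch rescales a list of score values onto 0 to 100. The function name normalize_scores is hypothetical, and the handling of a constant channel (B equal to A) is an assumption; how CHPCOMP treats that case, or dummy values, is not described here.

```python
def normalize_scores(scores):
    """Rescale score values linearly onto 0..100 using X' = (X - A) * 100 / (B - A).

    A and B are the minimum and maximum of the supplied scores.
    """
    a, b = min(scores), max(scores)
    if b == a:
        # Assumption: a constant channel has no range to rescale,
        # so every value is mapped to 0 instead of dividing by zero.
        return [0.0 for _ in scores]
    return [(x - a) * 100.0 / (b - a) for x in scores]


print(normalize_scores([-2.0, 0.0, 1.0, 3.0]))  # [0.0, 40.0, 60.0, 100.0]
```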

Lines/groups to include

"D" for the displayed line only. "S" for all selected lines. "A" for all lines.

Statistics are calculated on the combined data from all the chosen lines.

Script Parameter: CHPCOMP.LINE

Principal Component Analysis

Principal component analysis is a collection of mathematical methods designed to reveal relationships between two or more (often many) variables. Measurements involving many variables are common in mineral exploration and geochemistry. For instance, the concentrations of a suite of minerals or elements may be determined for a number of rock samples; in this case the "variables" are the concentrations of each constituent.

Consider a collection of samples containing three elements of interest. The relative abundances of the three elements in each sample could be plotted as a unique position in a 3-dimensional plot, with one axis for each element. If the points were concentrated about some plane or line, it would be because some inter-relationship or dependence existed between the variables. Principal component analysis determines the significance of these correlations. The first principal component is the "best fit" line through the data; if the data were originally concentrated along a line, the first principal component would contain most of the "information" about their correlations. The second principal component is the best fit line through the data after the first principal component’s contribution has been removed, and so on. For N variables, a total of N principal components can be extracted, and the data can be completely reconstructed from the information contained within the components.
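The idea that one component can capture most of the information when the points lie near a line, while all N components reconstruct the data exactly, can be sketched in Python with NumPy. The synthetic data, the random seed and the use of a singular value decomposition of mean-centred (but not otherwise standardised) data are illustrative assumptions, not the CHPCOMP procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic samples: three "concentrations" driven by one hidden factor t,
# plus a little independent scatter, so the points lie close to a line.
t = rng.normal(size=500)
data = np.column_stack([2.0 * t, -1.0 * t, 0.5 * t])
data += 0.05 * rng.normal(size=data.shape)

mean = data.mean(axis=0)
centred = data - mean

# Principal directions are the rows of vt; singular values measure their weight.
_, s, vt = np.linalg.svd(centred, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("fraction of variance per component:", np.round(explained, 4))

# Using all N = 3 components, the data are reconstructed exactly (to rounding).
scores = centred @ vt.T
reconstructed = scores @ vt + mean
print("max reconstruction error:", np.abs(reconstructed - data).max())
```

Dropping the last columns of scores and the last rows of vt before reconstructing gives the reduced, "smoothed" version of the data.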

Data Standardisation

Prior to calculation of the principal components, the data are transformed into a form amenable to analysis. In the CHPCOMP GX, if an assay channel’s "Logarithmic Distribution" attribute is set to "Yes", the logarithms of its data are taken first. The mean is then removed, and finally the data are normalised through division by the standard deviation (the square root of the variance).
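A minimal sketch of this standardisation for a single channel, in plain Python. The function name standardise and its `logarithmic` flag (standing in for the "Logarithmic Distribution" attribute) are hypothetical; base-10 logarithms and the sample (n - 1) standard deviation are assumptions, since the text does not specify either.

```python
import math
import statistics


def standardise(values, logarithmic=False):
    """Optionally take logs, remove the mean, then divide by the standard deviation."""
    if logarithmic:
        # Assumed base 10; all values must be positive.
        values = [math.log10(v) for v in values]
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    return [(v - mean) / sd for v in values]


print(standardise([1, 10, 100], logarithmic=True))
```

The choice of log base does not change the standardised result, because switching base only multiplies every logarithm by a constant, which the division by the standard deviation cancels.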

Eigenvalues

A correlation matrix is produced from the transformed data. An eigen-decomposition of this matrix yields the eigenvectors (which are directionally equivalent to the principal components) and their eigenvalues. The relative significance of each component is indicated by its eigenvalue: the first principal component has the largest eigenvalue, and succeeding components have progressively smaller eigenvalues as their significance in the data decreases. Because every diagonal entry of a correlation matrix is one, the sum of the eigenvalues (the trace of the matrix) equals the number of variables, N, so the average eigenvalue is one. Typically, it is the components with eigenvalues exceeding one that are of interest to the analyst. By selecting a limited number of components and re-synthesising the data from them, the more important interrelationships between the variables can be emphasised and analysed. Discarding the contributions of the lesser components can be viewed as removing "noise" from the data, although it may not be "noise" in the traditional sense, but a combination of natural variability, measurement error and "true" correlation factors small enough to be ignored.
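The decomposition and the eigenvalue cutoff can be sketched with NumPy as follows. The helper name pca_eigen and the synthetic data are hypothetical; the cutoff of 1.0 mirrors the default "Eigenvalue cutoff for varimax", with components below the cutoff rejected.

```python
import numpy as np


def pca_eigen(z):
    """Eigen-decompose the correlation matrix of z (samples x variables).

    Returns eigenvalues in decreasing order and matching eigenvectors as columns.
    """
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]


rng = np.random.default_rng(1)
raw = rng.normal(size=(200, 4))
raw[:, 1] += raw[:, 0]  # make variables 0 and 1 correlated (illustrative)
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

eigenvalues, eigenvectors = pca_eigen(z)
print("eigenvalues:", np.round(eigenvalues, 3))
print("sum:", eigenvalues.sum())  # N = 4, the trace of the correlation matrix (to rounding)

cutoff = 1.0
kept = eigenvalues >= cutoff  # components below the cutoff are rejected
print("components kept:", int(np.sum(kept)))
```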

Principal Component Loading and Scores

The principal component loadings are the eigenvectors of the correlation matrix, ordered by decreasing eigenvalue and scaled by the square roots of the eigenvalues. They are the "loadings" of the variables on the principal component axes.

For a given variable, the loadings over all the components form a vector of length one; that is, the sum of the squared loadings is one. The largest loadings generally occur for the largest components, and the loadings for the last components are usually very small.
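The scaling of eigenvectors into loadings, and the unit-length property, can be checked numerically. This NumPy sketch uses synthetic data and illustrative variable names.

```python
import numpy as np

rng = np.random.default_rng(2)
raw = rng.normal(size=(300, 3))
raw[:, 2] += 0.8 * raw[:, 0]  # illustrative correlation between variables 0 and 2
corr = np.corrcoef(raw, rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Loadings: each eigenvector column scaled by the square root of its eigenvalue.
# Rows are variables, columns are components.
loadings = eigenvectors * np.sqrt(eigenvalues)

# Each variable's loadings over all components form a vector of length one.
print(np.linalg.norm(loadings, axis=1))
```

The unit length follows because the loadings times their transpose reproduce the correlation matrix, whose diagonal entries are all one.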

The "scores" describe the contribution of each principal component to each data point. The standardised data can be reconstituted by multiplying the matrix of scores by the transpose of the loadings matrix. Unlike the loadings, the scores are not automatically normalised, and for display purposes it can be useful to rescale them from 0 to 100 to show more clearly which components are most influential.
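The reconstitution of the standardised data from scores and loadings can be sketched with NumPy. The definition of the scores used here (standardised data times the eigenvectors, divided by the square roots of the eigenvalues) is an assumption, chosen so that the scores times the transposed loadings reproduce the data exactly; CHPCOMP may scale its scores differently.

```python
import numpy as np

rng = np.random.default_rng(3)
raw = rng.normal(size=(100, 3))
raw[:, 1] += raw[:, 0]
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)  # standardised data

corr = (z.T @ z) / (len(z) - 1)  # correlation matrix of the standardised data
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
loadings = eigenvectors * np.sqrt(eigenvalues)

# Assumed score convention: z @ V / sqrt(eigenvalues), one column per component.
scores = z @ eigenvectors / np.sqrt(eigenvalues)

print(np.allclose(scores @ loadings.T, z))  # True: all components reproduce z

# Keeping only the first k components gives an approximation without the lesser ones.
k = 1
approx = scores[:, :k] @ loadings[:, :k].T
```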

Varimax Normalization

The directions of the principal components are constrained by the requirement that they be mutually orthogonal. Once the "dimensionality" of the data has been reduced by rejecting the contributions of a number of the lesser principal components, it is often possible to rotate the remaining axes to obtain a better "fit". One scheme that does this is known as Kaiser’s varimax scheme. It rotates the principal component axes so that the projection of each variable onto each axis lies either near the extremities (a loading of plus or minus one) or near the origin (a loading of zero). This sometimes eases interpretation of the data in terms of the original variables.
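A compact varimax rotation can be written with NumPy. This sketch swaps in a widely used SVD-based iteration for the raw varimax criterion rather than Kaiser’s original pairwise-rotation procedure, and it omits Kaiser’s row normalisation; the function name varimax and the example loadings are hypothetical, and the code is not the CHPCOMP implementation.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x components) loading matrix.

    SVD-based iteration for the orthomax family; gamma=1.0 gives varimax.
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.sum(rotated**2, axis=0)  # sum of squared loadings per component
        target = rotated**3 - (gamma / p) * rotated * column_ss
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the update
        current = np.sum(s)
        if current < previous * (1 + tol):  # no meaningful improvement
            break
        previous = current
    return loadings @ rotation, rotation


# Made-up loadings with a clear two-group structure, deliberately rotated by 30 degrees.
simple = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.85], [0.0, 0.9]])
theta = np.radians(30.0)
turn = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta), np.cos(theta)]])
mixed = simple @ turn
rotated, rotation = varimax(mixed)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each variable’s sum of squared loadings is unchanged; only its distribution across the retained components shifts towards values near zero or near plus or minus one.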