Principal component analysis

I am analysing ecological data for relationships using PCA. Data have been grouped according to signs (-/+) of PC1, 2 and 3 that explained 85% of the variance.

Do data that have negative signs (i.e. water table below the soil surface) need to be transformed, and if so by what method?

Correlation analysis and multiple regression analysis suggest that water table was negatively correlated with respiration, and yet in PCA the two variables show opposing signs in both PC1 and PC2. Does this suggest a negative correlation?

Also, what are the most important factors to consider: the signs, the closeness of coefficients to 1, or the number of PCs with the same signs?


TS Contributor

PCA is a very advanced analysis method - entire books have been written about the issues you have raised. There could be lots of reasons behind the different signs (I don't think you need to transform) - could be collinearity, interactions, mediator variables, etc.
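On the point about not needing to transform, here is a minimal sketch on made-up data (the variable names, units and effect sizes are all hypothetical, not taken from the study). It assumes the PCA is run on the correlation matrix, i.e. on standardised variables. Under that assumption, recording water table as negative or positive depth leaves the eigenvalues unchanged and only flips the sign of that one variable's loadings:

```python
import numpy as np

def pca_corr(X):
    """PCA on the correlation matrix: eigenvalues (descending) and unit-length loadings."""
    R = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Synthetic, purely illustrative data (not the poster's measurements).
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(15, 5, size=n)
# Water table in cm, negative = below the surface; deeper when warmer.
water_table = -20 - 1.5 * (temperature - 15) + rng.normal(0, 3, size=n)
respiration = (2 + 0.3 * (temperature - 15)
               - 0.05 * (water_table + 20) + rng.normal(0, 0.5, size=n))

X = np.column_stack([temperature, water_table, respiration])
X_flipped = X.copy()
X_flipped[:, 1] *= -1                   # same depths, recorded as positive numbers

vals_a, vecs_a = pca_corr(X)
vals_b, vecs_b = pca_corr(X_flipped)

# Relative sign of water table vs respiration on PC1. This product does not
# depend on the arbitrary overall sign the solver gives each eigenvector.
rel_a = np.sign(vecs_a[1, 0] * vecs_a[2, 0])
rel_b = np.sign(vecs_b[1, 0] * vecs_b[2, 0])

print("eigenvalues (negative depth):", vals_a.round(4))
print("eigenvalues (positive depth):", vals_b.round(4))
print("relative sign on PC1:", rel_a, "vs", rel_b)
```

Only the relative signs of two variables within a component carry information; the overall sign of a component is arbitrary and can differ between software packages.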

How much experience do you have using PCA and multiple regression? The nature of your questions suggests that you haven't used the methods before (that wasn't meant to be derogatory - I personally have very little experience with PCA as well - just in grad school).

Before we can lend assistance that is meaningful, we'll need to understand:

What are the underlying theories / hypotheses of your study - what are you trying to accomplish - what is the purpose of the study? -- this is the most important question of all, and will drive you toward the appropriate analytical method(s)

I can't underscore this enough - you need to have a firm research question, data-gathering plan, and analytical plan in place before using a particular method.


Cheers John
You might be surprised, I've just passed my PhD thesis on peatland biogeochemistry. I have had more experience with univariate stats but understand the main principles of multivariate stats. At the moment I'm doing corrections on my chapters. The basis of my research has been to understand the important environmental factors (climatic, physicochemical) that affect carbon dioxide fluxes (photosynthesis, respiration) in peatlands. Field experiments were set up to monitor both abiotic and biotic factors over the course of a year. Originally correlations were performed based on hypothesized interactions between variables. Obviously, correlations are poor stats and require knowledge of dependent and independent variables. Thus PCA was chosen to look for relationships between the variables.

Time series plots and stepwise regressions suggested that the depth of the water table (recorded as negative values) and temperature interactively affected soil respiration (correlation analysis also strongly suggested this, despite its weaknesses). My PCA results agree well with the results that I have obtained using correlation and regression analysis; however, the water table results are clearly separated from respiration in PCA by two PCs. Below I have grouped variables according to the first two PCs. At first I thought it may be due to the negative sign of the water table data but, as you said, maybe it is collinearity with temperature. Do you think these groupings are reasonable?

Variable                PC1      PC2      PC3      PC4      PC5
Basal respiration    -0.358   -0.029   -0.091   -0.080    0.242
Sulphatase           -0.345   -0.144   -0.120    0.036    0.353
β-glucosidase        -0.297   -0.269    0.197    0.206    0.098
DOC                  -0.275   -0.382    0.054    0.244   -0.357

Temperature          -0.361    0.016   -0.060   -0.014    0.280
PPFD                 -0.023    0.450    0.456    0.376   -0.289
Respiration          -0.340    0.192    0.003   -0.221    0.036
Photosynthesis       -0.349    0.126    0.094   -0.135   -0.097

Phosphatase           0.070   -0.287    0.460   -0.750   -0.158
Water table           0.300   -0.258    0.149    0.248    0.343
Rainfall              0.254   -0.301    0.265    0.092    0.378

Eigenvalue           7.3185   2.3827   1.8009   0.6519   0.3281
Proportion            0.563    0.183    0.139    0.050    0.025
Cumulative            0.563    0.746    0.885    0.935    0.960
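One way to check whether the opposing signs fit the negative correlation seen earlier is to rebuild the water table vs respiration correlation from the table itself. This is only a sketch, and it assumes two things not stated in the post: that the PCA was run on the correlation matrix, and that the tabulated coefficients are unit-length eigenvectors. Under those assumptions the correlation between two variables is the sum over components of eigenvalue x loading x loading. Only five PCs are listed (96% of the variance), so the result is approximate.

```python
import numpy as np

# Values copied from the table above (PC1..PC5).
eigenvalues = np.array([7.3185, 2.3827, 1.8009, 0.6519, 0.3281])
water_table = np.array([0.300, -0.258, 0.149, 0.248, 0.343])
respiration = np.array([-0.340, 0.192, 0.003, -0.221, 0.036])

# Contribution of each PC to the implied correlation:
# r_ij is approximately sum_k lambda_k * a_ik * a_jk
# (assumes correlation-matrix PCA with unit-length eigenvectors).
contrib = eigenvalues * water_table * respiration
approx_r = contrib.sum()

for k, c in enumerate(contrib, start=1):
    print(f"PC{k}: {c:+.4f}")
print(f"implied correlation from five PCs: {approx_r:+.3f}")
```

With these numbers both PC1 and PC2 contribute negatively, and the implied correlation is strongly negative (roughly -0.9). So opposite signs on the dominant components point the same way as the correlation and regression results rather than contradicting them.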

Many thanks


TS Contributor

First of all, congratulations - that's quite an accomplishment.

From what you say, PCA seems to be the reasonable approach to assessing your theories - unfortunately, this is waaaayyyyy beyond the scope of what we would take on at this site.

The groupings would be reasonable if they "jibe" with the theory underlying your dissertation. If you're getting something that doesn't make sense, then you'll need to go back to your theory...

If all else fails, talk to your thesis advisor :D

Best of luck!