PCA on forest carbon production

#1
Hey, I am new to this forum and would like to say thanks in advance to anyone who replies to this message. I have spent the last couple of days trying to work out the solution myself and have only become more confused. (Edit: this post may be in the wrong section of the site; if so, apologies.)

I have been trying to work out which statistical technique I should use to measure how strongly individual drivers correlate with carbon production (also known as Net Ecosystem Production, or NEP) in a dataset of 500 forest sites that I am to analyse. (Edit: Simply plotting carbon production in tonnes per ha on the x axis against each site's value for an individual driver on the y axis will not accurately show whether less influential drivers, such as intensity of forest management, correlate with carbon production. This is because extremely influential drivers, such as solar radiation, account for the majority of the variance in carbon production.) My proposed drivers of carbon production include both discrete and continuous variables, as listed below with corresponding numbers.

Proposed NEP (Carbon Production) Drivers
1. Precipitation in mm per year
2. Available water capacity in mm
3. Yearly stand net solar radiation in W.m-2
4. Soil moisture
5. Nitrogen density in g/m2
6. Species age
7. Mean annual temperature in Celsius
8. Number of Dominant Species
9. Nutrient availability (scale of 1-5)
10. Dominant species (represents 100%)
11. Co-dominant species (represents 50% of the site)
12. Stand Management (type)

13. Net Ecosystem Productivity (NEP or Carbon Produced)

Data Type
1. Ratio (continuous)
2. Ratio (continuous)
3. Ratio (continuous)
4. Ratio (continuous)
5. Ratio (continuous)
6. Ratio (continuous)
7. Interval (continuous)
8. Interval (continuous)
9. Ordinal (discrete)
10. Nominal (Discrete)
11. Nominal (Discrete)
12. Nominal (Discrete)

13. Ratio (continuous)

If I were trying to find the correlation between management type and carbon production, am I right in thinking that by conducting a Principal Components Analysis (PCA) I will be able to eliminate the variance caused by the more influential drivers of carbon production, such as solar radiation and precipitation? Am I also right in thinking I could repeat this process for each of the other factors? Basically, I want to use a PCA to remove the variance caused by all other NEP drivers and show whether there is a more robust correlation between one known driver and NEP.

Once the analysis has been completed, I want to compare my results with lab experiments and discuss what is already known about how these factors drive carbon production.

Apart from not fully understanding whether PCA is the best method for this kind of analysis, I have become even more confused after learning that PCA can only be reliably used on continuous data and not discrete data. If my assumptions about the suitability of a PCA-type analysis for my desired outcome are correct, can anyone advise me on a similar technique that handles both continuous and discrete data in the same dataset? From what I have read, a polychoric principal components analysis works best for this.

If you have got this far once again thanks for reading :)
 

terzi

TS Contributor
#3
Hi,

I don't think PCA would be the best option for you. Since you only seem to be interested in the correlations with carbon production, you could just use some correlation coefficients, perhaps non-parametric ones. Of course, this could not be done with the nominal variables. To eliminate the variance caused by the more influential drivers, you can try partial correlations, which deal with this issue:

http://en.wikipedia.org/wiki/Partial_correlation
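
To make the idea concrete, here is a minimal sketch of a partial correlation in Python with NumPy. It is only an illustration: the data are simulated, and the names `radiation`, `nitrogen`, `nep` and the helper `partial_corr` are made up for the example rather than taken from your dataset. The approach is to regress both variables on the controlling driver(s) and then correlate what is left over.

```python
import numpy as np

def partial_corr(x, y, controls):
    """Correlation between x and y after linearly removing `controls` from both."""
    # Design matrix: intercept plus the control variable(s).
    Z = np.column_stack([np.ones(len(x)), controls])
    # Residuals of x and y after a least-squares fit on the controls.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Simulated stand-in for 500 sites (hypothetical values, not real data).
rng = np.random.default_rng(0)
n = 500
radiation = rng.normal(size=n)   # strong driver
nitrogen = rng.normal(size=n)    # weak driver
nep = 5.0 * radiation + 0.5 * nitrogen + rng.normal(size=n)

raw_r = np.corrcoef(nitrogen, nep)[0, 1]
partial_r = partial_corr(nitrogen, nep, radiation)
print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

In this simulated case the weak driver looks almost unrelated to NEP in the raw correlation, but its relationship becomes much clearer once the strong driver has been controlled for. Note that this only removes linear effects of the controls.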

Another approach, which can be a little more complicated but maybe easier than PCA, would be to fit a regression model to predict carbon production. A sample of 500 would be large enough to include most of the variables (thereby isolating their effects) and analyse which ones have the greatest effect on carbon production:

http://en.wikipedia.org/wiki/Linear_regression_model
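
As a rough sketch of what such a model could look like, the Python example below fits an ordinary least-squares regression with NumPy on simulated data. Everything in it is an assumption for illustration: the variable names, the three management types coded 0, 1, 2, and the effect sizes are invented. The nominal variable is entered as dummy (indicator) columns, which is one common way of including it alongside the continuous drivers.

```python
import numpy as np

# Simulated stand-in for 500 sites (hypothetical values, not real data).
rng = np.random.default_rng(1)
n = 500
radiation = rng.normal(size=n)
precipitation = rng.normal(size=n)
management = rng.integers(0, 3, size=n)   # three made-up management types

# Made-up "true" effects used only to generate the example data.
management_effect = np.array([0.0, 0.5, 1.0])
nep = (3.0 * radiation + 1.0 * precipitation
       + management_effect[management] + rng.normal(size=n))

# Dummy-code the nominal variable; type 0 is the reference level.
dummies = (management[:, None] == np.array([1, 2])).astype(float)

# Design matrix: intercept, continuous drivers, management dummies.
X = np.column_stack([np.ones(n), radiation, precipitation, dummies])
coef, *_ = np.linalg.lstsq(X, nep, rcond=None)

names = ["intercept", "radiation", "precipitation",
         "management type 1", "management type 2"]
for name, b in zip(names, coef):
    print(f"{name:>18}: {b:6.2f}")
```

Each management coefficient is the estimated difference in NEP from the reference type with the other drivers held fixed. To compare the sizes of effects across the continuous drivers you would want them on a common scale (for example standardised), and a statistics package would also give you standard errors and p-values, which this sketch leaves out.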

I really hope this can be helpful for you