# Testing for an influence of variation within groups

#### SamHollinger

##### New Member
I measured the phenotype of a plant species at 10 sites in each of two "habitat types" (i.e., "with competition" and "without competition" by another plant). I also measured several abiotic variables at each site. I want to test whether the plant species differs between the two habitat types as a consequence of competition.

I did a PCA on all measured plant traits and a PCA on all measured abiotic variables. Looking at the plant-PC1, there is a difference between the two habitat types. Yet, the sites also seem to differ along the second axis of a PCA including all abiotic variables ("abiotic-PC2"). How could I test whether "habitat type" (competition) is likely to explain (most of) the plant variation I see between the two habitats, and not "abiotic-PC2"?

Of course, answering this question is not only, or even mainly, a statistical problem. Indeed, statistically, it's quite difficult to tease the effects of "habitat type" and "abiotic variation" apart, as they are somewhat correlated.
Still, given that there is ample abiotic variation within each habitat type, my rationale would be that if abiotic variation is the main cause of plant differences between habitat types, it should also explain plant variation within the same habitat type. But how would I test for this statistically? Here are the two options I can think of (syntax as written in R):

# Column names use dots: a hyphen inside a formula would be parsed as a minus sign
model <- lm(plant.PC1 ~ habitat.type + abiotic.PC2, data = data)
# Option 1:
anova(model) # Sequential (= Type I) sums of squares, which is the default in R
# Option 2:
library(car)
Anova(model, type = "II") # Non-sequential Type II sums of squares, as implemented in the car R package

# Question: Is there a statistically supported effect of abiotic.PC2?

Both of these options seem to 'control' for variation caused by "habitat.type" and then ask whether the remaining variation can be explained by "abiotic-PC2". Yet the two approaches give me different results, and I don't know which one is statistically better justified for what I'm trying to test, or why.
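
In case it helps anyone reproduce the comparison, here is a minimal sketch on simulated data rather than my real measurements. Everything in it is an assumption made for illustration: the data frame `sim`, the effect sizes, and the column names `plant.PC1` and `abiotic.PC2` (stand-ins for "plant-PC1" and "abiotic-PC2"). To avoid depending on the car package, it uses base R's `drop1()` as a stand-in for the Type II table; for a model without interactions, its marginal F tests should correspond to the Type II tests, though that equivalence is worth double-checking against `Anova()` on real data. The last lines fit the abiotic axis separately within each habitat type, which is one possible reading of the rationale described above.

```r
# Simulated stand-in data: every value and effect size below is hypothetical.
set.seed(1)
n_per_type <- 10
sim <- data.frame(
  habitat.type = factor(rep(c("with", "without"), each = n_per_type))
)
in_comp <- sim$habitat.type == "with"

# Abiotic axis shifted between habitat types, so the two predictors are correlated.
sim$abiotic.PC2 <- rnorm(2 * n_per_type) + ifelse(in_comp, 1, 0)
# Plant axis driven by habitat type only (an assumption of this sketch).
sim$plant.PC1 <- ifelse(in_comp, 2, 0) + rnorm(2 * n_per_type)

# Same model, two term orders: sequential (Type I) tests depend on the order.
m_hab_first <- lm(plant.PC1 ~ habitat.type + abiotic.PC2, data = sim)
m_abi_first <- lm(plant.PC1 ~ abiotic.PC2 + habitat.type, data = sim)

seq_hab_first <- anova(m_hab_first)  # abiotic.PC2 entered after habitat.type
seq_abi_first <- anova(m_abi_first)  # abiotic.PC2 entered first, unadjusted

# Marginal tests: each term is tested after the other one is in the model.
marginal <- drop1(m_hab_first, test = "F")

print(seq_hab_first)
print(seq_abi_first)
print(marginal)

# Within-habitat check: slope of plant.PC1 on abiotic.PC2 inside each habitat type.
within_fits <- lapply(
  split(sim, sim$habitat.type),
  function(d) lm(plant.PC1 ~ abiotic.PC2, data = d)
)
print(lapply(within_fits, function(f) coef(summary(f))["abiotic.PC2", ]))
```

Comparing the three tables row by row shows which term's test changes with the approach and with the order of the terms in the formula.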
