# Appropriate to use time as covariable in ANCOVA?

#### aibing

##### New Member
Hello stats people,

I need to assess whether the rate of increase (in soil carbon) differs significantly between two groups (forests). I believe ANCOVA is appropriate for this, but am not sure.

Independent variable: Forest
Dependent variable: Soil carbon
Covariable: Time since planting (months)

I'm concerned that a discrete variable like time (my data points are almost annual, so time may as well be integers) isn't an appropriate covariable. Or perhaps there's a more appropriate test than ANCOVA?

Thanks for any advice! (Apologies if this question is obvious. This is my first stats application in independent research, and I haven't been able to find the answer in my books or online.)

#### Jake

There is nothing wrong with having an integer covariate.

#### GretaGarbo

##### Human
I was about to say: try to avoid rounding, because that will only increase “uncertainty”. If you have data on the day that the measurement was made, don’t round it to the quarter or month. That only creates measurement error in the independent variable, which can bias the regression estimates.

But then I thought that you might have several variables describing what kind of forest it was, such as spruce, pine or different leafy trees.

Then I thought about what is cause and effect.

If you had randomised and assigned a certain area to have spruce trees, waited 50 years (lol) and then come back and measured the soil carbon, then soil carbon would have been the dependent variable and the forest the independent variable, because that is what you had set 50 years before.

But now there might be natural selection and competition between trees, so that a certain soil suits a certain tree best. Then we have reverse causality: the tree is the dependent variable and the soil is the independent one.

This might be something to think about.

Maybe something could be done with simultaneous equations about this, I don’t know. Suggestions?

#### aibing

##### New Member
Hi, thanks for your thoughts. If you're interested...yes, plant/soil biogeochemistry interactions definitely work both ways (hence, interaction), so things can get tricky. Thus, there's a lot of research into mechanisms right now (for example, how nutrient deficiency affects plant growth/invasion vs how roots can alter the chemistry around them to extract nutrients). Fortunately, I'm looking at <10 year old experimental forests, so the study can focus on how tree species affects soil properties. No mechanisms involved at this point...just observation of the end result.

#### GretaGarbo

##### Human
@victorxstc. What you are saying is about what I was thinking too, but you give a better and richer picture than I had in mind. As often when you write about a biological area, you give interesting ideas. I think what is lacking is a stochastic specification, so that the verbal model is written in concrete mathematical terms and can be discussed and estimated. (And that is something I think we in this community should do together.)

Let Y1 be soil carbon, Y2 be forest, and x1, x2, … be explanatory variables.

Aibing suggests the model:

Y1 = a1 + a2*Y2 + a3*x1 + a4*x2 + … + error1

The reverse causality suggests:

Y2 = b1 + b2*Y1 + b3*x1 + b4*x2 + … + error2

These two equations can be true simultaneously.
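One way to see why it matters that both equations hold at once is to substitute the second into the first. This is only a sketch: it truncates to two explanatory variables and assumes a2*b2 ≠ 1 (both assumptions are made here for illustration), and treating forest type Y2 as a numeric variable is itself a simplification.

```latex
\begin{aligned}
Y_1 &= a_1 + a_2 Y_2 + a_3 x_1 + a_4 x_2 + \varepsilon_1 \\
Y_2 &= b_1 + b_2 Y_1 + b_3 x_1 + b_4 x_2 + \varepsilon_2 \\
\Rightarrow\quad
Y_1 &= \frac{(a_1 + a_2 b_1) + (a_3 + a_2 b_3)\,x_1 + (a_4 + a_2 b_4)\,x_2
        + \varepsilon_1 + a_2 \varepsilon_2}{1 - a_2 b_2}
\end{aligned}
```

Here ε1 and ε2 stand for error1 and error2. Y1 depends on error1, and Y2 depends on Y1 whenever b2 ≠ 0, so Y2 is correlated with error1. Ordinary least squares applied to the first equation alone would then generally give a biased estimate of a2.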

I wonder if this connects to structural equation models (SEM)?

Edit:
I just noticed aibing's latest comment.

#### victorxstc

##### Pirate
Thank you so much, dear Greta. That was so kind of you.

Your model is a relief to me! It is interesting, and I would love to know which type of test can solve such an equation set. I should go dig into this SEM thing. I once read about it, but to no avail. Should we just compute a correlation coefficient between them? Or what? SEM?

P.S. Another scenario might also happen, in which the exhaustion of soil nutrients can't be compensated by the leaves fallen from the trees (because those leaves lack the essential materials and contain something else, possibly good for another type of tree or living being). Then there are at least two other options (without human intervention): either the trees die, or a second, third or fourth (etc.) set of trees starts to exploit the weakened condition of the dominant trees (while enjoying the new type of soil, enriched by new nutrients from the leaves of the old trees, which can suit these new trees) and grows. Then either the old trees die or, as the second option, the new trees give the soil what the first group of trees needed for growth, and the first trees' fallen leaves enrich the soil for the new trees in return. A new symbiosis would then arise among all those trees and a dynamic stability would set in. But that doesn't affect your mathematical model considerably, since in that model all trees are called forest (Y2), and you have already encapsulated the different types of trees in one variable.

> Fortunately, I'm looking at <10 year old experimental forests, so the study can focus on how tree species affects soil properties. No mechanisms involved at this point...just observation of the end result.

You mean the trees were planted by humans, and that natural selection has nothing to do with these forests? If so, then the trees might be the independent variable (at least in the first few years, before natural selection kicks in).

But I am still curious to solve the more sophisticated puzzle in which both of these variables are dependent and independent at the same time.

> Maybe something could be done with simultaneous equations about this, I don’t know. Suggestions?

It was interesting. Maybe it can be extended too. Even in the second scenario, the soil was not only the independent variable. Trees that have won the competition of natural selection (and have successfully chosen that specific soil) are selected on the basis of the nutrients in that soil. After some years, the soil (which gave a specific tree the chance to grow and win the natural competition) will be depleted of the materials that favored the selected tree in the beginning, meaning that the trees are changing the soil and the soil becomes the dependent variable again. And after some time, the fallen leaves of the trees will add some specific nutrients back to the soil (which again changes the soil, making the soil the dependent variable).

And after some time (which doesn't need to be long), these two variables get into a stable cycle together, each feeding the other (the soil, enriched again by the leaves fallen from the trees, feeds the trees, and the trees continue to enrich the soil), with each variable being both dependent and independent at the same time. So I for one can't decide which is the dependent and which is the independent variable (and I believe that each is both).

#### aibing

##### New Member
> You mean the trees were planted by humans, and that natural selection has nothing to do with these forests? If so, then the trees might be the independent variable (at least in the first few years, before natural selection kicks in).

Yes, the trees were planted 6 years ago, and soil carbon has been measured annually (minus a few skipped years). "Natural selection" (invasion and competition) has already altered some of the forest plots to some degree, but that is not something I will account for. Another study will assess species competition metrics. The dominant species in each plot is still the initially planted species, so I still classify the forests as "species x" and "species y" forests. The initial soil conditions in all plots were similar, so I am looking at the effect of tree species on soil.

#### GretaGarbo

##### Human
Was there any randomisation when it was decided for each plot what species it would have?

How many plots per species and how many species do you have?

#### noetsi

##### Fortran must die
I would argue that time is really not a variable. Something is happening in that time, and that is what you are really measuring. The months between are simply an interval in which some effect is occurring.

But that is probably a silly observation

#### GretaGarbo

##### Human
But was the study randomised? That is what really matters!

Edit:
To randomise means that each plot is assigned a specific tree species by a random number (for example, from a table of random numbers).

It is like tossing a coin to decide whether a plot will have tree A or tree B.

This is a fundamental characteristic of experiments.
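The coin-toss idea can be sketched in a few lines of Python. This is only an illustration of random assignment with equal group sizes; the plot names, the species labels "A" and "B", and the helper `assign_species` are all made up here and do not describe how the actual study was set up.

```python
import random


def assign_species(plots, species, seed=None):
    """Randomly assign an equal number of plots to each species.

    Hypothetical helper for illustration only.
    """
    if len(plots) % len(species) != 0:
        raise ValueError("plots must divide evenly among species")
    rng = random.Random(seed)      # seed only makes the example repeatable
    shuffled = list(plots)
    rng.shuffle(shuffled)          # random order plays the role of the coin toss
    per_species = len(shuffled) // len(species)
    # First block of shuffled plots gets species[0], next block species[1], ...
    return {plot: species[i // per_species] for i, plot in enumerate(shuffled)}


# Made-up example: 6 plots, 2 species, 3 plots each
plots = ["plot1", "plot2", "plot3", "plot4", "plot5", "plot6"]
assignment = assign_species(plots, ["A", "B"], seed=1)
for plot in plots:
    print(plot, "->", assignment[plot])
```

Shuffling and then splitting into blocks keeps the design balanced (3 plots per species), which an independent coin toss per plot would not guarantee.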


#### victorxstc

##### Pirate
> I would argue that time is really not a variable. Something is happening in that time, and that is what you are really measuring. The months between are simply an interval in which some effect is occurring.
>
> But that is probably a silly observation

Yes, you are right IMHO. Shouldn't he/she run a 2-way ANOVA?

Besides, can someone tell me: if time is our covariate in the ANCOVA, which two regression lines are being compared by that ANCOVA?

#### aibing

##### New Member
> Was there any randomisation when it was decided for each plot what species it would have?
>
> How many plots per species and how many species do you have?

Hi GretaGarbo: Yes, there were 3 randomly selected plots for each of the 2 species. To summarize the dataset:
Species: 2 (independent variable)
Soil carbon: measured (dependent) variable; measured in each plot 5 times over a 6 year period (approximately annual, but not exactly, with 2 missing years)

And to noetsi and victorxstc: I'm interested in the rate of change in soil carbon over time, or the correlation between time and soil carbon, so I believe that time is a variable. I also don't think a 2-way ANOVA is appropriate, as it requires 2 nominal variables. In my data set, time is not a nominal variable, especially since the samples weren't taken at regular intervals. I suppose I could lump the times into "annual" nominal variables, but that would sacrifice some precision...

I was planning to do two tests:
1) simple linear correlation and regression of carbon (for each species individually) versus time to see if there is a significant correlation between time and carbon (again, for each species individually). Although I'm still not sure this is a legitimate test either, due to possible autocorrelation between data points over time. I understand this is commonly overlooked in ecological studies, but I'm still not sure what else to use (see thread here: http://www.talkstats.com/showthread.php/28474-Linear-regression-amp-correlation-for-time-series-data).
2) ANCOVA to determine whether there is a significant difference in the time:carbon relationship (i.e. regression slope) between the 2 species. To victorxstc: the 2 regression lines compared in the ANCOVA would be the time:carbon regression lines of the 2 species, one line per species.
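The two planned tests can be sketched in plain Python. This is a minimal illustration, not a finished analysis: the numbers below are invented, the helpers `fit_line` and `slope_difference_t` are hypothetical names, and the calculation treats all observations as independent, which the autocorrelation concern mentioned in test 1 may violate. Comparing the two slopes with a pooled residual variance is equivalent to testing the species × time interaction in a model with separate intercepts and slopes.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {"n": n, "slope": slope, "intercept": intercept, "sxx": sxx, "sse": sse}


def slope_difference_t(fit_a, fit_b):
    """t statistic and degrees of freedom for H0: equal slopes."""
    df = fit_a["n"] + fit_b["n"] - 4           # two intercepts + two slopes
    pooled_var = (fit_a["sse"] + fit_b["sse"]) / df
    se = sqrt(pooled_var * (1 / fit_a["sxx"] + 1 / fit_b["sxx"]))
    return (fit_a["slope"] - fit_b["slope"]) / se, df


# Invented data: months since planting and soil carbon (arbitrary units)
months = [0, 13, 26, 49, 72]
carbon_x = [2.0, 2.3, 2.5, 3.1, 3.6]   # "species x" (made up)
carbon_y = [2.1, 2.2, 2.4, 2.6, 2.9]   # "species y" (made up)

fit_x = fit_line(months, carbon_x)      # test 1, species x
fit_y = fit_line(months, carbon_y)      # test 1, species y
t_stat, df = slope_difference_t(fit_x, fit_y)   # test 2

print("slope x:", fit_x["slope"], "slope y:", fit_y["slope"])
print("t for slope difference:", t_stat, "with df =", df)
```

The t statistic would be compared against a t distribution with the printed degrees of freedom. Note that months enter as recorded, so irregular sampling intervals are handled without grouping the times into years.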