# Thread: lm() and aov() for ANCOVA

1. ## lm() and aov() for ANCOVA

Can anyone explain to me the difference between using lm() and aov() for ANCOVA? Various web sources suggest that they are the same analysis, differing only in the format of the output, but contrary to this, I seem to get substantially different output using them:

Code:
```r
> fit = aov( TRANSFER3 ~ LEARN*COND, data=D )
> summary( fit )
            Df Sum Sq Mean Sq F value   Pr(>F)
LEARN        1 1.3445  1.3445  39.381 4.49e-08 ***
COND         1 0.2301  0.2301   6.740   0.0119 *
LEARN:COND   1 0.0009  0.0009   0.026   0.8729
Residuals   59 2.0143  0.0341
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> ancova = lm( TRANSFER3 ~ LEARN*COND, data=D )
> summary( ancova )

Call:
lm(formula = TRANSFER3 ~ LEARN * COND, data = D)

Residuals:
    Min      1Q  Median      3Q     Max
-0.4879 -0.1388 -0.0028  0.1172  0.3651

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.01813    0.12061  -0.150    0.881
LEARN              0.88106    0.16156   5.454 1.02e-06 ***
CONDBeakers       -0.09431    0.21567  -0.437    0.663
LEARN:CONDBeakers -0.04230    0.26338  -0.161    0.873
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1848 on 59 degrees of freedom
Multiple R-squared: 0.4389,     Adjusted R-squared: 0.4103
F-statistic: 15.38 on 3 and 59 DF,  p-value: 1.66e-07
```
LEARN and TRANSFER3 are both accuracy scores ranging from 0% to 100%. COND is Training Condition, with two levels: Beakers and Symbols. It looks like the first analysis tells me there is an effect of condition after accounting for the covariate (LEARN), while the second one (unless I'm misinterpreting it) seems to say no such effect exists. Which one is right? Or are they not actually inconsistent?

2. ## Re: lm() and aov() for ANCOVA

Scratch that:

When you call summary() on an aov-class object, it gives you the ANOVA table; calling summary() on an lm-class object gives you the coefficient (regression) table.

Code:
```r
# Assuming equivalent models (lm_object and aov_object are placeholders
# for your fitted lm and aov models)
anova(lm_object)        # sequential ANOVA table from the lm fit...
summary(aov_object)     # ...same table, from the aov fit

summary.lm(aov_object)  # coefficient table from the aov fit...
summary(lm_object)      # ...same table, from the lm fit
```
You should see the same outputs here.
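To see the equivalences concretely, here is a small sketch with simulated data standing in for the original D (the generating model and effect sizes are made up; only the variable names follow the thread):

```r
set.seed(1)
D <- data.frame(
  LEARN = runif(63),
  COND  = factor(rep(c("Beakers", "Symbols"), length.out = 63))
)
# Made-up generating model, loosely echoing the coefficients in post 1
D$TRANSFER3 <- -0.02 + 0.88 * D$LEARN -
  0.09 * (D$COND == "Beakers") + rnorm(63, sd = 0.18)

fit    <- aov(TRANSFER3 ~ LEARN * COND, data = D)
ancova <- lm(TRANSFER3 ~ LEARN * COND, data = D)

anova(ancova)    # sequential ANOVA table...
summary(fit)     # ...identical to the default aov summary

summary.lm(fit)  # coefficient table...
summary(ancova)  # ...identical to the default lm summary
```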


4. ## Re: lm() and aov() for ANCOVA

Thank you! That seems to sort things out. So I guess that summary() (for aov objects) or anova() (for lm objects) is what I should look at to determine whether the main effect of COND and the LEARN:COND interaction are significant, right? Under what circumstances would I want to look at the summary() (for lm objects) or summary.lm() (for aov objects) output instead? Thanks again!

5. ## Re: lm() and aov() for ANCOVA

The theoretical basis for the different outputs is that one reflects sequential ("Type 1") tests of effects while the other reflects marginal ("Type 3") tests. In your original output, the aov summary is sequential, while the coefficient tests in the lm summary are marginal. In balanced factorial designs the two coincide, but they do not in general, and I tend to prefer the marginal tests.
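A quick way to see the distinction is to fit the same model to a deliberately unbalanced toy data set (everything below is simulated for illustration; the only non-base function used is Anova() from the car package):

```r
library(car)                     # provides Anova(), with Type 2/3 tests
set.seed(2)
n <- c(20, 35)                   # unequal cell sizes -> unbalanced design
toy <- data.frame(
  COND  = factor(rep(c("Beakers", "Symbols"), n)),
  LEARN = runif(sum(n))
)
toy$TRANSFER3 <- 0.8 * toy$LEARN + rnorm(sum(n), sd = 0.2)

contrasts(toy$COND) <- contr.sum(2)  # sum-to-zero coding for marginal tests
m <- lm(TRANSFER3 ~ LEARN * COND, data = toy)

anova(m)            # sequential (Type 1) sums of squares
Anova(m, type = 3)  # marginal (Type 3) sums of squares -- generally different
```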


7. ## Re: lm() and aov() for ANCOVA

Thanks! Going with what you said (@Jake), I loaded car and ran

Code:
```r
D = droplevels( subset( ws.data, COND != "Rotations" ) )
fit = aov( TRANSFER3 ~ LEARN + COND + LEARN:COND, data=D )
ancova = lm( TRANSFER3 ~ LEARN + COND + LEARN:COND, data=D )
summary( fit )
Anova( ancova, type=2 )
summary( ancova )
Anova( ancova, type=3 )
```
The first two gave me (almost exactly) the same result, and the second two also gave me the same result (but different from the first two). So, two more questions, if you're feeling up for it -

1. any difference between Type 1 and Type 2 here? It looks like when I ask for Type 2, it's giving me the same result as the original aov summary which you said was Type 1.

2. what, if any, are the circumstances under which you'd prefer the first of the two summaries above (Anova( ancova, type=2 ))? I confess to preferring this one because it gives me the results I want - of course I know that's not a good reason - but also, "sequential" seems to better describe what I'm looking for in this case - I don't care about the effect of LEARN as such, but I want to know whether COND has an effect once LEARN has been "factored out", which is - if I understood correctly - the question answered by looking at the sequential effects.

I should mention that my data are not balanced (by COND), although they're very close. And of course they're also not balanced by the covariate, LEARN.

Any good general reference for learning more about these different types of effects and how to choose among them?

8. ## Re: lm() and aov() for ANCOVA

> Originally Posted by baixiwei
> 1. any difference between Type 1 and Type 2 here? It looks like when I ask for Type 2, it's giving me the same result as the original aov summary which you said was Type 1.
A discussion of different ways to compute sums of squares can be found HERE about halfway down the page.

Given how your models are specified, the tests for COND should indeed be identical using type 1 and type 2 sums of squares. That is, in each case the effect of COND is adjusted for LEARN but not the LEARN:COND interaction. However, the test of LEARN should not in general be identical using type 1 and type 2 sums of squares. Can you verify that this is indeed the case in your data?

> Originally Posted by baixiwei
> 2. what, if any, are the circumstances under which you'd prefer the first of the two summaries above (Anova( ancova, type=2 ))? I confess to preferring this one because it gives me the results I want - of course I know that's not a good reason - but also, "sequential" seems to better describe what I'm looking for in this case - I don't care about the effect of LEARN as such, but I want to know whether COND has an effect once LEARN has been "factored out", which is - if I understood correctly - the question answered by looking at the sequential effects.
Some people prefer Type 1/Type 2 sums of squares because main effects are then tested without adjusting for the interaction terms. This may be desirable in theory because (a) in the presence of an interaction, "main effects" are not really main effects--the answer you get depends to a certain degree on how the predictors are coded, although there are sensible coding conventions; and (b) some feel that it is better not to adjust for the interaction terms because when you do, the null/nested model typically doesn't make much sense, which supposedly makes the entire question, well, questionable (e.g., see section 5.1 HERE for a very strong stance on this). I am not fully moved by either of these arguments. My basic responses are (a) leaving the interaction term out of the model does not magically make the interaction "go away," so I advise facing up to the interaction by using a sensible coding scheme and then just interpreting it in the model; and (b) there are many situations in which the null/nested model in a model comparison is not of any inherent interest or usefulness, so I fail to see what is special about the case of interaction terms. A consistent application of this latter stance would seem to preclude a great many model comparisons.

I leave it to you to decide what to do with this information, but for what it's worth, my impression is that marginal/type3 tests are what most people are accustomed to thinking about and reporting. If this is what you decide to go with, make sure you code your factors sensibly. Remember, friends don't let friends use dummy codes (when there is an interaction). (Or, as one of my professors would say, dummy codes are named after the people who use them.)
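As a concrete illustration of "sensible coding," one common choice is sum-to-zero contrasts, set before fitting and before requesting Type 3 tests (a sketch, reusing the thread's variable names):

```r
# Replace the default dummy (treatment) coding with sum-to-zero coding,
# so the Type 3 "main effects" are evaluated at the average of the levels
contrasts(D$COND) <- contr.sum(nlevels(D$COND))
ancova <- lm(TRANSFER3 ~ LEARN + COND + LEARN:COND, data = D)
car::Anova(ancova, type = 3)
```

Centering the covariate (e.g. `D$LEARN - mean(D$LEARN)`) is often done for the same reason.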


10. ## Re: lm() and aov() for ANCOVA

Sorry for the slow reply here ...

> Given how your models are specified, the tests for COND should indeed be identical using type 1 and type 2 sums of squares. That is, in each case the effect of COND is adjusted for LEARN but not the LEARN:COND interaction. However, the test of LEARN should not in general be identical using type 1 and type 2 sums of squares. Can you verify that this is indeed the case in your data?
Hm, I poked around a little and couldn't immediately find a way to get the Type 1 sums of squares ... e.g. Anova() in car only provides options for Types 2 and 3. So I couldn't immediately verify this - any ideas how I could check it?

Regarding the coding thing, I'm not sure I get it, but I think I've asked you about it in another thread so I won't repeat my question here.

Thanks again!

11. ## Re: lm() and aov() for ANCOVA

The anova function (with a small 'a' -- this is actually a different function than Anova with a big 'A') uses Type 1 by default, so check the output of that.
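So, with the ancova fit from earlier in the thread, the check might look like this (assuming car is already loaded):

```r
anova(ancova)            # stats::anova -- sequential (Type 1) tests
Anova(ancova, type = 2)  # car::Anova  -- Type 2 tests
# The COND and LEARN:COND rows should match; the LEARN row generally won't.
```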

12. ## Re: lm() and aov() for ANCOVA

> Originally Posted by Jake
> The anova function (with a small 'a' -- this is actually a different function than Anova with a big 'A') uses Type 1 by default, so check the output of that.
Ah, ok, I knew Anova was different from anova but assumed incorrectly that they had the same defaults for sums of squares. OK, I've now run it with anova (i.e. type 1) and Anova (type=2) and you're right, the effects for COND and LEARN:COND were the same, but the effect for LEARN was different between the two.
