# Thread: ANOVA with multiple imputed Data

1. ## ANOVA with multiple imputed Data

Hello

I used multiple imputation on my data to get a complete data set. I want to run an ANOVA now. Does anybody know how to do that correctly?

SPSS calculates an ANOVA for every single imputation but does not pool the results. Some of my imputations are significant (e.g. p = 0.04) and some aren't (e.g. p = 0.07).

There is a small literature on pooling multiply imputed data, but I don't understand it... (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029775/)

Froop

2. ## Re: ANOVA with multiple imputed Data

The process is actually much easier than you probably think, based on Rubin's approach.

You average the estimates from the analyses run on each imputed data set, and that gives you the pooled estimate across imputes.

Then you build the SE from the within- and between-imputation variability of those analyses. The section in your link about single pooling covers this. So the estimate is super easy to get, and the SE combines the within-imputation variance (the average of the squared SEs) with the between-imputation variance (how much the estimates differ from one impute to the next). This makes the pooled SE a little larger, because it also reflects the uncertainty about the imputed values themselves: each impute is a random draw, so the estimates vary between imputes.
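
For reference, the pooling described above is usually written as follows. This is a sketch in common textbook notation for a single parameter, which may not match the symbols in the linked paper: $\hat{Q}_i$ is the estimate from imputation $i$, $U_i$ is its squared standard error, and $m$ is the number of imputations.

```latex
\begin{align}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
  && \text{(pooled estimate)} \\
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m}U_i
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance)} \\
\mathrm{SE}_{\text{pooled}} &= \sqrt{T}
\end{align}
```

Because $B \ge 0$, the pooled SE can never be smaller than the square root of the average within-imputation variance.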

3. ## Re: ANOVA with multiple imputed Data

SE = standard error?

So I just calculate the average of "everything"?^^

Thanks!

4. ## Re: ANOVA with multiple imputed Data

Originally Posted by froop91
SE = standard error?

So I just calculate the average of "everything"?^^

Thanks!
Not everything. The only things you can "average" (as in just taking the mean) are the parameter estimates you obtain (regression coefficients, correlation coefficients, etc.). That would be 'Q' in Eq. (1) of your link. Then you need to calculate the within- and between-imputation variance, get the test statistic manually, etc.

Overall it is quite a drag to do. But I just wanted to remind you that this is not as easy as just "taking the average of everything" or "averaging the datasets and running the analysis on it". It's a little more complicated than that.
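
To make "average the estimates, then combine the variances" concrete, here is a minimal Python sketch of Rubin's rules for one parameter. The function name `pool_rubin` and the numbers are made up for illustration and are not taken from the linked paper or from SPSS output. The degrees of freedom use Rubin's classic large-sample formula, which is an assumption about which variant you want.

```python
import math

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputations using Rubin's rules.

    estimates: the parameter estimate from each imputed dataset
    variances: the squared standard error of that estimate in each dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")

    q_bar = sum(estimates) / m            # pooled estimate: plain mean
    within = sum(variances) / m           # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between  # total variance
    se = math.sqrt(total)

    # Rubin's degrees of freedom; infinite if all estimates are identical
    if between == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2

    return {"estimate": q_bar, "within": within, "between": between,
            "total": total, "se": se, "t": q_bar / se, "df": df}

# Hypothetical coefficients and standard errors from m = 3 imputed datasets
estimates = [0.50, 0.60, 0.40]
std_errors = [0.20, 0.20, 0.20]
pooled = pool_rubin(estimates, [s ** 2 for s in std_errors])
print(pooled)
```

With these made-up inputs the pooled estimate is 0.50 and the pooled SE is about 0.231, larger than the 0.20 reported in each single dataset, and the t statistic is referred to a t distribution with about 32 degrees of freedom.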

5. ## Re: ANOVA with multiple imputed Data

Ok, then I have to look up how to calculate the within- and between-imputation variance.

thx a lot

6. ## Re: ANOVA with multiple imputed Data

Just to be sure, I'll try to make an example:

Original Data:

Participant 1: 5 4 3 -
Participant 2: - 2 1 1
Participant 3: 1 2 4 -

Imputation 1:

Participant 1: 5 4 3 2
Participant 2: 2 2 1 1
Participant 3: 1 2 4 3

Imputation 2:

Participant 1: 5 4 3 1
Participant 2: 2 2 1 1
Participant 3: 1 2 4 1

So I average the Imputations

Participant 1: 5 4 3 1.5
Participant 2: 2 2 1 1
Participant 3: 1 2 4 2

And based on this averaged imputation sheet I calculate within and between variance (by hand)? If this is correct, how can I tell SPSS to average the imputations? (As far as I know, SPSS keeps all imputations separate and only gives pooled results for some procedures like "frequency", but doesn't pool the data itself.)

Is it better to export my data to Excel to be able to calculate properly, or can SPSS do all the calculations?

Thanks for the help, and sorry for the question, but I am still confused about the topic.

7. ## Re: ANOVA with multiple imputed Data

You average your estimates. I am guessing you don't have that many imputed sets, so just put the estimates in a new dataset and ask SPSS to average them.

As Spunky reiterated, the variance part requires the formulas in the paper. Yes, SE = standard error.
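
To see why averaging the imputed data sheets is not the same as averaging the results, here is a minimal Python sketch using the fourth measurement from the toy example in post 6. The sample variance is only a stand-in for "the analysis" (an assumption made for illustration); the same mismatch shows up with ANOVA statistics.

```python
from statistics import mean, variance

# Fourth measurement for participants 1-3, from the toy example in post 6
imputation_1 = [2.0, 1.0, 3.0]
imputation_2 = [1.0, 1.0, 1.0]

# Wrong: average the imputed datasets first, then analyse the averaged sheet
averaged_data = [mean(pair) for pair in zip(imputation_1, imputation_2)]
variance_of_averaged_data = variance(averaged_data)

# Right: analyse each imputed dataset separately, then average the results
variance_per_imputation = [variance(imputation_1), variance(imputation_2)]
average_of_variances = mean(variance_per_imputation)

print(averaged_data)              # the averaged sheet
print(variance_of_averaged_data)  # statistic computed on averaged data
print(average_of_variances)       # average of the per-imputation statistics
```

The averaged sheet gives a variance of 0.25, while the average of the per-imputation variances is 0.5. Averaging the data smooths the imputed values and understates their variability, which is why the estimates are averaged rather than the datasets.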

8. ## Re: ANOVA with multiple imputed Data

Originally Posted by froop91
Just to be sure i try to make an example:

Original Data:

Participant 1: 5 4 3 -
Participant 2: - 2 1 1
Participant 3: 1 2 4 -

Imputation 1:

Participant 1: 5 4 3 2
Participant 2: 2 2 1 1
Participant 3: 1 2 4 3

Imputation 2:

Participant 1: 5 4 3 1
Participant 2: 2 2 1 1
Participant 3: 1 2 4 1

So I average the Imputations

Participant 1: 5 4 3 1.5
Participant 2: 2 2 1 1
Participant 3: 1 2 4 2

And based on this averaged imputation sheet I calculate within and between variance (by hand)? If this is correct, how can I tell SPSS to average the imputations?
NO! This is *exactly* what we warned you not to do! What you should be doing looks more like:

Original Data:

Participant 1: 5 4 3 -
Participant 2: - 2 1 1
Participant 3: 1 2 4 -

Imputation 1: <---- RUN ANOVA HERE, GET PARAMETER ESTIMATES (WE'LL CALL THEM Q1)

Participant 1: 5 4 3 2
Participant 2: 2 2 1 1
Participant 3: 1 2 4 3

Imputation 2: <---- RUN ANOVA HERE, GET PARAMETER ESTIMATES (WE'LL CALL THEM Q2)

Participant 1: 5 4 3 1
Participant 2: 2 2 1 1
Participant 3: 1 2 4 1

Now you have two vectors of parameter estimates, Q1 and Q2. You average Q1 and Q2 to get the parameter estimates on which you will do your hypothesis tests, you use the standard errors of Q1 and Q2 for the within-imputation variance and the spread between Q1 and Q2 for the between-imputation variance, and finally you get the F-statistic that you want. Everything is done by hand following Eqs. 1-6 of the document you linked.

Notice that, as shown in the example of the article you linked, you'll need to reframe the ANOVA as a multiple regression, so you'll need to ask SPSS for the regression equation to get the regression coefficients and R-squared (whose F-test is statistically equivalent to the F-test you get by taking ratios of Mean Squares).
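
The equivalence between the two F-tests can be checked on a small example. The sketch below assumes a one-way design, where R-squared of the dummy-coded regression equals SS_between / SS_total; the scores and function names are hypothetical.

```python
from statistics import mean

def f_from_mean_squares(groups):
    """One-way ANOVA F as the ratio MS_between / MS_within."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def f_from_r_squared(groups):
    """The same F computed from R-squared of the dummy-coded regression."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    r2 = ss_between / ss_total   # R-squared for a one-way design
    k = len(groups) - 1          # number of dummy predictors
    n = len(scores)
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical scores for three groups of four participants
groups = [[5.0, 4.0, 3.0, 4.0], [2.0, 2.0, 1.0, 1.0], [1.0, 2.0, 4.0, 3.0]]
f_ms = f_from_mean_squares(groups)
f_r2 = f_from_r_squared(groups)
print(f_ms, f_r2)
```

Both routes give F = 7.125 on (2, 9) degrees of freedom for these scores, so nothing is lost by working with the regression form in each imputed dataset.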

Here's my honest opinion. If you're dealing with missing data switch software programs. SPSS makes things so unnecessarily complicated that it almost makes you wonder why they bothered only giving you half of the missing data routine.

9. ## Re: ANOVA with multiple imputed Data

Just post the output for the m analyses and make Spunky do it for you!

10. ## Re: ANOVA with multiple imputed Data

Originally Posted by spunky

Here's my honest opinion. If you're dealing with missing data switch software programs. SPSS makes things so unnecessarily complicated that it almost makes you wonder why they bothered only giving you half of the missing data routine.
Which one do you prefer then? Stata? R?

You average your estimates, I am guessing you don't have that many impute sets, so just put them in a new data frame and ask SPSS to average.
Unfortunately I have 20 imputations :/

Actually I'm trying to test 2 effects on 3 outcomes at 3 brands (MANOVA), but I think I'll just do several ANOVAs.

If I just run an ANOVA on one of the parts, SPSS gives me the following:

Just post the output for the m analyses and make Spunky do it for you!
Would be the easiest, but it's my exam project so I have to do it^^
Maybe if I switch to a program that gets it done for me it will get easier...

Usually I'm not that bad at math, but these Eqs. (1)-(6) look kind of difficult... don't know why... maybe if someone could give me a small calculation example... I can do the rest then

Thanks a lot, guys

11. ## Re: ANOVA with multiple imputed Data

I haven't done it with R yet, but I used SAS (i.e., PROC MIANALYZE) and it is as easy as inputting values.

12. ## Re: ANOVA with multiple imputed Data

I use the mice package in R. But I know Stata also has good missing-data handling capabilities, so whichever one you think is easier for you, I guess.

Originally Posted by froop91
Would be the easiest, but it's my exam project so I have to do it^^
If this is an exam project, didn't they teach you in school how to do it before they let you do it yourself? I'm just wondering if maybe you have something in your notes on how to do this stuff, and then you won't need to switch software or anything.

13. ## Re: ANOVA with multiple imputed Data

Originally Posted by spunky
If this is an exam project, didn't they teach you in school how to do it before they let you do it yourself? I'm just wondering if maybe you have something in your notes on how to do this stuff, and then you won't need to switch software or anything.
The answer is always yes, despite what many students tell you, barring any crappy for-profit schools and some community colleges where I have seen this happen. That's the minority, though. On occasion, I've seen professors assign a project with the intention of students completing parts as the material is covered in class.

14. ## Re: ANOVA with multiple imputed Data

Originally Posted by ondansetron
The answer is always yes, despite what many students tell you, barring any crappy for-profit schools and some community colleges where I have seen this happen. That's the minority, though. On occasion, I've seen professors assign a project with the intention of students completing parts as the material is covered in class.
Well, when I’ve taught or TA’d I’ve seen one of two things happening, depending on the type of project.

One is you give the students a dataset with the issues/kinks covered in class so you can see if they’re able to recognize them and address them. The other is you let students do their own project with their own datasets and then the kinks and peculiarities of the dataset reveal themselves as the project goes along. When you find yourself in the latter situation is when the students may struggle a little bit more because you can’t possibly cover every single data issue in an introductory class (like how to handle missing data or what to do if you have a truncated variable, etc.) and they get lost trying to figure things out themselves. So I feel like whereas in scenario #1 you just tell the person “go look it up on your notes” in scenario #2, as an instructor, it’s more like “wow, good job for recognizing this as a problem and trying to fix it yourself”. I tend to work in the latter scenario (people are more interested in analyzing their own data than whatever you can give them) and a lot of the material that’s covered in my classes has now changed because of it. But it obviously demands more of you as an instructor because you need to look after as many datasets as people are in your class.

15. ## Re: ANOVA with multiple imputed Data

Originally Posted by spunky
Well, when I’ve taught or TA’d I’ve seen one of two things happening, depending on the type of project.

One is you give the students a dataset with the issues/kinks covered in class so you can see if they’re able to recognize them and address them. The other is you let students do their own project with their own datasets and then the kinks and peculiarities of the dataset reveal themselves as the project goes along. When you find yourself in the latter situation is when the students may struggle a little bit more because you can’t possibly cover every single data issue in an introductory class (like how to handle missing data or what to do if you have a truncated variable, etc.) and they get lost trying to figure things out themselves. So I feel like whereas in scenario #1 you just tell the person “go look it up on your notes” in scenario #2, as an instructor, it’s more like “wow, good job for recognizing this as a problem and trying to fix it yourself”. I tend to work in the latter scenario (people are more interested in analyzing their own data than whatever you can give them) and a lot of the material that’s covered in my classes has now changed because of it. But it obviously demands more of you as an instructor because you need to look after as many datasets as people are in your class.
We gave the illusion of choice in our class. Students could pick any data set they desired, so long as it was from our pool of 4-6 pre-approved sets ... it helped us focus the scope to what we had taught. Somehow students always came into the TA lab hours saying "We didn't do this in class!" Then, I would show them in their notebook or the course notes where we did it. You're a bit more bold since you let them pick any data set they want.
