HLM?

#1
I am working with a large data set: n = 1,200+ average course grades, culled from over 15,000 students at a school. I ran a regression using average course grade as the dependent variable and course enrollment as the independent variable, and I got some very nice results (using SPSS Curve Estimation and removing the constant, which is theoretically OK given my scenario). The data have a nice cup-down (inverted-U) parabolic shape with an amazing R² value. I would like to take the next step and add another independent variable, course content. I have 5 different content areas across the 1,200 courses.

Here's the question: can I just run 5 different regression equations (this seems like it might promote family-wise error issues), or is this now an HLM issue? Or am I missing something altogether?
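A sketch of the model as described, assuming SPSS's quadratic option was the curve chosen (the post does not say which Curve Estimation model was used). The notation is illustrative, not from the post: \bar{y}_j is the average grade in course j and x_j is its enrollment.

```latex
\bar{y}_j = \beta_1 x_j + \beta_2 x_j^2 + \varepsilon_j,
\qquad \beta_2 < 0 \quad (\text{no constant; cup-down parabola})
```

Running the five separate regressions would mean fitting this same equation once per content area k, giving five pairs (\beta_{1k}, \beta_{2k}). One caution about the R²: when the constant is removed, SPSS (like most packages) computes R² about the origin rather than about the mean, so it is not comparable to the R² of a model with an intercept and is usually much larger when the grades sit far from zero.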
 
#3
elaboration

Sure thing. The 15,000-odd students are enrolled in 1,200 different courses, with enrollments ranging from 1 to 34 students. Thus my independent variable is course enrollment, and my dependent variable is the average grade achieved by the students enrolled in a specific course.

Naturally, course content varies by subject matter. In this case, the courses are grouped into several different content areas. For example, Mathematics consists of Algebra I & II, Trig, Calculus, etc., and Language Arts consists of English 9, 10, 11, 12, etc.

My goal is to run a multiple regression using both course content and course enrollment as independent variables and average course achievement as the dependent variable. Hence my initial question: does this constitute hierarchical linear modeling, or can I simply run a separate regression (with course enrollment as the only independent variable) for each content area?

Many many thanks!!!
 
#4
level 1 independent

What is your level 1 independent variable? It seems that if you nest students in courses, both of the IVs you want are level 2 variables.
 
#6
too complicated

What I mean by a level 1 predictor variable is something that describes students, such as gender or age.

Since the two variables you are interested in (course enrollment and course content) are level 2 variables (describing the course), I don’t see why you would want to use HLM.

I’d suggest just running a simple OLS regression with your two independents (course enrollment being a continuous measure and course content being categorical).
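A minimal sketch of that OLS model in plain Python (no SPSS or statistics library assumed), with the categorical content area dummy-coded. The course data, the area labels, the generating coefficients, and the choice of "Math" as the reference category are all hypothetical; the toy grades are generated without noise, so the fit should recover the coefficients used to create them. With 5 real content areas you would get 4 dummy coefficients, each the shift in average grade relative to the reference area at equal enrollment.

```python
# Hypothetical sketch: OLS of course-average grade on enrollment plus
# dummy-coded content area, using only the Python standard library.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(enrollment, area, levels, reference):
    """Intercept, enrollment, then a 0/1 dummy for each non-reference content area."""
    dummies = [1.0 if area == lv else 0.0 for lv in levels if lv != reference]
    return [1.0, float(enrollment)] + dummies


# Hypothetical toy data: (enrollment, content area) for six courses.
levels = ["Math", "LangArts", "Science"]
shift = {"Math": 0.0, "LangArts": 3.0, "Science": -2.0}  # made-up area effects
courses = [(10, "Math"), (20, "Math"), (15, "LangArts"),
           (25, "LangArts"), (12, "Science"), (30, "Science")]
X = [design_row(n, area, levels, "Math") for n, area in courses]
y = [70 + 0.5 * n + shift[area] for n, area in courses]  # noise-free grades

coefs = ols(X, y)  # [intercept, enrollment slope, LangArts shift, Science shift]
print([round(c, 3) for c in coefs])
```

The normal equations are fine for a toy example; for the real 1,200 courses a statistics package does the same job, and squared-enrollment or interaction columns can be appended to design_row in the same way.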
 
#7
I'd agree with SmilingSara unless you believe that the effect of enrollment on grades may differ across content areas. If so, you'll want to use appropriate interaction terms or else just run 5 separate regressions; with 1,200 courses in only 5 areas, this shouldn't be a problem.

You also want to consider the choice of model: if your data have a parabolic shape, as you say (I assume you mean when you plot enrollment against average grade), a model with only a straight-line enrollment term is probably not the right choice.
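One way to write down the model suggested above, assuming a quadratic term for the parabola and area 1 as the (arbitrary) reference category. Unlike the original no-constant fit, it includes an intercept. The notation is illustrative: \bar{y}_j is the average grade in course j, x_j its enrollment, and D_{kj} = 1 if course j belongs to content area k (0 otherwise).

```latex
\bar{y}_j = \beta_0 + \beta_1 x_j + \beta_2 x_j^2
          + \sum_{k=2}^{5} \gamma_k D_{kj}
          + \sum_{k=2}^{5} \left( \delta_k D_{kj} x_j + \lambda_k D_{kj} x_j^2 \right)
          + \varepsilon_j
```

The \gamma_k shift the curve up or down by area, and the interaction coefficients \delta_k and \lambda_k let its shape differ by area. Setting every \delta_k = \lambda_k = 0 gives the common-curve model, so a single joint F-test of those 8 coefficients asks whether the enrollment effect differs across areas, without running five separate tests. With every term interacted, the point estimates match five separate quadratic regressions (each with its own intercept); the difference is that the residual variance is pooled.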
 
#8
It took me a few readings to understand this. I think you and SmilingSara are talking about two different models.

I would suggest running the HLM with student as level 1 and course as level 2. It seems likely to me that students' grades are more strongly correlated within a class than across classes. If you don't account for the decreased variation within a class, then you underestimate the total variance in the model, and the standard errors, and therefore the p-values, will be too small.

If you average the grades within a class so that each class gets only a single average grade, you are throwing away variation, and therefore underestimating it. Those averages have variation. I would suggest running this part as a two level model and then check your covariance parameters to see if you do have random variation at both levels.
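One quick check on whether there is meaningful course-level variation, sketched here as a swapped-in technique rather than the covariance-parameter output of an HLM package: the one-way random-effects ANOVA estimate of the intraclass correlation (ICC), i.e. the share of grade variance lying between courses. The Kish design effect, 1 + (m - 1) * ICC with m the average course size, indicates roughly by what factor the variance of a mean is understated when students are treated as independent. The grades below are hypothetical; 12.5 is simply 15,000 students divided by 1,200 courses.

```python
# Hypothetical sketch: intraclass correlation (ICC) from a one-way
# random-effects ANOVA, with courses as groups of student grades.


def icc_oneway(groups):
    """Estimate ICC(1) = tau^2 / (tau^2 + sigma^2) from grades grouped by course.

    Needs at least two courses and at least one course with two or more students.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)  # estimates sigma^2 (student level)
    # Effective group size; equals the common size when courses are balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    tau2 = max((ms_between - ms_within) / n0, 0.0)  # course-level variance, floored at 0
    return tau2 / (tau2 + ms_within)


def design_effect(icc, avg_course_size):
    """Kish design effect: variance inflation of a mean under clustering."""
    return 1.0 + (avg_course_size - 1.0) * icc


# Hypothetical grades for three courses of unequal size.
toy = [[78, 82, 80], [65, 70, 68, 71], [90, 88]]
rho = icc_oneway(toy)
print(f"ICC = {rho:.3f}, design effect at 12.5 students/course = {design_effect(rho, 12.5):.2f}")
```

An ICC near zero would suggest little course-level clustering; a large one means treating students as independent would noticeably understate standard errors.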

But you were asking whether courses should be level 1 and content area level 2? Well, courses are nested within content areas, correct? You do need to account for that in the model (and yes, it seems you should include an interaction).

Whether it is an HLM or a GLM essentially comes down to whether content area is treated as a random or a fixed effect. Someone else just responded to this question in another thread and explained it quite nicely, although it slips my mind which thread that was. If I find it again, I'll post it here. You can do either. Making it random makes content area another level. But as Sara said, if all your predictors are at the course level, you don't necessarily need content area to be a separate level. And you will just get a single parameter estimate for the variation among content areas; you won't be able to compare the means across content areas. Do you want to test mean differences?
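To make the fixed-versus-random distinction concrete, here is a sketch with students at level 1 and courses at level 2, assuming the quadratic enrollment term is kept. The notation is illustrative: y_{ij} is the grade of student i in course j, x_j the course enrollment, D_{kj} the content-area dummies (area 1 as reference), and k(j) the content area of course j.

```latex
\begin{aligned}
&\text{Level 1:} && y_{ij} = \pi_{0j} + e_{ij}, && e_{ij} \sim N(0, \sigma^2) \\
&\text{Level 2, area fixed:} && \pi_{0j} = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \textstyle\sum_{k=2}^{5} \gamma_k D_{kj} + r_{0j}, && r_{0j} \sim N(0, \tau^2) \\
&\text{Level 2, area random:} && \pi_{0j} = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + u_{0,k(j)} + r_{0j}, && u_{0k} \sim N(0, \omega^2)
\end{aligned}
```

In the fixed version each area gets its own coefficient \gamma_k, so area means can be compared directly. In the random version content area becomes a third level and contributes only the single variance \omega^2, which with just 5 areas is estimated from 5 values and will be imprecise.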

Karen