I have a question about a regression analysis that I'm about to perform.

The dataset contains pupils from six different years in one school district, roughly 500 pupils per year. The same individual cannot appear twice in the data, but (roughly) the same schools are observed in each year. As far as I know, this makes it pooled cross-sectional data. The dependent variable is a test result on a continuous scale. The independent variable of interest is a dummy (attending / not attending a specific course at school). The control variables are gender, the parents' level of education, etc.
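
Written as an equation, the basic model is roughly the following. This is a sketch in my own notation, not anything standard for this dataset: $i$ indexes pupils, $s$ schools and $t$ years, $\text{Course}_{ist}$ is the attendance dummy, $X_{ist}$ is the vector of controls (gender, parents' education, ...), and $\beta$ is the course effect I want to estimate. The school index $s$ is what the clustering below refers to.

```latex
y_{ist} = \alpha + \beta\,\text{Course}_{ist} + X_{ist}'\gamma + \varepsilon_{ist}
```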

To estimate the effect of attending the course, I plan to run an OLS regression. I assume the standard errors should be clustered at the school level to account for possible correlation between pupils from the same school: there could be some kind of "school culture" that influences the test results, and since the test contains fairly open-ended questions, some schools may also grade more strictly than others. I'm also thinking of including year dummies to control for factors that affect all pupils in a given year. However, I also believe there has been grade inflation in the test results over the years, so I'm wondering whether I should include a linear trend variable instead of year dummies.
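
To make the two options concrete, here is a minimal NumPy sketch of the specification described above: OLS with school-clustered standard errors, once with year dummies and once with a single linear trend. Everything in `simulate` is hypothetical and purely synthetic (20 schools, a 1-3 coding for parents' education, a steady 0.8-point yearly inflation, a true course effect of 2.0); the variable names are my own, not from the real data. In practice a regression package would compute the clustered errors; the sandwich formula is written out here only to show what "clustering on school" does.

```python
import numpy as np


def ols_cluster(y, X, groups):
    """OLS coefficients with cluster-robust (CR1) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters of the outer product of each
    # cluster's summed score X_g' u_g, allowing any within-school correlation.
    meat = np.zeros((k, k))
    clusters = np.unique(groups)
    for g in clusters:
        in_g = groups == g
        score = X[in_g].T @ resid[in_g]
        meat += np.outer(score, score)
    G = len(clusters)
    # Common small-sample correction (CR1).
    c = (G / (G - 1)) * ((n - 1) / (n - k))
    vcov = c * (XtX_inv @ meat @ XtX_inv)
    return beta, np.sqrt(np.diag(vcov))


def design(course, female, parent_edu, year, time_control):
    """Build the regressor matrix; column 1 is always the course dummy."""
    n = len(course)
    cols = [np.ones(n), course, female, parent_edu]
    years = np.unique(year)
    if time_control == "dummies":
        # One dummy per year except the first (the omitted base year).
        cols += [(year == yr).astype(float) for yr in years[1:]]
    elif time_control == "trend":
        # A single linear trend: 0 for the first year, 1 for the next, ...
        cols.append((year - years[0]).astype(float))
    else:
        raise ValueError("time_control must be 'dummies' or 'trend'")
    return np.column_stack(cols)


def simulate(seed=0, n_years=6, per_year=500, n_schools=20):
    """SYNTHETIC data only: illustrative values, not from the real study."""
    rng = np.random.default_rng(seed)
    n = n_years * per_year
    year = np.repeat(np.arange(n_years), per_year)
    school = rng.integers(0, n_schools, size=n)
    course = rng.binomial(1, 0.4, size=n).astype(float)
    female = rng.binomial(1, 0.5, size=n).astype(float)
    parent_edu = rng.integers(1, 4, size=n).astype(float)  # hypothetical 1-3 coding
    school_effect = rng.normal(0, 3, size=n_schools)[school]  # "school culture"
    y = (50 + 2.0 * course + 1.0 * female + 1.5 * parent_edu
         + 0.8 * year  # steady grade inflation (assumed linear here)
         + school_effect + rng.normal(0, 8, size=n))
    return y, course, female, parent_edu, year, school


if __name__ == "__main__":
    y, course, female, parent_edu, year, school = simulate()
    for spec in ("dummies", "trend"):
        X = design(course, female, parent_edu, year, spec)
        beta, se = ols_cluster(y, X, school)
        print(f"{spec:8s} course effect = {beta[1]:.2f} (cluster SE {se[1]:.2f})")
```

With an intercept in the model, a linear trend over the six years is a restricted special case of the year dummies (five dummy coefficients versus one trend coefficient), so the dummies also absorb linear inflation; the trend only saves parameters by assuming the yearly changes follow a straight line.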

Is this a good way to do the analysis? Are there any obvious drawbacks?
