Unbalanced study population


New Member

I am conducting an occupational health analysis of two specific occupations. In my total study population, there are 68 individuals in occupation 1 and 1,881 in occupation 2, and the sample cannot be expanded. The outcome of interest is a continuous variable indicating stress, so I planned to use linear regression. I will also be controlling for four other covariates in the model.

My question is: should I be concerned about the heavily unbalanced study population (i.e., 68 vs. 1,881)? Are there any recommended analysis techniques for dealing with unbalanced populations such as this? I could do a 4:1 (or some other ratio) individual match of occupation 2 on occupation 1, but I'm not sure this is the right route. I typically use matching with a dichotomous outcome, not a continuous one, so that I can then use conditional logistic regression.

Any help would be greatly appreciated. Thanks in advance to anyone for their assistance on this matter.

Very respectfully,



Active Member
This is how I see it ...
There should be no real problem with the different sample sizes, as you seem to have plenty of data.
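To put rough numbers on that, here is a small sketch in Python. It assumes a plain two-group comparison with equal residual variance in both occupations (an assumption, and it ignores the four covariates), in which case the standard error of the difference in means is the residual SD times sqrt(1/n1 + 1/n2). The function name `se_factor` is just made up for the illustration.

```python
import math

def se_factor(n1, n2):
    """Standard error of a difference between two group means,
    expressed in units of the common residual SD (equal variances assumed)."""
    return math.sqrt(1.0 / n1 + 1.0 / n2)

# All available data: 68 in occupation 1, 1,881 in occupation 2.
full = se_factor(68, 1881)

# Hypothetical 4:1 matched subset: 68 in occupation 1, 272 in occupation 2.
matched = se_factor(68, 4 * 68)

print(f"full sample: {full:.3f} x residual SD")
print(f"4:1 matched: {matched:.3f} x residual SD")
```

Under these assumptions the factor is about 0.12 with the full sample and about 0.14 after 4:1 matching, so throwing away occupation 2 records does not buy any precision. The 68 people in occupation 1 are what limit the precision either way.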
One possibility is a General Linear Model (GLM), which is a sort of combination of ANOVA for your categorical predictors (here, occupation with its two categories) and regression for the continuous ones. Depending on your software, you can declare each predictor to be continuous or categorical, add whatever interactions you want, and it will sort it all out. Under the hood it does this by multiple regression using dummy variables for the categories, but that will be invisible to us humans.
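In case it helps to see what the dummy-variable step looks like, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the data are noise-free toy values, there is one continuous covariate instead of your four, and the helpers `solve`, `fit_ols` and `design_row` are hypothetical names. In practice your statistics package does all of this for you.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def design_row(occupation, covariate):
    """Intercept, dummy (1 if occupation 1, else 0), one continuous covariate."""
    return [1.0, 1.0 if occupation == 1 else 0.0, float(covariate)]

# Toy data as (occupation, covariate); deliberately unbalanced, 3 vs 5.
data = [(1, 30), (1, 45), (1, 52), (2, 28), (2, 33), (2, 41), (2, 47), (2, 60)]
X = [design_row(occ, cov) for occ, cov in data]

# Noise-free outcome: stress = 20 + 5 * (occupation 1) + 0.5 * covariate
y = [20 + 5 * row[1] + 0.5 * row[2] for row in X]

intercept, occupation_effect, covariate_slope = fit_ols(X, y)
print(intercept, occupation_effect, covariate_slope)
```

The coefficient on the dummy is the adjusted difference in mean stress between occupation 1 and occupation 2, holding the covariate fixed. Because the toy outcome has no noise, the fit recovers the values used to build it (up to floating-point error) even though the groups are unequal in size.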