Hi everyone!

I'm a researcher in cardiovascular medicine, currently working on a study in endocrinology (diabetes). I have the following challenge, which I have discussed with several statisticians, who all gave me conflicting advice and explanations (which is why I'm bringing the discussion here).

The data set consists of:
* 400,000 observations.
* 100,000 unique individuals (between 1 and 10 observations per individual).
* Varying time between observations.
* Grouping variable is ethnicity; 10 groups (basically world continents).
* A wide range of covariates available.
* Outcome of interest is continuous (not satisfactorily normally distributed, even after log transformation).

Thus, I have a large dataset with repeated measures with varying time between them. The outcome of interest is blood glucose levels (continuous variable). I would like to find out if/how ethnicity affects this outcome, after adjusting for covariates.

The optimal statistical method for this is difficult to judge.

PROC GENMOD has a REPEATED statement, which invokes the generalized estimating equations (GEE) put forward by Liang and Zeger. This method has several advantages.

PROC GLM can be used.

PROC MIXED can also be used.
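
To make the comparison concrete, here is a rough sketch of how I understand the GEE and mixed-model versions would be specified. All dataset and variable names (glucose, id, ethnicity, age, sex, time, log_glucose) are placeholders rather than my actual variables, and the correlation structures shown are only starting assumptions, not choices I have settled on.

```sas
/* GEE via PROC GENMOD: population-averaged effect of ethnicity.       */
/* The exchangeable working correlation is an assumption; the          */
/* REPEATED statement clusters observations within each individual.    */
proc genmod data=glucose;
  class id ethnicity sex;
  model log_glucose = ethnicity age sex time / dist=normal link=identity;
  repeated subject=id / type=exch;
run;

/* Linear mixed model via PROC MIXED: a random intercept per           */
/* individual, with time entered as a continuous covariate so that     */
/* unequal spacing between visits is allowed.                          */
proc mixed data=glucose method=reml;
  class id ethnicity sex;
  model log_glucose = ethnicity age sex time / solution;
  random intercept / subject=id;
run;
```

In both sketches the unequal number of observations per individual is handled by clustering on id, so no balanced design is required.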

The question is, which method is optimal for this data set and why?

Thanks in advance!