Hi all,

I’m new here, and fairly new to GLMMs. I’m using PROC GENMOD to model a count outcome (number of offspring) in a toxicology study. Offspring counts were collected at 10 equally spaced time points for every individual (subject) in the experiment. Besides “time”, I have a nominal independent variable, “treatment”.

So now for the questions:

First: should I treat “time” as a nominal or a continuous variable when modeling this? Am I right to think that with the former I would be running a generalized linear model analogous to an ANOVA, and that the latter would be a regression approach?

Time as nominal variable:

proc genmod data = offspring;
class id time treatment;
model y = time treatment time*treatment / type3 wald dist="distribution";
run;

Time as continuous variable (regression approach):

proc genmod data = offspring;
class id treatment;
model y = time treatment time*treatment / type3 wald dist="distribution";
run;

Since the outcome variable is a count, I could assume a Poisson distribution, but I already know that the variance is much greater than the mean, i.e. the data are overdispersed. Given that, my plan is to first fit the model with different distributions: Poisson, Poisson with an extra parameter to account for overdispersion, and negative binomial. Then I would select the best model (distribution) based on AIC (and/or other fit statistics, like deviance/DF).
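A minimal sketch of the three candidate fits described above, with time treated as nominal. The dataset and variable names come from the snippets in this post; the log link and the use of `scale=pearson` for the extra dispersion parameter are assumptions, not the only possible choices.

```sas
/* Sketch only: link=log and scale=pearson are assumed choices. */

/* 1) Plain Poisson */
proc genmod data=offspring;
  class id time treatment;
  model y = time treatment time*treatment
        / dist=poisson link=log type3 wald;
run;

/* 2) Poisson with a dispersion (scale) parameter estimated
      from the Pearson chi-square divided by its DF */
proc genmod data=offspring;
  class id time treatment;
  model y = time treatment time*treatment
        / dist=poisson link=log scale=pearson type3 wald;
run;

/* 3) Negative binomial */
proc genmod data=offspring;
  class id time treatment;
  model y = time treatment time*treatment
        / dist=negbin link=log type3 wald;
run;
```

AIC and deviance/DF for each fit appear in the "Criteria For Assessing Goodness Of Fit" table. One caveat: the scaled Poisson fit is a quasi-likelihood model, so its AIC may not be directly comparable with the AIC of the other two.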

Furthermore, because this is a longitudinal study (I took repeated measures over time on the same subjects), I understand I should model the covariance structure to account for the correlation between residuals within the same subject and to get robust standard error estimates. So, must I model the covariance structure even if the best-fitting model (the one with the smallest AIC among the models with different assumed distributions) shows no sign of overdispersion or underdispersion (deviance/DF ≈ 1)? And if yes, and a covariance structure that assumes independence gives the best fit, does this mean I could drop the repeated statement altogether?

If the answer to the last question is yes, is the following a reasonable strategy for model selection?

1) establish the model statement based on the experimental design and its characteristics;

2) select the distribution that gives the best fit (smallest AIC);

3) using the previously selected model (with the best distribution), add a repeated statement and compare competing covariance structures (now based on QIC, since the repeated statement invokes GEEs), selecting the model with the smallest QIC;

4) make inferences.

proc genmod data = offspring;
class id time treatment;
model y = time treatment time*treatment / type3 wald dist="best distribution";
repeated sub=id / type="covariance structure";
run;
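As a concrete instance of step 3, here is a sketch with one working correlation structure filled in. The negative binomial and the exchangeable structure are placeholders for illustration only, not a claim about what fits these data best; `within=time` and `corrw` are optional additions.

```sas
/* Sketch only: dist=negbin stands in for whichever distribution
   was selected in step 2; type=exch is one candidate structure. */
proc genmod data=offspring;
  class id time treatment;
  model y = time treatment time*treatment
        / dist=negbin link=log type3 wald;
  /* within=time tells GENMOD how to order the repeated measures;
     corrw prints the estimated working correlation matrix */
  repeated subject=id / within=time type=exch corrw;
run;
```

Rerunning this with `type=ind`, `type=ar(1)` or `type=un` and comparing the QIC values reported in the "GEE Fit Criteria" table would give the comparison described in step 3.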

Thanks so much for any help!

Best,

Nikko
