I measured a fitness correlate of an animal (the response variable) and want to investigate the effect of the following predictor variables: gender (male/female), temperature (9, 12, 15, 18, 21, 24 °C), population (four different populations), and parasite infection status (control, parasite-exposed but not infected, parasite-exposed and infected). The usual approach would be to fit a GLM to the response variable with all predictor variables included as main effects. I would then check the residuals for normality (Q-Q plot), and if they are approximately normally distributed, I am done (if not, I would Box-Cox transform the response variable and start over).
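To make that workflow concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration only: the data are simulated, the level names (`P1`, `exposed_infected`, ...), the five replicates per cell, and the Box-Cox parameter `lam = 0.5` are made up, and the model is fitted by ordinary least squares, which is what a GLM with normal errors and identity link reduces to.

```python
import itertools
import statistics
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fully crossed design with 5 replicates per cell (made-up data).
genders = ["female", "male"]
temperatures = [9, 12, 15, 18, 21, 24]
populations = ["P1", "P2", "P3", "P4"]
statuses = ["control", "exposed_uninfected", "exposed_infected"]
rows = [combo
        for combo in itertools.product(genders, temperatures, populations, statuses)
        for _ in range(5)]

gender = np.array([r[0] for r in rows])
temp = np.array([r[1] for r in rows], dtype=float)
pop = np.array([r[2] for r in rows])
status = np.array([r[3] for r in rows])

# Simulated response: invented effects plus normal noise.
fitness = (10.0 + 0.1 * temp + 0.5 * (gender == "male")
           - 1.0 * (status == "exposed_infected")
           + rng.normal(0.0, 1.0, len(rows)))

def dummies(values, levels):
    """Treatment coding: one 0/1 column per level, first level is the reference."""
    return np.column_stack([(values == lvl).astype(float) for lvl in levels[1:]])

# Main-effects design matrix: intercept, temperature (continuous), factors.
X = np.column_stack([np.ones(len(rows)), temp,
                     dummies(gender, genders),
                     dummies(pop, populations),
                     dummies(status, statuses)])
beta, *_ = np.linalg.lstsq(X, fitness, rcond=None)
residuals = fitness - X @ beta

# Q-Q plot coordinates: sorted residuals against standard normal quantiles.
n = len(residuals)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([statistics.NormalDist().inv_cdf(p) for p in probs])
observed = np.sort(residuals)
qq_corr = np.corrcoef(theoretical, observed)[0, 1]  # close to 1 if roughly normal

def box_cox(y, lam):
    """Box-Cox transform of strictly positive y; lam is taken as given here."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

# Illustration only: in practice lam would be estimated from the data.
transformed = box_cox(fitness, 0.5)
```

Plotting `theoretical` against `observed` gives the Q-Q plot; `qq_corr` is only a rough numerical summary of how straight that plot is, not a formal test.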

But it is not that easy. At least in some cases, the response does not appear to change linearly with temperature; instead it follows a curve, with the highest fitness at 15 °C and lower fitness at both lower and higher temperatures. To account for this, I want to include an additional quadratic term (temperature*temperature) in the model.
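As a sketch of what the quadratic term does, the snippet below fits fitness = b0 + b1*t + b2*t^2 to simulated data with a built-in optimum at 15 °C. The data, the 20 replicates per temperature, and the choice to centre temperature before squaring (to reduce the correlation between the linear and quadratic columns) are assumptions for illustration, and the other predictors are left out to keep it short.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: 20 replicates at each of the six experimental temperatures,
# with a hump-shaped response that peaks at 15 degrees C.
temperature = np.repeat([9.0, 12.0, 15.0, 18.0, 21.0, 24.0], 20)
fitness = (20.0 - 0.1 * (temperature - 15.0) ** 2
           + rng.normal(0.0, 1.0, temperature.size))

# Centre temperature so the linear and quadratic columns are less correlated.
t_c = temperature - temperature.mean()

# Design matrix: intercept, linear term, quadratic term.
X = np.column_stack([np.ones_like(t_c), t_c, t_c ** 2])
(b0, b1, b2), *_ = np.linalg.lstsq(X, fitness, rcond=None)

# A negative b2 means a hump; the vertex of the parabola is at t_c = -b1 / (2 * b2).
optimum = temperature.mean() - b1 / (2.0 * b2)
```

With a negative `b2`, `optimum` is the temperature at which the fitted curve peaks. Note that after centring, `b1` is the slope at the mean temperature rather than at 0 °C, so its value and p-value depend on where temperature is centred.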

Here are my questions:

1. Can I just compare the p-values of temperature and temperature*temperature to figure out whether I have a linear or a quadratic relationship?

2. Later I want to plot fitness over temperature for all 24 combinations of the categorical predictors (2 genders × 4 populations × 3 infection statuses; for example, control males of population XY). How do I know whether I should fit a straight line or a curve? In my data, the relationship looks linear for some combinations and non-linear for others. But from the GLM I get just one p-value for temperature and one p-value for temperature*temperature…

3. Do my residuals still have to be normally distributed, and can I still Box-Cox transform my response variable if they are not?

Thanks to all of you!!!

Fred.