GLM on non-linear/curvilinear data

Dear all! I’m sure one of you can help me with my non-linear data problem.

I measured a fitness correlate of an animal (response variable) and want to investigate the effects of the following predictor variables: gender (male/female), temperature (9, 12, 15, 18, 21, 24 °C), population (four different populations), and parasite infection status (control, parasite exposed but not infected, parasite exposed and infected). The usual approach would be to fit a GLM to the response variable with all predictor variables as main effects. I would then check the residuals for normality (Q-Q plot); if they are approximately normally distributed, I’m done (if not, I would Box-Cox transform the response variable and start over).

But it is not that easy… At least sometimes, the response does not seem to change linearly with temperature but follows a curve, with the highest fitness at 15 °C and lower fitness at lower and higher temperatures. To account for this, I want to include an additional quadratic term (temperature*temperature) in the model.
Here are my questions:

1. Can I just compare the p-values of temperature and temperature*temperature to figure out if I have a linear or quadratic relationship?

2. Later I want to plot fitness over temperature for all 24 combinations of the other predictor variables (for example, for control males of population XY). How do I know whether I should fit a straight line or a curve? In my data, the relationship looks linear for some combinations and non-linear for others. But from the GLM I just get one p-value for temperature and one p-value for temperature*temperature…

3. Do my residuals still have to be normally distributed, and can I still Box-Cox transform my response variable if the residuals are not normally distributed?

Thanks to all of you!!!

Oh dear,

It seems like no one is answering Fred's posts, but he also asked about this in July and in June.

It seems to be about how well some fish are doing after a (randomized?) experiment.

Hi Fred,

It's been a crazy busy year for me, so I'm not so active here, but since Greta made sure that I noticed this, I can't help but give it a shot.

Looking at your previous questions and the above, I can say that the GLMM was fine and you didn't have to exclude all your "control fish from the model since they have missing values". You should just code the control fish parasite population as a separate category, say "C", which means zero parasites. You see, it's not that you don't have data there: you know exactly how many parasites there were, right? Zero! That's not missing data!

As for your questions:

1) No, a p-value says little about the overall fit of the model; look at the R2 values or AIC scores at least. If these improve somewhat, you have evidence for a quadratic relationship; if they improve hugely, you have strong evidence. You could then also plot both models against your data and visually confirm the relationship. Finally, a GAM could be used to get an idea of the moving average or functional shape.

2) This question seems related to #1. Plot the scatter plot, look at AIC values, maybe fit a moving average with a GAM.

3) You are using a GLM. This means that the error distribution may or may not need to be normal, depending on exactly which model you are using. Which has me wondering what kind of GLM you are using? Or are you just using an LM?
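For points 1) and 2), here is a minimal sketch of the AIC comparison in Python with NumPy, assuming a Gaussian model fitted by least squares. The data are simulated and purely hypothetical, and `gaussian_aic` is a helper name made up for this illustration, not a library function.

```python
import numpy as np

def gaussian_aic(y, X):
    """Least-squares fit of y on X; returns (coefficients, AIC)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    k = X.shape[1] + 1           # regression coefficients plus the residual variance
    sigma2 = resid @ resid / n   # maximum-likelihood estimate of the variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return beta, 2 * k - 2 * loglik

# Hypothetical data: fitness peaks near 15 °C (not the poster's data)
rng = np.random.default_rng(1)
temperature = np.repeat([9.0, 12.0, 15.0, 18.0, 21.0, 24.0], 10)
fitness = 10 - 0.1 * (temperature - 15) ** 2 + rng.normal(0, 1, temperature.size)

ones = np.ones_like(temperature)
X_linear = np.column_stack([ones, temperature])
X_quadratic = np.column_stack([ones, temperature, temperature ** 2])

_, aic_linear = gaussian_aic(fitness, X_linear)
beta_quad, aic_quadratic = gaussian_aic(fitness, X_quadratic)
print(f"AIC linear: {aic_linear:.1f}, AIC quadratic: {aic_quadratic:.1f}")
```

A lower AIC for the quadratic model supports the curved relationship. The same helper could be applied to the subset of rows for a single combination (for example, control males of one population) to see which shape fits that group better, although each subset then contains far fewer observations.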

Hope this helps!


Hi Greta and Ecologist,

Thanks for your reply!

Luckily, my current question is not directly linked to the ones I had in summer. The summer problems have been solved in the meantime… (-;

I want to use a general linear model (GLM) for my analysis. As far as I understand, this is a type of generalized linear model (GZLM) with a Gaussian distribution and identity link. Could it be that the abbreviation GLM leads to confusion, since in R a GLM is computed with lm() and a GZLM with glm()?

As far as I understood you, the normal distribution of residuals is also required if I add a quadratic term to the model – right?

The GAM procedure seems to be interesting. I have never heard about this before. I did a short web-search and could find some information. Unfortunately it seems not to be implemented in SPSS. Nevertheless, is there still a way to calculate a GAM in SPSS?


A generalized linear model specifying an identity link function and a normal family distribution is exactly equivalent to a (general) linear model. Unfortunate choices of abbreviations in statistical programs may contribute to the confusion, but in general "glm" will be understood to mean a generalized linear model.

But yes, linear models assume normally distributed residuals.
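As a rough numerical illustration of that equivalence, the sketch below fits the same quadratic model twice in Python with NumPy: once by ordinary least squares, and once by maximizing the Gaussian log-likelihood with an identity link using Newton's method. The data are simulated and hypothetical; centering the temperature is only a numerical convenience and an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
temperature = np.repeat([9.0, 12.0, 15.0, 18.0, 21.0, 24.0], 5)
t = temperature - temperature.mean()   # centered for numerical convenience
X = np.column_stack([np.ones_like(t), t, t ** 2])
y = 5 + 0.2 * t - 0.1 * t ** 2 + rng.normal(0, 1, t.size)  # made-up response

# Ordinary least squares (the "general" linear model)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gaussian GLM with identity link, fitted by Newton's method.
# Score = X'(y - Xb) / sigma2 and Hessian = -X'X / sigma2,
# so sigma2 cancels in the Newton update.
beta_glm = np.zeros(X.shape[1])
for _ in range(3):
    score = X.T @ (y - X @ beta_glm)
    beta_glm = beta_glm + np.linalg.solve(X.T @ X, score)

print(np.allclose(beta_ols, beta_glm))
```

Note that the quadratic term is just another column of the design matrix, so the model stays linear in its coefficients and the usual residual assumptions apply unchanged.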

SPSS does not seem to have GAMs, but alternatives may exist.

Hey again and thank you so much so far.

I tried to find some advice on how to handle a quadratic term in a generalized linear (mixed) model, but information on that is quite rare. That’s why I come back to you with some further questions. Hopefully, somebody here can help me again…
At the moment I am calculating a model with the following variables:

TEMPERATURE (9, 12, 15, 18, 21 and 24 °C), POPULATION (A, B, C and D), GENDER (male and female) and INFECTION_STATUS (control, exposed but not infected, infected)

Since I expect quadratic curves with an optimum somewhere in the middle of my temperature scale, I also add the quadratic term TEMPERATURE_SQUARED to my model. In most cases both TEMPERATURE and TEMPERATURE_SQUARED are significant.
Now, coming to interactions, I have some questions:

- What would a significant interaction TEMPERATURE x TEMPERATURE_SQUARED tell me? I think it would make no sense, which is why I did not include this interaction in my model. Am I right?

- What about interactions between TEMPERATURE and one of the other variables, for example POPULATION? Would I need to include TEMPERATURE x POPULATION, TEMPERATURE_SQUARED x POPULATION, or both in my model to figure out whether my populations react differently at different temperatures? What is the difference between the two?

Hope you get my questions…!?

Thanks again for your help!
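To make the two interaction terms in the last question concrete, here is a small sketch in Python with NumPy of what the design matrix columns look like. It uses only two populations (A as the reference, B as a dummy) and three temperatures to keep it short; the column names are hypothetical labels, not SPSS output.

```python
import numpy as np

temperature = np.array([9.0, 15.0, 24.0, 9.0, 15.0, 24.0])
population = np.array(["A", "A", "A", "B", "B", "B"])

t = temperature
t2 = temperature ** 2
is_b = (population == "B").astype(float)  # dummy: 1 for population B, 0 for A

columns = {
    "intercept": np.ones_like(t),
    "T": t,
    "T_sq": t2,
    "pop_B": is_b,
    "T x pop_B": t * is_b,        # lets the linear coefficient differ for B
    "T_sq x pop_B": t2 * is_b,    # lets the curvature differ for B
}
X = np.column_stack(list(columns.values()))

for name, col in columns.items():
    print(f"{name:>14}: {col}")
```

With coefficients b0 to b5 in that column order, population A follows b0 + b1*T + b2*T^2, while population B follows (b0 + b3) + (b1 + b4)*T + (b2 + b5)*T^2. So TEMPERATURE x POPULATION allows the linear coefficient to differ between populations, and TEMPERATURE_SQUARED x POPULATION allows the curvature to differ. Whether both are needed remains a modelling decision for the data at hand.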