mmercker

Member
Hi,

When fitting a GLM (e.g. a Poisson regression), various link functions are available in statistical packages. However, I cannot find one for a unimodal dependency (a "normal/Gaussian link"), so I always have to approximate a unimodal relationship with polynomial terms. Does somebody know why a "normal link function" is apparently not frequently used? Especially in biology, such unimodal relationships are pretty frequent (e.g. number of species depending on environmental variables).

I would be glad of any suggestions.

GretaGarbo

Human
I don't think I understand what mmercker is asking for or what is meant by "unimodal dependency". An example might make it clearer.

But for the normal distribution the identity link is almost always used, i.e. the "usual linear regression". For binary data it is common to use the probit link (the inverse of the standard normal distribution function), the logit link or the complementary log-log link.
But I realize that mmercker is asking for something else.

mmercker

Member
Hi, I am sorry, let me try a second time. Let's assume we have count data, so the natural choice for the stochastic part of the regression model is the Poisson distribution. Now I have to choose the link function; the inverse of the link function reflects how the outcome Y (counts) changes with a continuous predictor X. For count data, a log link is often chosen, since in some cases the outcome grows exponentially with a predictor (the exponential being the inverse of the log). In biology, however, the outcome (e.g. species number) often does not increase linearly or exponentially but follows a Gaussian-shaped curve along a predictor: for an environmental variable, there is a peak of species number at a certain value and a decrease in both directions (think of the number of bacteria in water with temperature as the predictor: there are many bacteria at 37°C but probably not many at -20°C or 100°C). To describe this relationship, I would need an "inverse normal function" as link function. Does this exist? In some R packages I find "inverse" as a possible choice for the link function, but is this really the inverse normal function?

GretaGarbo

Human
I think of it as:
The expected value (mu) of the dependent variable is linked to a linear predictor (beta*x) with a link function g(mu) as:

g(mu) = beta*x

and

mu= h(beta*x)

and for the log link often used in a Poisson model:

log(mu) =beta*x, mu = exp(beta*x)

Suppose there are a number of bottles with bacteria, each stored at a different temperature (T) around 37 degrees C, all given the same time for growth, say 24 hours.

That could be modelled (with a quadratic approximation) like:

mu = exp(b0 + b1*(T-37) + b2*(T-37)^2 )

Let fi be the standard normal density. I believe that mmercker wants to model the relationship as:

mu = exp(fi(T-37))

So that the second order approximation is replaced by the normal density.

But isn't this what is done in generalised additive models (GAMs)? At the moment I don't remember this so well.
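The quadratic approximation mu = exp(b0 + b1*(T-37) + b2*(T-37)^2) is an ordinary Poisson GLM with log link, so any GLM routine can fit it. The following is only a minimal sketch in plain Python of how such a fit works (iteratively reweighted least squares); the temperatures and counts are made up for illustration, and the helper names `solve_linear` and `fit_poisson_quadratic` are hypothetical, not taken from any package.

```python
import math

def solve_linear(A, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_poisson_quadratic(temps, counts, centre=37.0, iterations=50):
    """IRLS fit of log(mu) = b0 + b1*(T-centre) + b2*(T-centre)^2, Poisson errors."""
    X = [[1.0, t - centre, (t - centre) ** 2] for t in temps]
    mu = [y + 0.5 for y in counts]          # crude starting values
    eta = [math.log(m) for m in mu]
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        # Working response and weights for the log link: z = eta + (y-mu)/mu, w = mu
        z = [e + (y - m) / m for e, y, m in zip(eta, counts, mu)]
        XtWX = [[sum(m * x[i] * x[j] for x, m in zip(X, mu)) for j in range(3)]
                for i in range(3)]
        XtWz = [sum(m * x[i] * zi for x, m, zi in zip(X, mu, z)) for i in range(3)]
        beta = solve_linear(XtWX, XtWz)
        eta = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
        mu = [math.exp(e) for e in eta]
    return beta

# Made-up counts from bottles stored at 22, 25, ..., 52 degrees C
temps = list(range(22, 53, 3))
counts = [1, 3, 10, 24, 42, 50, 42, 24, 10, 3, 1]

beta = fit_poisson_quadratic(temps, counts)
b0, b1, b2 = beta
optimum = 37.0 - b1 / (2.0 * b2)            # temperature at which mu peaks
tolerance = math.sqrt(-1.0 / (2.0 * b2))    # width of the fitted curve
print(beta, optimum, tolerance)
```

A negative estimate of b2 is what makes the fitted curve peaked; with a positive b2 the curve would open upwards and there would be no optimum.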

mmercker

Member
Thank you already GretaGarbo, this discussion is getting very interesting. What I finally want is

$$\mu =\frac{1}{\sqrt{2 \pi b_1}} \exp{( -\frac{(X-b_2)^2}{2 b_1} )}$$

with regression parameters $$b_1,b_2$$.

With your formulas you show that we already come close to it with a Poisson GLM and log link, which is very nice. As you show, in this case we can get

$$\mu = \exp( b_0 + b_1 (T-37) + b_2(T-37)^2)$$,

However, we would have to center the variable around its mean value beforehand and use only the intercept and the quadratic term, am I right? In that case this would resemble a normal distribution.
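Completing the square in the exponent suggests that centering is not strictly needed: with b2 < 0, exp(b0 + b1*x + b2*x^2) is already a scaled Gaussian curve, and the linear term only moves the position of the peak. A small sketch of the conversion follows; the function names and the example coefficients are made up for illustration, and x stands for whatever scale the predictor was entered on (e.g. T-37).

```python
import math

def gaussian_from_quadratic(b0, b1, b2):
    """Rewrite log(mu) = b0 + b1*x + b2*x^2 (b2 < 0) as
    mu = peak * exp(-(x - optimum)^2 / (2 * tolerance^2))."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for a peaked (unimodal) curve")
    optimum = -b1 / (2.0 * b2)                # location of the maximum
    tolerance = math.sqrt(-1.0 / (2.0 * b2))  # plays the role of a standard deviation
    peak = math.exp(b0 - b1 ** 2 / (4.0 * b2))  # value of mu at the optimum
    return peak, optimum, tolerance

def mu_quadratic(x, b0, b1, b2):
    return math.exp(b0 + b1 * x + b2 * x * x)

def mu_gaussian(x, peak, optimum, tolerance):
    return peak * math.exp(-((x - optimum) ** 2) / (2.0 * tolerance ** 2))

# Hypothetical coefficients, only to check that both forms agree
b0, b1, b2 = 3.0, 0.4, -0.02
peak, optimum, tolerance = gaussian_from_quadratic(b0, b1, b2)
for x in (-5.0, 0.0, 10.0, 25.0):
    print(x, mu_quadratic(x, b0, b1, b2), mu_gaussian(x, peak, optimum, tolerance))
```

Note that the peak height is a free parameter here, so the curve does not have to integrate to one as a normal density does.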

In GAMs they do something different: they use smoothers (e.g. LOESS or splines) rather than parametric functions.

GretaGarbo

Human
Of course the normal distribution is not the only distribution. Many other distributions can be used, for example the gamma density. That would allow for much more flexibility, since skewness and kurtosis can be included.

There is nothing that says that the response function (of e.g. the number of bacteria at different temperatures) should be such that it integrates to one. One could use a scaling constant like:

mu = exp(a+K*f(T))

where K is the scaling constant, f() is the chosen density and a is an "intercept".

I guess that such a model can be estimated with non-linear regression.
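Non-linear estimation of mu = exp(a+K*f(T)) with Poisson counts amounts to minimising the Poisson negative log-likelihood over a, K and the parameters of f(). The sketch below only writes down that objective and, as a stand-in for a real optimiser, does a crude grid search over the location of the density; f() is assumed to be a normal density with mean m and standard deviation s, the data are generated from made-up parameter values, and all names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mean_count(temp, a, K, m, s):
    """mu = exp(a + K*f(T)), with f the normal density with mean m and sd s."""
    return math.exp(a + K * normal_pdf(temp, m, s))

def poisson_nll(params, temps, counts):
    """Negative Poisson log-likelihood; a general optimiser would minimise this."""
    a, K, m, s = params
    total = 0.0
    for t, y in zip(temps, counts):
        mu = mean_count(t, a, K, m, s)
        total += mu - y * math.log(mu) + math.lgamma(y + 1.0)
    return total

# Made-up data: expected counts under assumed "true" parameters, rounded
temps = list(range(22, 53, 3))
true_params = (0.0, 50.0, 37.0, 5.0)   # a, K, m, s (illustrative values only)
counts = [round(mean_count(t, *true_params)) for t in temps]

# Crude grid search over the optimum m, holding a, K and s fixed
candidates = [float(m) for m in range(30, 45)]
best_m = min(candidates,
             key=lambda m: poisson_nll((0.0, 50.0, m, 5.0), temps, counts))
print(best_m)
```

In practice all four parameters would be estimated jointly with a numerical optimiser, and starting values matter because the objective is not linear in m and s.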

But I believe that mmercker is talking about something that is quite common in practice but seldom discussed. Maybe some made-up data, inspired by real data, could make this clearer.

GretaGarbo

Human
Is this what mmercker means?

It seems to be common in biological population growth models to assume a logistic or, as here, a Gompertz model:

y = A + C*exp(-exp(-(b1*(time-M))))

where A is the initial value, C+A is the upper asymptote, b1 is a slope parameter and M is the time at which growth is at its maximum. M and b1 have the role of an "intercept" and a "slope".

But with varying temperatures (Temp), which here serve the role of mmercker's environmental variable, one can use a conventional model

y = A + C*exp(-exp(-(a + b1*time + b2*Temp + b3*Temp^2)))

Is this what mmercker wants:

y = A + C*exp(-exp(-(a + b1*time + b2*f(Temp; xbar,s))))

where f() is the normal density (or another density) and xbar and s are parameters to be estimated.
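To make the last formula concrete, here is a small sketch of the Gompertz curve with the temperature effect entering through a normal density. The parameter values are invented purely to show the shape, and the function name `gompertz_temp` is hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def gompertz_temp(time, temp, A, C, a, b1, b2, xbar, s):
    """y = A + C*exp(-exp(-(a + b1*time + b2*f(Temp; xbar, s))))."""
    linear_part = a + b1 * time + b2 * normal_pdf(temp, xbar, s)
    return A + C * math.exp(-math.exp(-linear_part))

# Invented parameter values, for illustration only
params = dict(A=1.0, C=9.0, a=-3.0, b1=0.5, b2=20.0, xbar=37.0, s=5.0)

for temp in (10.0, 30.0, 37.0, 44.0):
    curve = [round(gompertz_temp(t, temp, **params), 3) for t in (0, 5, 10, 20)]
    print(temp, curve)
```

With b2 > 0, growth at a given time is furthest along at Temp = xbar and falls off symmetrically on either side, while A and A+C remain the lower and upper limits at every temperature.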