Thread: what is Generalized Linear Model and how does it differ from General Linear Model?

1. what is Generalized Linear Model and how does it differ from General Linear Model?

The title says it all: I would really like to know what a Generalized Linear Model is and how it differs from a General Linear Model. I have some data that I want to analyse (Q-Q plots show that the data are not normally distributed), which I assumed would have to be analyzed with non-parametric methods. Since I have two factors and want to see whether the interaction between them is significant, I couldn't use Kruskal-Wallis in SPSS because it only allows one factor. Someone told me that I could use a Generalized Linear Model if I had more factors and wanted to check for interaction, but I want to know what that model really is and whether there are other "better" options!

2. Re: what is Generalized Linear Model and how does it differ from General Linear Model

A general linear model is a generalized linear model, but not vice versa. A generalized linear model can use any of a number of sampling distributions, whereas the general linear model is based on the Gaussian distribution.
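
Written out, the distinction looks roughly like this. This is a standard textbook formulation rather than anything specific to SPSS, and the symbols (x_i for the predictors of observation i, beta for the coefficients, g for the link function) are notation introduced here for illustration:

```latex
% General linear model: normal errors around a linear predictor
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Generalized linear model: a link function g connects the mean to the linear predictor
g(\mu_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta},
\qquad \mu_i = \mathrm{E}[y_i \mid \mathbf{x}_i],
\qquad y_i \mid \mathbf{x}_i \sim \text{an exponential-family distribution}

% Special cases
\text{Gaussian, } g(\mu) = \mu \;\Rightarrow\; \text{general linear model}
\qquad
\text{Poisson, } g(\mu) = \log \mu \;\Rightarrow\; \log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}
```

The first special case is why every general linear model is also a generalized linear model: choose the Gaussian distribution and the identity link.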

I have some data that I want to analyse (Q-Q plots show that the data are not normally distributed), which I assumed would have to be analyzed with non-parametric methods.
This is not a correct assumption. Linear models assume that the errors are normally distributed (that is, the response is normal around each predicted value), not that the raw data as a whole are normal. To check this, you must first run the model and then look at a Q-Q plot of the residuals to detect non-normality; only then do you decide between parametric and non-parametric methods.

If your data do come from a normally distributed population, the parametric test will most likely give you the greatest power. Non-parametric tests are more flexible and make fewer assumptions, but they also give you less power to reject the null hypothesis.
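
A minimal sketch of why the residuals, not the raw data, are what you check. It assumes made-up data (two groups with very different means and normal errors) and uses the correlation between sorted values and theoretical normal quantiles as a rough numeric stand-in for eyeballing a Q-Q plot. The helper name `qq_correlation` is hypothetical, not an SPSS or library function.

```python
import random
from statistics import NormalDist, mean

random.seed(0)

# Hypothetical data: two groups with very different means but normal errors.
groups = {
    "A": [random.gauss(10.0, 1.0) for _ in range(200)],
    "B": [random.gauss(20.0, 1.0) for _ in range(200)],
}


def qq_correlation(values):
    """Correlation between sorted values and theoretical normal quantiles.

    Values close to 1 mean the points of a normal Q-Q plot lie near a line.
    """
    n = len(values)
    ordered = sorted(values)
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = mean(theo), mean(ordered)
    sxy = sum((a - mx) * (b - my) for a, b in zip(theo, ordered))
    sxx = sum((a - mx) ** 2 for a in theo)
    syy = sum((b - my) ** 2 for b in ordered)
    return sxy / (sxx * syy) ** 0.5


# Raw response pooled over both groups (clearly bimodal).
raw = [y for ys in groups.values() for y in ys]

# One-factor linear model: the fitted value is simply the group mean,
# so the residual is the observation minus its group mean.
residuals = [y - mean(ys) for ys in groups.values() for y in ys]

r_raw = qq_correlation(raw)
r_res = qq_correlation(residuals)
print(f"Q-Q correlation, raw data:  {r_raw:.3f}")
print(f"Q-Q correlation, residuals: {r_res:.3f}")
```

The pooled raw data look non-normal only because the group means differ; once the model has removed that difference, the residuals line up with the normal quantiles.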

3. Re: what is Generalized Linear Model and how does it differ from General Linear Model

Originally Posted by trinker

This is not a correct assumption. Linear models assume that the errors are normally distributed (that is, the response is normal around each predicted value), not that the raw data as a whole are normal. To check this, you must first run the model and then look at a Q-Q plot of the residuals to detect non-normality; only then do you decide between parametric and non-parametric methods.

Ok, I thought I had checked a Q-Q-plot of the residuals, not sure though... How do I do that in SPSS?

Ok, maybe I should tell more about my data. It's count data on river dolphins. However, they are not evenly distributed along the area I surveyed; some specific habitats seem to be more "preferred". And since I surveyed a big area, which I divided into habitats, I got many zero-counts on my data. The survey was done as transects of between 1 and 3 km along the area, and because of that I decided that, to analyze the data correctly, I should convert the counts into dolphins per km and THEN check the data for normality etc. to know whether I should continue with parametric or non-parametric analysis. I'm a beginner in statistics and find it really hard so I don't know if I'm doing all wrong...

4. Re: what is Generalized Linear Model and how does it differ from General Linear Model

Ok, I thought I had checked a Q-Q-plot of the residuals, not sure though
You'll know you have a Q-Q plot of the residuals if you made it after you ran the model. Is this the case? I think in SPSS you have to ask for the residuals, and those are what you make a Q-Q plot of.

Ok, maybe I should tell more about my data.
This is usually the best place to start. "What do you have?" is the most important question to ask before you ask what to do about it.

And since I surveyed a big area, which I divided into habitats, I got many zero-counts on my data.
This ecological data screams to me that you may want to investigate a Generalized Linear Model, probably with a Poisson and/or negative binomial distribution, and likely account for the zero-inflated data. I suggest you look at this LINK, as it is one of my favorites for how-tos in different stats programs, including SPSS.
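
A quick way to see whether plain Poisson is plausible is to compare the variance with the mean and the observed share of zeros with what a Poisson distribution with the same mean would predict. The sketch below uses made-up counts (not the poster's data) and is only a rough screening check, not a formal test.

```python
from math import exp
from statistics import mean, variance

# Hypothetical transect counts, invented for illustration only.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 0, 0]

m = mean(counts)
v = variance(counts)                 # sample variance

# For Poisson data the variance is about equal to the mean,
# so a ratio well above 1 suggests overdispersion.
dispersion = v / m

# Share of zeros actually observed vs. P(Y = 0) = exp(-mean) under Poisson.
observed_zero = counts.count(0) / len(counts)
poisson_zero = exp(-m)

print(f"mean = {m:.2f}, variance = {v:.2f}, variance/mean = {dispersion:.2f}")
print(f"observed zeros = {observed_zero:.2f}, Poisson-expected zeros = {poisson_zero:.2f}")
```

A variance far above the mean points towards the negative binomial, and many more zeros than Poisson predicts points towards a zero-inflated model.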

I'm a beginner in statistics and find it really hard so I don't know if I'm doing all wrong...
I'm no expert in stats myself, but I have had a great deal of education in it and I still don't know if I'm doing it all wrong. Stats isn't really about right or wrong; often there are many approaches and tests that will work with your data. This quote:
Originally Posted by George E. P. Box
Essentially, all models are wrong, but some are useful
by George E. P. Box helps keep perspective on model selection.

We have at least two ecologists who are contributors here on talkstats (unfortunately for you, not them, I think both are in the jungle right now), and hopefully one of them or a more knowledgeable other will give you further direction.

5. Re: what is Generalized Linear Model and how does it differ from General Linear Model

A generalized linear model allows the response to have a conditional distribution other than the normal. For a given value of the covariates, a general linear model assumes the response is normally distributed; a generalized linear model allows it to follow any one of a certain family of distributions (Poisson, binomial, negative binomial, gamma, or even normal).

Without knowing more about your data it's hard to speculate which distribution would be appropriate though.
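
For count data collected on transects of different lengths, one common choice is a Poisson model with a log link and log(length) as an offset, so the raw counts stay counts and the coefficient is a log rate per km. Below is a minimal sketch under strong assumptions: the transects are invented, there is a single factor (habitat) with one coefficient per level, and the fit is a hand-written Newton-Raphson loop (`fit_log_rate` is a hypothetical helper, not an SPSS or library routine).

```python
from math import exp

# Hypothetical transects: (habitat, length in km, dolphins counted).
transects = [
    ("confluence", 1.0, 4), ("confluence", 2.5, 7), ("confluence", 3.0, 11),
    ("main channel", 1.5, 0), ("main channel", 2.0, 1), ("main channel", 3.0, 0),
]


def fit_log_rate(rows, iterations=25):
    """Newton-Raphson for beta in log E[count] = log(length) + beta.

    With one coefficient per habitat the likelihood separates by habitat,
    so each level can be fitted on its own rows.
    """
    total_count = sum(count for _, _, count in rows)
    total_km = sum(km for _, km, _ in rows)
    beta = 0.0
    for _ in range(iterations):
        expected = exp(beta) * total_km    # sum of fitted means
        score = total_count - expected     # first derivative of the log-likelihood
        information = expected             # minus the second derivative
        beta += score / information
    return beta


habitats = sorted({habitat for habitat, _, _ in transects})
fitted = {}
for habitat in habitats:
    rows = [row for row in transects if row[0] == habitat]
    fitted[habitat] = exp(fit_log_rate(rows))   # estimated dolphins per km
    print(f"{habitat}: {fitted[habitat]:.3f} dolphins/km")
```

In this simple case the estimate is just total dolphins divided by total km for each habitat, which is not the same as averaging the per-transect dolphins/km values, because longer transects carry more weight.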

6. Re: what is Generalized Linear Model and how does it differ from General Linear Model

Originally Posted by Dason
A generalized linear model allows the response to have a conditional distribution other than the normal. For a given value of the covariates, a general linear model assumes the response is normally distributed; a generalized linear model allows it to follow any one of a certain family of distributions (Poisson, binomial, negative binomial, gamma, or even normal).

Without knowing more about your data it's hard to speculate which distribution would be appropriate though.
Yeah ok, I can attach a part of my data so that you understand. The factors would be "lokal (site)" and "habitat", and all columns that contain "Inia", "Sotalia" or "Dolphins" are the dependent variables. Those are in the unit dolphins/km: I was told that I had to standardize the data before analyzing it, because I originally had the unit "dolphins/transect" and the transects are of different lengths (the "length" column), so that would not be correct.

7. Re: what is Generalized Linear Model and how does it differ from General Linear Model

So, I first wanted to check whether there is any significant difference between habitats (I have six different habitat types) with respect to the dolphins found. If so, I would like to know how to perform some kind of post-hoc test to find out WHICH habitats are different. I know how to do this if the data (or actually the residuals) are normally distributed, since I could then simply use the General Linear Model in SPSS, check the box for post-hoc and get a Tukey test. However, as far as I can tell this is not the case, so I have to use non-parametric tests instead, which I know nothing about, but I've heard that the Generalized Linear Model in SPSS would be the way to go. There, however, I find no post-hoc tests, so I could not, for example, see WHICH habitats are different.

The same goes for my other factor, "lokal", since I want to check whether there is a significant difference between sites (lokal) when it comes to dolphins. Finally, I want to know whether the interaction between the factors is also significant, i.e. whether lokal has any effect on habitat or vice versa, or whether it doesn't matter at all.
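
One possible way to ask WHICH habitats differ under a Poisson model is to compare the rates pair by pair and correct for the number of comparisons. The sketch below is an illustration only: the totals are invented, `wald_rate_test` is a hypothetical helper, the test is a large-sample Wald test on the log rate ratio that assumes plain Poisson counts with no overdispersion and needs non-zero totals, and Bonferroni is used as a simple (conservative) correction in place of Tukey.

```python
from itertools import combinations
from math import log, sqrt
from statistics import NormalDist

# Hypothetical totals per habitat: (dolphins counted, km surveyed).
totals = {
    "confluence": (22, 6.5),
    "main channel": (3, 6.5),
    "lake": (18, 6.0),
}


def wald_rate_test(count_a, km_a, count_b, km_b):
    """Two-sided Wald test for equality of two Poisson rates (counts per km)."""
    log_ratio = log((count_a / km_a) / (count_b / km_b))
    se = sqrt(1 / count_a + 1 / count_b)   # approximate SE of the log rate ratio
    z = log_ratio / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return log_ratio, p


pairs = list(combinations(totals, 2))
adjusted = {}
for a, b in pairs:
    _, p = wald_rate_test(*totals[a], *totals[b])
    # Bonferroni: multiply each p-value by the number of comparisons, cap at 1.
    adjusted[(a, b)] = min(1.0, p * len(pairs))
    print(f"{a} vs {b}: adjusted p = {adjusted[(a, b)]:.4f}")
```

With six habitats there would be 15 pairs, so the correction becomes quite strict; with many zeros or overdispersed counts the standard errors above would be too small.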

8. Re: what is Generalized Linear Model and how does it differ from General Linear Model

Aaah, and one more thing that I cannot understand about this program (SPSS). I am pretty sure that I had tried "Analyze -> Descriptive Statistics -> Explore", chosen "habitat" as the factor and one of my dependent variables, to check for normality via the Q-Q plot, and I got one plot for the whole dataset. Now that I try to do the same, it seems to have decided to show me results for EACH category of my factor (habitat) and not for the whole dataset... what is going on??? I don't think I have done anything different from before, but I need to check the whole dataset for normality of the residuals, not each category!!! Someone please help me before I go nuts!
