# Logistic regression or GLM?

#### yamaneko

##### New Member
Hi everyone:

I have a question about using logistic regression or GLM with only categorical variables as independent variables.

The dependent variable y is binary coded, with 1 = male and 0 = female; each row corresponds to one excrement sample. The independent variables are X1 = habitat type (NF, SF, CP, RM, CV), X2 = elevation (a, b, c, d), X3 = forest road (FoR, FaR, MPR) and X4 = region (A, B, C, D, E). I would like to know whether the distribution of males and females differs across the levels of these variables.
Is there a type of GLM that works with only categorical variables as independent variables, or would logistic regression be my best option?
Below is the dataset:

y X1 X2 X3 X4
1 NF c FoR B
0 NF c FoR B
0 NF b FoR B
1 SF b MPR B
0 SF a MPR C
0 SF a MPR D
1 CP a MPR D
0 SF a FaR A
1 CV a FaR A
0 CV a FaR A
0 SF a FaR A
0 SF a FaR D
1 SF b FaR E
0 CP a FaR A
0 CP a FaR A
1 SF a FaR B
1 CP a FaR A
0 CP a FaR A
1 NF c FoR B
1 NF d FoR B
1 CV a FaR E
0 SF a FaR E
1 CP b FaR A
1 CP a FaR A
1 CP a FaR A
1 RM a FaR A
1 CP a FaR A
0 CV a FaR A
0 SF a MPR D
1 NF c FoR B
1 NF d FoR B
0 SF a FaR E
0 CV a FaR A
1 CP a FaR A
1 CP a FaR A
1 CP a FaR A
1 RM a FaR C
1 NF a FoR B
1 NF d FoR B
0 NF c FoR B
1 SF a FaR B
1 CP a FaR A
0 CV a FaR A
0 NF a FoR C
0 SF a FaR C
0 CV a FaR E
0 CV a FaR E
0 CV a FaR A
0 CV a FaR C
0 SF a FoR C
1 CP a MPR D
1 SF a FaR E
1 CP b FaR A
1 CV a FaR A
1 CP b FaR A
0 CP a FaR A
0 RM a FaR C
1 SF a FaR C
1 NF d FoR B
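A minimal R sketch, assuming the data are pasted in exactly as posted (columns `y`, `X1` to `X4`): it reads the table and prints the share of males (the mean of `y`) within each level of each predictor, a quick descriptive look before fitting any model. The object name `dat` is an arbitrary choice.

```r
# Posted data, copied as-is; y: 1 = male, 0 = female
dat <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
y X1 X2 X3 X4
1 NF c FoR B
0 NF c FoR B
0 NF b FoR B
1 SF b MPR B
0 SF a MPR C
0 SF a MPR D
1 CP a MPR D
0 SF a FaR A
1 CV a FaR A
0 CV a FaR A
0 SF a FaR A
0 SF a FaR D
1 SF b FaR E
0 CP a FaR A
0 CP a FaR A
1 SF a FaR B
1 CP a FaR A
0 CP a FaR A
1 NF c FoR B
1 NF d FoR B
1 CV a FaR E
0 SF a FaR E
1 CP b FaR A
1 CP a FaR A
1 CP a FaR A
1 RM a FaR A
1 CP a FaR A
0 CV a FaR A
0 SF a MPR D
1 NF c FoR B
1 NF d FoR B
0 SF a FaR E
0 CV a FaR A
1 CP a FaR A
1 CP a FaR A
1 CP a FaR A
1 RM a FaR C
1 NF a FoR B
1 NF d FoR B
0 NF c FoR B
1 SF a FaR B
1 CP a FaR A
0 CV a FaR A
0 NF a FoR C
0 SF a FaR C
0 CV a FaR E
0 CV a FaR E
0 CV a FaR A
0 CV a FaR C
0 SF a FoR C
1 CP a MPR D
1 SF a FaR E
1 CP b FaR A
1 CV a FaR A
1 CP b FaR A
0 CP a FaR A
0 RM a FaR C
1 SF a FaR C
1 NF d FoR B
")

# Proportion of males (mean of y) within each level of every predictor
for (v in c("X1", "X2", "X3", "X4")) {
  cat("\n", v, "\n")
  print(round(tapply(dat$y, dat[[v]], mean), 2))
}

# Counts of females (0) and males (1) for each habitat type
print(table(habitat = dat$X1, y = dat$y))
```

Small groups (for example RM, or elevation d) give very unstable proportions, so these numbers are only a first look.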

#### ledzep

##### Point Mass at Zero
GLM is a broad, flexible framework that can handle different types of response data.
Logistic regression is a special case of GLM: a binomial (0/1) response with a logit link.
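To make the logit link concrete, a small R sketch: the logit maps a probability p to the log-odds log(p / (1 - p)), and its inverse maps a linear predictor back to a probability. `qlogis()` and `plogis()` are base R's logit and inverse logit; the probabilities are arbitrary example values.

```r
p <- c(0.25, 0.5, 0.8)                 # arbitrary example probabilities
log_odds <- log(p / (1 - p))           # the logit, written out by hand
all.equal(log_odds, qlogis(p))         # qlogis() is base R's logit
plogis(log_odds)                       # inverse logit recovers 0.25, 0.5, 0.8
binomial()$linkfun(p)                  # the same logit, taken from the binomial family object
```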

#### yamaneko

##### New Member

To narrow my question: I would like to know in what kind of GLM with a logit link categorical variables can be used as independent variables. The only examples I can find are GLMs with measurement (continuous) variables as independent variables.
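Categorical predictors enter a logistic regression (or any GLM) the same way they enter an ordinary linear model: each factor is expanded into 0/1 dummy variables, one per level except a reference level. A minimal R sketch using the forest-road levels from the posted data (the short vector `x3` is illustrative, not the real column):

```r
# Illustrative vector using the forest-road levels from the posted data
x3 <- factor(c("FoR", "MPR", "FaR", "FaR", "FoR"))
levels(x3)              # "FaR" "FoR" "MPR": sorted, so FaR is the reference level
model.matrix(~ x3)      # intercept plus 0/1 dummies x3FoR and x3MPR
# Choose a different reference level if that comparison is more meaningful
x3_ref <- relevel(x3, ref = "FoR")
model.matrix(~ x3_ref)  # now the dummies are x3_refFaR and x3_refMPR
```

Each dummy's coefficient is then the change in log-odds relative to the reference level.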

#### ledzep

##### Point Mass at Zero

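Your response is binary (0/1), so you can use logistic regression for your analysis, with all your Xs as independent variables (predictors) in the model. Categorical predictors are fine: R turns each factor into a set of 0/1 dummy variables, one for each level except the reference level.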
Code:
## If you were using R, it would look like this (column names as in the posted data):
model <- glm(y ~ factor(X1) + factor(X2) + factor(X3) + factor(X4),
             data = my.data, family = binomial)  ## family = binomial: the response is binary (0/1)
GLM can fit a wide range of response types (gaussian/normal, binary, count, ...). A GLM uses a link function, which connects the mean of the response to the linear predictor (the weighted sum of the Xs). For the binomial family, the default (and canonical) link is the logit, i.e. the log-odds of the probability.
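A sketch of fitting and reading such a model, assuming simulated data in place of the posted dataset (the data frame `sim`, the sample size and the effect sizes are invented for illustration). `exp(coef())` turns log-odds coefficients into odds ratios against each factor's reference level, and `drop1(..., test = "Chisq")` gives a likelihood-ratio test for each factor as a whole, which is closer to "does the male/female ratio differ across habitat types?" than the individual dummy coefficients.

```r
set.seed(1)
n <- 200
# Hypothetical simulated data standing in for the posted dataset
sim <- data.frame(
  X1 = factor(sample(c("NF", "SF", "CP", "RM", "CV"), n, replace = TRUE)),
  X3 = factor(sample(c("FoR", "FaR", "MPR"), n, replace = TRUE))
)
# Assumed true effects on the log-odds scale (invented for illustration)
eta <- -0.3 + 0.8 * (sim$X1 == "CP") - 0.6 * (sim$X3 == "MPR")
sim$y <- rbinom(n, size = 1, prob = plogis(eta))

model <- glm(y ~ X1 + X3, data = sim, family = binomial)
summary(model)                 # Wald z-tests for each dummy vs. its reference level
exp(coef(model))               # odds ratios relative to the reference levels
drop1(model, test = "Chisq")   # likelihood-ratio test for each factor as a whole
```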

HTH

#### ledzep

##### Point Mass at Zero
Wait, I think I know where your problem lies.
What is your response variable? Y (1/0)? But shouldn't gender be an independent variable instead? I wouldn't expect gender to be your response.
Is your response the excrement? How do you measure it?
What does Y (0/1) mean: excretion = yes/no, or gender = M/F?

#### yamaneko

##### New Member
Thank you for your response ledzep.

Yes, the excrement is the dependent variable: 1 = M, 0 = F.

I've been studying R lately and was testing with the Poisson family, something like this:
fit <- glm(y1 ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, family = poisson(link = "log"), data = waterpots)

Does that make sense?

I'll also try the binomial family.
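A minimal R sketch of that switch, assuming a hypothetical stand-in for `waterpots` (the data frame `d` and its columns are invented). Only the `family` argument changes: for a 0/1 response, the binomial family with its default logit link models the probability that y1 = 1, whereas the Poisson family with a log link is designed for counts.

```r
set.seed(2)
n <- 150
# Hypothetical stand-in for 'waterpots': a 0/1 response and two factors
d <- data.frame(
  x1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x2 = factor(sample(c("A", "B"), n, replace = TRUE))
)
d$y1 <- rbinom(n, size = 1, prob = 0.4)

# Same formula, two families
fit_pois  <- glm(y1 ~ x1 + x2, family = poisson(link = "log"), data = d)
fit_binom <- glm(y1 ~ x1 + x2, family = binomial(link = "logit"), data = d)

range(fitted(fit_binom))   # fitted probabilities lie between 0 and 1
family(fit_binom)          # binomial family, logit link
```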