# Very long post - need help thanks

#### ZikO

##### New Member
Dear All,

I am writing to you because I am having a lot of trouble with some concepts regarding Generalised Linear Models. I need to understand them by January 2012. I hope that is possible.
I am currently working through a fairly advanced book on generalised linear models. I also have access to MATLAB, so I can test my ideas in that environment. I don't have much experience in statistics, but that does not mean I cannot understand the concepts at all. I have split this message into a few sections outlining the problems I am currently experiencing. I will be grateful for any comments and suggestions, and I am sorry for the length of this message.

Likelihood function
The least of my problems is the maximum likelihood function; I think I have just recently got that right. I understand that I can estimate the parameters of a particular probability distribution by finding what the maximum probability for the input data is, meaning I need to find the maximum of a likelihood function. The way we do this is to take either the product of the probabilities, or the sum of their logarithms, and analyse this function with respect to the unknown parameters. The maximum of this function gives the best parameter estimates. I hope I've got it right.
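To make that idea concrete, here is a minimal sketch in Python (any numerical environment, including MATLAB, would do the same job; the data values n = 20, y = 7 are made up): maximise the binomial log-likelihood over a grid of p and check that the maximiser lands near the sample proportion y/n.

```python
import math

# Observed data: y successes out of n trials (arbitrary example values).
n, y = 20, 7

def log_likelihood(p):
    # Binomial log-likelihood; the constant term log C(n, y) does not
    # depend on p, so it does not move the location of the maximum.
    return y * math.log(p) + (n - y) * math.log(1 - p)

# Evaluate on a fine grid of candidate parameter values and pick the maximiser.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)

print(p_hat)  # lands at the sample proportion y/n = 0.35
```

The grid search is only for illustration; in practice the maximum is found analytically or with a numerical optimiser.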

Expected value
This is my worst nightmare at the moment. For some reason I cannot even "feel" it, and every equation in which this term is involved becomes hard to understand. Unfortunately, the information matrix also requires this term, which I need in order to understand how to find the parameter estimates analytically.
I know that for a simple random variable it is a weighted average, where the probabilities corresponding to the values of the random variable are the weights. However, in the book I am reading now, expected values appear in forms such as:
-E[d^2 L(pi) / dpi^2]
for instance for binomial function it is:
-E[d^2 L(p)/dp^2] = E[y/p^2 + (n-y)/(1-p)^2]

I don't know how to calculate this, and the book does not really explain it. I hope someone could either point me to some resources or simply explain how to approach it. Thanks

Generalised Linear Models
I would be grateful if some of you could clarify one thing I am struggling with. How do we move from the simple representation of the relationship between the independent and the (categorical) dependent variable, y vs x, to the probability of a category occurring vs x? Does the error between the real and the estimated value of the dependent variable have something to do with this? I have read about the link function, which maps the dependent variable onto a probability.

Linear Regression Line
I've read that linear regression cannot be applied if the variation differs along the data. Why does non-constant variation in the data make this method invalid? Why do generalised linear models fit such data better, and how do they take the non-constant variation into account?

I will be grateful for your help. Thanks.

#### Daniel Dvorkin

##### New Member
The key to the expected value problem is not to panic when you look at the whole expression. Look at it and you'll see that it breaks down into fairly simple expressions involving one random variable, Y, and the constants n and p (I'll write q = 1 - p to keep the algebra tidy). So in the example you gave:

E[Y/p^2 + (n-Y)/q^2] = E[Y/p^2] + E[(n-Y)/q^2]

= (1/p^2) E[Y] + E[(n-Y)/q^2] (because the expected value of a constant times a random variable equals the constant times the expected value of the r.v.)

= (1/p^2) E[Y] + E[n/q^2 - Y/q^2] (simple algebra)

= (1/p^2) E[Y] + E[n/q^2] - E[Y/q^2] (because the expected value of a sum is the sum of the expected values)

= (1/p^2) E[Y] + n/q^2 - E[Y/q^2] (because the expected value of a constant is a constant)

= (1/p^2) E[Y] + n/q^2 - (1/q^2) E[Y] (constant multiplication again)

= (1/p^2 - 1/q^2) E[Y] + n/q^2 (a little algebraic reshuffling to make it easier to calculate)

And E[Y] is easy to get, because Y is just a binomial random variable with the given parameters: specifically, E[Y] = np. Now you can make the final round of calculations:

(1/p^2 - 1/q^2) E[Y] + n/q^2

= (1/p^2 - 1/q^2)(np) + n/q^2 (from the expression for E[Y])

= n/p - np/q^2 + n/q^2 (simple algebra)

= n/p + n(1-p)/q^2 = n/p + n/q (because 1 - p = q)

= n(1/p + 1/q) = n/(pq) = n/(p(1-p)) (simple algebra again.)

You may recognise n/(p(1-p)) as the Fisher information for a binomial sample; it is the reciprocal of the variance p(1-p)/n of the usual estimator Y/n. But that's the general approach.

Note that in this particular case, you could have gone ahead and substituted np for E[Y] at the beginning, because the original expression is linear in Y, and it might have made the calculations easier. But be careful -- a lot of second derivatives of log-likelihoods involve expectations of polynomials, E[Y^2] and the like, and of other expressions such as E[log(Y)], so it's easy to make a mistake when you do this. Generally, it's best to leave the expectations in place until you've done all the algebra you can.
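If it helps, the identity is easy to check numerically. This minimal Python sketch (parameter values n = 10, p = 0.3 are arbitrary) computes E[Y/p^2 + (n-Y)/(1-p)^2] exactly by summing over the binomial pmf, and compares it with n/(p(1-p)):

```python
import math

n, p = 10, 0.3
q = 1 - p

def pmf(k):
    # Binomial pmf: P(Y = k) = C(n, k) p^k (1-p)^(n-k)
    return math.comb(n, k) * p**k * q**(n - k)

# E[Y/p^2 + (n-Y)/(1-p)^2], computed by direct summation over all values of Y.
expectation = sum((k / p**2 + (n - k) / q**2) * pmf(k) for k in range(n + 1))

closed_form = n / (p * q)  # n / (p (1 - p))
print(expectation, closed_form)  # the two agree
```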

Hope this helps!

##### Ninja say what!?!
Likelihood function
The least of my problems is the maximum likelihood function; I think I have just recently got that right. I understand that I can estimate the parameters of a particular probability distribution by finding what the maximum probability for the input data is, meaning I need to find the maximum of a likelihood function. The way we do this is to take either the product of the probabilities, or the sum of their logarithms, and analyse this function with respect to the unknown parameters. The maximum of this function gives the best parameter estimates. I hope I've got it right.
You've got the idea down, but to correct a little of it: using MLE, you're not "finding what the maximum probability for the input data is". What you're doing is using the data to estimate the parameters that maximize the probability of getting that data. A way to think of it is that you're estimating the parameters given the data.

Expected value
This is my worst nightmare at the moment. For some reason I cannot even "feel" it, and every equation in which this term is involved becomes hard to understand. Unfortunately, the information matrix also requires this term, which I need in order to understand how to find the parameter estimates analytically.
I know that for a simple random variable it is a weighted average, where the probabilities corresponding to the values of the random variable are the weights. However, in the book I am reading now, expected values appear in forms such as:
-E[d^2 L(pi) / dpi^2]
for instance for binomial function it is:
-E[d^2 L(p)/dp^2] = E[y/p^2 + (n-y)/(1-p)^2]

I don't know how to calculate this, and the book does not really explain it. I hope someone could either point me to some resources or simply explain how to approach it. Thanks
It looks like the expectation presented is minus the expected second derivative of the log-likelihood of a binomial distribution. A hint for you: up to an additive constant,
[TEX] L(p) = y\ln(p) + (n-y)\ln(1-p)[/TEX]
Take the second derivative of that with respect to p.
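If you want to check that differentiation numerically, here is a minimal Python sketch (the values of n, y, and p are arbitrary): a central finite difference of the log-likelihood above should match the analytic second derivative -y/p^2 - (n-y)/(1-p)^2.

```python
import math

n, y, p = 20, 7, 0.4  # arbitrary example values
h = 1e-4              # finite-difference step

def ell(p):
    # Binomial log-likelihood, up to an additive constant.
    return y * math.log(p) + (n - y) * math.log(1 - p)

# Central finite-difference approximation to the second derivative at p.
d2_numeric = (ell(p + h) - 2 * ell(p) + ell(p - h)) / h**2

# Analytic second derivative of the log-likelihood.
d2_exact = -y / p**2 - (n - y) / (1 - p)**2

print(d2_numeric, d2_exact)  # the two agree closely
```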

Generalised Linear Models
I would be grateful if some of you could clarify one thing I am struggling with. How do we move from the simple representation of the relationship between the independent and the (categorical) dependent variable, y vs x, to the probability of a category occurring vs x? Does the error between the real and the estimated value of the dependent variable have something to do with this? I have read about the link function, which maps the dependent variable onto a probability.
I don't understand what you're asking by "into probability of its occurrence vs x". Could you clarify your question?

Linear Regression Line
I've read that linear regression cannot be applied if the variation differs along the data. Why does non-constant variation in the data make this method invalid? Why do generalised linear models fit such data better, and how do they take the non-constant variation into account?
There are certain assumptions that you make when you set up a linear regression, and constant variance is one of them. Ergo, if the variation is non-constant, your assumption is violated and linear regression is not appropriate. There are other methods to use, as well as transformations you can consider to be more consistent with the assumption. Note: GLMs don't necessarily fit the data better. They can, however, with the correct link function.
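One way to see the violation is with a small simulation (a Python sketch with made-up group sizes, not a definitive demonstration): for binomial-type responses, the variance n*p*(1-p) grows with the mean n*p, so the spread around any straight-line fit cannot be constant.

```python
import random

random.seed(42)

def bernoulli_sum(n, p):
    # One binomial draw, built as a sum of n Bernoulli trials.
    return sum(random.random() < p for _ in range(n))

p = 0.3
vars_by_n = {}
for n in (10, 100, 1000):
    draws = [bernoulli_sum(n, p) for _ in range(2000)]
    mean = sum(draws) / len(draws)
    vars_by_n[n] = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)

# The sample variance tracks n*p*(1-p): it grows with the mean n*p,
# which is exactly the non-constant variation being discussed.
print(vars_by_n)
```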

HTH

##### Ninja say what!?!
dang it!!! I need to start posting faster!

PS. Impressive background there Daniel. Hope you stick around.

#### Daniel Dvorkin

##### New Member
Thanks, I'm planning on it.

#### ZikO

##### New Member

Daniel, it is much clearer now. For the Bernoulli distribution I sometimes read that E[Y] = p, and elsewhere that E[Y] = np. Am I correct in saying that the first is for a single random variable and the second for a sum of n variables with the same distribution?

I don't understand what you're asking by "into probability of its occurrence vs x". Could you clarify your question?
I am sorry for not being clear in my questions; that is probably because I am not exactly sure what I need to ask about. Sometimes researchers present categorical dependent variables in the form of a scatter plot, but they also draw S-curves showing cumulative distribution functions expressing probability, perhaps the probability that the m-th category occurs given the value of x. The S-curves are CDFs between 0 and 1, whereas the categories can be between 1 and 10, for instance. They say they use, for example, logistic regression, which I believe is part of the GLM family. I can give more details if it's still not clear.
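The S-curve being described is typically the logistic function, which maps any value of x into a probability strictly between 0 and 1. A minimal Python sketch (the coefficients b0 and b1 are hypothetical, chosen only to show the shape):

```python
import math

def logistic(x, b0=-3.0, b1=1.5):
    # P(event | x) under a logistic-regression model with
    # hypothetical intercept b0 and slope b1.
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# As x grows, the predicted probability sweeps from near 0 to near 1.
for x in (0, 1, 2, 3, 4):
    print(x, round(logistic(x), 3))
```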

Thank you

#### Daniel Dvorkin

##### New Member
For Bernoulli Distr. I can sometimes read that E[Y] = p or E[Y] = np. Am I correct saying the first is for a single random variable and the second for a sum of n variables defined by the same distribution?
Yes, that's exactly right. To state it formally: a Bernoulli random variable with probability p can take on the value 0 or 1, where p is the probability that Y = 1, and E[Y] = p. A binomial random variable of size n and probability p, which can take on any integer value from 0 to n, is the sum of n independent Bernoulli random variables of probability p, and so E[Y] = np.
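That relationship is easy to confirm from the definition of expectation as a weighted average (a quick Python sketch; n = 8 and p = 0.25 are arbitrary):

```python
import math

n, p = 8, 0.25

# Exact E[Y] for a binomial(n, p): a weighted average of the values
# 0..n, with the binomial pmf supplying the weights.
mean = sum(k * math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

print(mean)  # equals n * p = 2.0
```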

#### ZikO

##### New Member
Daniel,

I came back to the book I was reading and I must say things have become a lot clearer. Your explanation really helped. There is a chapter on overdispersion, marked with (*), where the expected value is written as E[Y|μ]. What do you think it means? The chapter is, I believe, for those who are much more experienced in statistics, but I was wondering what Y|μ could mean. If you find time for a few words on that, that would be great. Thank you.

#### ZikO

##### New Member
I would be grateful if someone could help me with GLMs. This is a very important problem I have to understand; perhaps the link function would be a good start. I am involved in fitting data to a model, and I have been told linear regression is not valid whereas a GLM is. I know there are a few well-known link functions (such as the probit, i.e. the normal CDF, the logit, and the log link used with Poisson models), but I don't know when and how I should use each of them.

Cheers.

#### Dason

What software are you going to be using?

##### Ninja say what!?!
What software are you going to be using?
(*chuckles to himself) I'm not sure the OP is ready to implement a GLM yet. Though, thinking about it, that might be a good way to get experience and a better understanding of it.

#### Dason

That's why I ask - it's going to be very tough to try to understand GLMs without actually seeing them in use and messing around with them.

#### ZikO

##### New Member
(*chuckles to himself) I'm not sure the OP is ready to implement GLM yet
Well, thanks for "believing" in me, Link. Perhaps you think I am not educated enough. This thread exists precisely because GLMs are tough and I am looking for help here. I am not sure how I should respond to someone who speaks about me in the third person as "the OP", though. You seem to be making fun of me ...

What software are you going to be using?
At the moment, I have access only to MATLAB and probably R+, which is free. Definitely, I would prefer MATLAB.

it's going to be very tough to try to understand GLMs without actually seeing them in use and messing around with them
Can you suggest something to help me see them in use, please? I know it's going to be tough, but I have a lot of time and I'm really keen to learn. Thank you

#### Dason

At the moment, I have access only to MATLAB and probably R+ which is for free. Definitely, I would prefer MATLAB.
I'm assuming that you meant R (not R+), which is definitely my program of choice. I don't have any MATLAB experience, so I wouldn't be able to help you there, but I could definitely help with R.

Can you suggest something to help me to see them in use please? I know it's going to be tough. I guess I have a lot of time so I'm really keen learning it. Thank you
Do you have a specific type of data in mind? The most common for GLMs are probably either binomial or Poisson responses.

#### ZikO

##### New Member
I have data where the independent variable is continuous but could easily be made categorical, whereas the dependent variable y is definitely categorical: either 5 or 11 categories. Let's say it's only 5. There is only one choice out of the 5 for each single case.

#### Dason

You would probably want to keep the independent variable continuous. Can you describe your dependent variable a little more? Is it ordinal or just nominal?

#### ZikO

##### New Member
Thank you Dason.
The dependent variable is categorical: a semantic scale from 1 to 5 corresponding to the wordings "not at all", "slightly", "moderately", "very", and "extremely". It is definitely ordinal.