# Thread: (easy) Maximum Likelihood Regression question...

1. ## (easy) Maximum Likelihood Regression question...

Hello everyone. I am trying to get my head around a very simple example I read somewhere explaining maximum likelihood estimation in the OLS regression context.

Now, thanks to this board I understand that the assumption of normality is on the residuals of a regression equation, and not on the variables themselves. However, I was reading a book that says that, under the OLS regression model, the distribution of Y is something like:

Y~N(Xb, sigma)

and if we plug the appropriate parameters into the normal pdf, differentiate the resulting (log-)likelihood with respect to the b-weights, and set the derivatives to 0, we'll get maximum likelihood estimates of the b-weights. Supposedly, from what I've been reading, only in this case (I guess where the expected value of Y is some linear function of my predictor(s) X and sigma is constant) will the maximum likelihood estimates match the regular OLS estimates. However (please correct me if I'm wrong), when I say Y ~ N(something, another thing), aren't I saying that Y follows a normal distribution with "something" as a mean and "another thing" as a standard deviation? And if this is true, then isn't there a distributional assumption on the dependent variable Y?

I'm very confused; I hope someone can help me clarify this...
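Here is the setup I have in mind, as a quick numpy sketch (made-up data; the loop just checks that the OLS solution really is a likelihood maximum):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# OLS estimate via the normal equations: (X'X) b = X'y
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Gaussian log-likelihood of y given X, b and (known) sigma
def loglik(b):
    r = y - X @ b
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - (r @ r) / (2 * sigma**2)

# The OLS solution scores at least as high as any nearby perturbation
for _ in range(100):
    assert loglik(b_ols) >= loglik(b_ols + rng.normal(scale=0.1, size=2))
```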

2. ## Re: (easy) Maximum Likelihood Regression question...

Y~N(Xb, sigma)
I believe the answer to your confusion is understanding the distinction between the sample and the population. In the population the dependent variable approximates normality, whereas in the sample you are unlikely to have a normal distribution. Capital Y refers to the population, whereas lowercase y refers to a sample. You have a capital Y, so you're referring to the population.

I think this will relieve your confusion but if not let us know.
trinker

3. ## Re: (easy) Maximum Likelihood Regression question...

Thank you for looking at my post!

Hmm... not particularly. If I am doing maximum likelihood estimation of my regression weights, I need to choose a probability density function from which to derive the log-likelihood equation I plan on differentiating. Now, I may be completely off, but I don't quite see why the difference between sample and population would have any impact on which pdf I choose to find my MLEs.

The thing I don't quite grasp is why OLS' matrix-algebra estimate of the regression weights only matches the MLE weights if the dependent Y follows a normal distribution. Why is there a distributional assumption in the MLE approach that is not needed in the OLS matrix approach? I mean, the quick answer is that you need to choose something to build the log-likelihood, and that something happens to be the normal pdf. But what I don't quite get is why this same assumption (normally distributed Y's) doesn't apply to OLS' regular matrix approach in order to yield optimal BLUE results...

4. ## Re: (easy) Maximum Likelihood Regression question...

http://en.wikipedia.org/wiki/Gauss%E...Markov_theorem

In the BLUE setup, you really do not require the error/response to be normally distributed.

From the MLE perspective, the reason it coincides with the OLS result is that the function you minimize to find the OLS estimates, the residual sum of squares, also appears in the normal likelihood function (as the negative of the exponent); thus, by maximizing the likelihood you are minimizing the residual sum of squares.
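Spelled out (using b for the coefficient vector and sigma^2 for the error variance, as in the thread), the normal log-likelihood is:

```latex
\ell(b,\sigma^2)
  = \log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\!\left(-\frac{(y_i - x_i' b)^2}{2\sigma^2}\right)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i' b)^2 .
```

The sum in the last term is exactly the residual sum of squares, and it enters with a minus sign; for any fixed sigma^2, maximizing the log-likelihood over b is therefore the same problem as minimizing the residual sum of squares, which is what OLS does.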

But actually I am still not sure which part confuses you.

6. ## Re: (easy) Maximum Likelihood Regression question...

Originally Posted by BGM
also appears in the normal likelihood function (as the negative of the exponent); thus, by maximizing the likelihood you are minimizing the residual sum of squares.
Aha! This is the part where I'm getting lost! So I start by saying my Y follows a normal distribution, Y ~ N(Xb, sigma), right? Then I take my normal pdf, with e raised to the whole thing up there, and after I take the log the e disappears and I'm left with what will later give me pretty much the same equations (I think they call them "normal equations", right?) I'd need to solve to get the b-weights. Where I am getting lost is that, in the MLE setting, my first step is the claim "the dependent Y follows a normal distribution with mean Xb and constant variance sigma", whereas in OLS regression I do not need to make such a claim for Y; looking at the residuals should suffice. So why can't I say, for example, "the dependent Y follows a Poisson or gamma or some other distribution", solve for the MLEs of the b-weights, and conclude that I don't care which pdf I choose, since I'm only interested in the normality of the residuals?
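Those "normal equations" really do drop straight out of the log-likelihood: its gradient with respect to b is X'(y - Xb)/sigma^2, and setting that to zero gives X'Xb = X'y. A small numpy check, on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0]) + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X b = X'y (the OLS solution)
b = np.linalg.solve(X.T @ X, X.T @ y)

# The gradient of the Gaussian log-likelihood w.r.t. b is X'(y - Xb)/sigma^2;
# at the solution of the normal equations it is (numerically) zero.
grad = X.T @ (y - X @ b)
assert np.max(np.abs(grad)) < 1e-8
```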

7. ## Re: (easy) Maximum Likelihood Regression question...

When you say something like Y ~ N(Xb, sigma), that isn't quite true. What we mean is Y | X, b, sigma ~ N(Xb, sigma). We're talking about the conditional distribution of Y given pretty much everything else. We don't know or care about the marginal distribution of Y, because depending on what X and beta are, the marginal distribution could be really ugly/unrecognizable.

Also, if the conditional distribution of Y is something other than a normal distribution, then the residuals won't be normally distributed.
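A tiny simulation makes the conditional/marginal distinction concrete (hypothetical data with a binary predictor): the conditional distribution of Y is normal with sd 1, but the marginal distribution of Y is a 50/50 mixture of two normals, i.e. strongly bimodal:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.integers(0, 2, size=n).astype(float)  # binary predictor
y = 10.0 * x + rng.normal(size=n)             # Y | x ~ N(10x, 1)

# Residuals (Y minus its conditional mean) behave like the N(0, 1) errors...
resid = y - 10.0 * x
# ...but marginally Y mixes N(0, 1) and N(10, 1): bimodal, with a far
# larger spread than the conditional sd of 1.
print(round(resid.std(), 2), round(y.std(), 2))
```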


9. ## Re: (easy) Maximum Likelihood Regression question...

Originally Posted by Dason
What we mean is Y | X, b, sigma ~ N(Xb, sigma). We're talking about the conditional distribution of Y given pretty much everything else.
THANK YOU! Now *THAT* makes a lot more sense... stupid journal articles for not writing things properly... Because if the assumption is on the conditional distribution and not on the variable itself, then it makes perfect sense why the residuals should be normally distributed to satisfy the assumptions. Oh god, thank you, I've been staring at this for days.

10. ## Re: (easy) Maximum Likelihood Regression question...

By the way... I see there are a lot of introductory books out there on regression, ANOVA, logistic regression and the like for anyone getting started. However, I can't seem to find many (or any) that focus on maximum likelihood, and it seems like a REALLY, REALLY important thing to learn and understand well if one is to progress further in statistics. Do you have any recommendations for introductory books on maximum likelihood estimation? It seems like everybody learns this somewhere, but I can't find where people are learning it... and there are so many extensions (robust MLE, penalized MLE, weighted MLE, marginal MLE... oh god).

11. ## Re: (easy) Maximum Likelihood Regression question...

Yes, in many places people omit the conditioning to shorten the notation, which can cause confusion.

http://en.wikipedia.org/wiki/Generalized_linear_model

By the way, if you really want to generalize the conditional distribution of the response from the normal to a general exponential family, you may want to read about the Generalized Linear Model. In that case, with a different link function and a different distribution you get different estimates. E.g., the Gaussian regression model you are talking about is a special case in which you use the identity link, and thus you model the mean as a linear function of the regressors.
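To make that concrete, here is a minimal, hand-rolled sketch of one such GLM: Poisson regression with the canonical log link, fit by Newton-Raphson on simulated data (pure numpy; the data and coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ beta_true))  # Poisson response, log link

# Newton-Raphson on the Poisson log-likelihood. With the canonical (log)
# link this coincides with Fisher scoring / IRLS.
b = np.array([np.log(y.mean()), 0.0])   # sensible starting point
for _ in range(25):
    mu = np.exp(X @ b)                  # current fitted means
    score = X.T @ (y - mu)              # gradient of the log-likelihood
    fisher = X.T @ (X * mu[:, None])    # Fisher information X' diag(mu) X
    b = b + np.linalg.solve(fisher, score)

# The score is (numerically) zero at the MLE
mu = np.exp(X @ b)
assert np.max(np.abs(X.T @ (y - mu))) < 1e-6
```

Because the link and the distribution changed, the resulting b is the MLE under the Poisson model, not the OLS solution; with the identity link and normal errors the same scoring iteration reduces to the normal equations in a single step.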


13. ## Re: (easy) Maximum Likelihood Regression question...

Thank you... I am starting to read about those as well, but I guess I'm going step by step, hehehe...

I would also like to ask you, BGM, the same question I asked Dason, since you seem to understand these things very well too. Are there any intro books you know of specifically for maximum likelihood? I'm just appalled at how much has been written for beginners in regression/ANOVA, yet a technique as important as maximum likelihood is so overlooked for people like me who would like to understand the basics of how it works. I can only find either very advanced books geared towards very specific techniques within the MLE family, or book chapters that mention it in passing, like "oh, and by the way, here's another method of estimation..."; I keep trying to find an introductory book on maximum likelihood estimation and keep coming up empty-handed.