Let's suppose Y and X are two binary variables. If I fit a linear regression model Y = B0 + B1*X and compute the predicted values for Y, I get the same results as if I use a logistic regression. Is there an intuitive explanation for this?
Post your code or output. One model uses least squares and the other maximum likelihood estimates. They are optimizing two different things, correct?
Here is an example, but other data would give similar results:
tab Y X
| X
Y | 0 1 | Total
-----------+----------------------+----------
0 | 271 45 | 316
1 | 142 42 | 184
-----------+----------------------+----------
Total | 413 87 | 500
. reg Y X
Source | SS df MS Number of obs = 500
-------------+------------------------------ F( 1, 498) = 6.01
Model | 1.38710662 1 1.38710662 Prob > F = 0.0146
Residual | 114.900893 498 .230724685 R-squared = 0.0119
-------------+------------------------------ Adj R-squared = 0.0099
Total | 116.288 499 .233042084 Root MSE = .48034
------------------------------------------------------------------------------
Y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
X | .138933 .0566627 2.45 0.015 .0276055 .2502604
_cons | .3438257 .0236359 14.55 0.000 .2973873 .390264
------------------------------------------------------------------------------
. predict p1
(option xb assumed; fitted values)
. logistic Y X
Logistic regression Number of obs = 500
LR chi2(1) = 5.81
Prob > chi2 = 0.0159
Log likelihood = -326.03428 Pseudo R2 = 0.0088
------------------------------------------------------------------------------
Y | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
X | 1.781221 .4243795 2.42 0.015 1.11665 2.841307
_cons | .5239852 .0542832 -6.24 0.000 .4276981 .6419494
------------------------------------------------------------------------------
. predict p2
(option pr assumed; Pr(Y))
. sum p1 p2
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
p1 | 500 .368 .0527235 .3438257 .4827586
p2 | 500 .368 .0527235 .3438257 .4827586
. tab p1 p2
Fitted | Pr(Y)
values | .3438257 .4827586 | Total
-----------+----------------------+----------
.3438257 | 413 0 | 413
.4827586 | 0 87 | 87
-----------+----------------------+----------
Total | 413 87 | 500
So p1 was created with the linear model, p2 with the logistic model, and p1 = p2.
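For anyone who wants to check this outside Stata, here is a small Python/NumPy sketch (variable names are my own) that rebuilds the 2x2 table above and shows why both sets of fitted values coincide: with a single binary predictor, the OLS fitted values are the group means of Y, and the logistic MLE in this saturated model also reproduces the cell proportions of Y within each level of X.

```python
import numpy as np

# Rebuild the data from the 2x2 table in the thread:
# X=0: 271 zeros, 142 ones (n=413); X=1: 45 zeros, 42 ones (n=87)
x = np.r_[np.zeros(413), np.ones(87)]
y = np.r_[np.ones(142), np.zeros(271), np.ones(42), np.zeros(45)]

# OLS in closed form: slope = cov(x,y)/var(x), intercept = ybar - slope*xbar
b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
p1 = b0 + b1 * x                      # linear-model fitted values

# Logistic MLE in a saturated model: fitted probabilities are the
# observed proportions of Y=1 within each level of X
p_x0 = y[x == 0].mean()               # 142/413 = .3438257
p_x1 = y[x == 1].mean()               # 42/87  = .4827586
p2 = np.where(x == 1, p_x1, p_x0)     # logistic-model fitted values

print(np.allclose(p1, p2))            # True: identical predictions
```

The two estimators optimize different criteria, but with one binary predictor both are flexible enough to hit the group means exactly, so they agree.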
There is actually a dispute about this. The historic argument was that fitting a linear model to a binary DV would generate incorrect results. Among economists at least, many have now come to disagree on that point. They argue that so-called linear probability models [that is, using a linear estimator with a binary DV] are as accurate as logistic regression if you use robust standard errors. I have a serious problem with that based on my training, but it appears to be a common view, at least among economists. To some extent it depends on the nature of the specific model you are estimating [if most of the estimated probabilities fall in a range of .3 to .7, I think the LPM will probably be OK].
Remember that the slopes in a logistic regression are not conceptually the same thing as the slopes in a linear regression. Logistic regression deals with changes in the logit, not in the raw level of the DV.
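As a quick illustration of working on the odds scale: the probabilities in p2 can be recovered by hand from the odds-ratio output above, since Stata's `logistic` reports the baseline odds (_cons) and the odds ratio for X. A short Python check (numbers copied from the table):

```python
# Baseline odds of Y=1 at X=0 and the odds ratio for X, from the Stata output
odds_x0 = 0.5239852      # _cons, in the odds metric
or_x    = 1.781221       # odds ratio for X

# Convert odds back to probabilities: p = odds / (1 + odds)
p_x0 = odds_x0 / (1 + odds_x0)                  # ~ .3438257
p_x1 = odds_x0 * or_x / (1 + odds_x0 * or_x)    # ~ .4827586

print(p_x0, p_x1)
```

So the logistic coefficients act multiplicatively on the odds, while the linear-model slope (.138933) is simply the difference between these two probabilities.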
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
And yeah, note that it really doesn't matter in your case. If both variables are binary, then either way you parameterize your model, you're basically going to be saying "let's fit one value when X=0 and another value when X=1".