# Thread: comparing linear vs logistic regression for a simple model with binary variables

1. ## comparing linear vs logistic regression for a simple model with binary variables

Suppose Y and X are two binary variables. If I fit the linear regression model Y = B0 + B1*X and compute the predicted values for Y, I get the same results as with a logistic regression. Is there an intuitive explanation for this?

2. ## Re: comparing linear vs logistic regression for a simple model with binary variables

Post your code or output. One model uses least squares and the other maximum likelihood estimates. They are optimizing two different things, correct?

3. ## Re: comparing linear vs logistic regression for a simple model with binary variables

Here is an example, but other data would give similar results:

    . tab Y X

               |           X
             Y |         0          1 |     Total
    -----------+----------------------+----------
             0 |       271         45 |       316
             1 |       142         42 |       184
    -----------+----------------------+----------
         Total |       413         87 |       500

    . reg Y X

          Source |       SS       df       MS              Number of obs =     500
    -------------+------------------------------           F(  1,   498) =    6.01
           Model |  1.38710662     1  1.38710662           Prob > F      =  0.0146
        Residual |  114.900893   498  .230724685           R-squared     =  0.0119
           Total |     116.288   499  .233042084           Root MSE      =  .48034

    ------------------------------------------------------------------------------
               Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               X |    .138933   .0566627     2.45   0.015     .0276055    .2502604
           _cons |   .3438257   .0236359    14.55   0.000     .2973873     .390264
    ------------------------------------------------------------------------------

    . predict p1
    (option xb assumed; fitted values)

    . logistic Y X

    Logistic regression                               Number of obs   =        500
                                                      LR chi2(1)      =       5.81
                                                      Prob > chi2     =     0.0159
    Log likelihood = -326.03428                       Pseudo R2       =     0.0088

    ------------------------------------------------------------------------------
               Y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               X |   1.781221   .4243795     2.42   0.015      1.11665    2.841307
           _cons |   .5239852   .0542832    -6.24   0.000     .4276981    .6419494
    ------------------------------------------------------------------------------

    . predict p2
    (option pr assumed; Pr(Y))

    . sum p1 p2

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
              p1 |       500        .368    .0527235   .3438257   .4827586
              p2 |       500        .368    .0527235   .3438257   .4827586

    . tab p1 p2

        Fitted |         Pr(Y)
        values |  .3438257   .4827586 |     Total
    -----------+----------------------+----------
      .3438257 |       413          0 |       413
      .4827586 |         0         87 |        87
    -----------+----------------------+----------
         Total |       413         87 |       500

So p1 (fitted values from the linear model) and p2 (predicted probabilities from the logistic model) are identical for every observation: p1 = p2.
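For readers without Stata, the same check can be sketched in plain Python. This is only an illustration under stated assumptions: the cell counts are copied from the `tab Y X` output above, the names (`counts`, `ols_fit`, `logit_fit`) are made up for this example, and the logistic fit uses a hand-written Newton-Raphson loop rather than whatever optimizer Stata runs internally.

```python
import math

# Cell counts copied from the `tab Y X` output: (y, x) -> frequency
counts = {(0, 0): 271, (0, 1): 45, (1, 0): 142, (1, 1): 42}
data = [(y, x) for (y, x), freq in counts.items() for _ in range(freq)]
n_obs = len(data)

# OLS with one regressor: slope = S_xy / S_xx, intercept = ybar - slope * xbar
xbar = sum(x for _, x in data) / n_obs
ybar = sum(y for y, _ in data) / n_obs
sxy = sum((x - xbar) * (y - ybar) for y, x in data)
sxx = sum((x - xbar) ** 2 for _, x in data)
b1 = sxy / sxx
b0 = ybar - b1 * xbar
ols_fit = {0: b0, 1: b0 + b1}  # fitted value for X = 0 and X = 1


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Logistic regression: maximize the log-likelihood with Newton-Raphson steps
a0, a1 = 0.0, 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for y, x in data:
        p = sigmoid(a0 + a1 * x)
        w = p * (1.0 - p)
        g0 += y - p          # gradient with respect to the intercept
        g1 += (y - p) * x    # gradient with respect to the slope
        h00 += w             # entries of the negative Hessian
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    # Solve the 2x2 system (negative Hessian) * step = gradient
    a0 += (h11 * g0 - h01 * g1) / det
    a1 += (h00 * g1 - h01 * g0) / det
logit_fit = {0: sigmoid(a0), 1: sigmoid(a0 + a1)}  # predicted Pr(Y = 1)

for x in (0, 1):
    print(x, round(ols_fit[x], 7), round(logit_fit[x], 7))
```

Both dictionaries should agree with each other, and with the two distinct values Stata reports for p1 and p2, up to floating-point error.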

4. ## Re: comparing linear vs logistic regression for a simple model with binary variables

There is actually a dispute about this. The historical argument was that a linear model with a binary DV would generate incorrect results. Many economists, at least, now disagree on that point. They argue that so-called linear probability models (LPMs, which use a linear estimator for a binary DV) are as accurate as logistic regression if you use robust standard errors. Based on my training I have a serious problem with that, but it appears to be a common view, at least among economists. To some extent it depends on the specific model you are estimating: if most of the estimated probabilities fall between .3 and .7, I think the LPM will probably be OK.

Remember that the slopes in a logistic regression are not conceptually the same thing as the slopes in a linear regression. Logistic regression models changes in the logit (the log-odds), not changes in the raw level of the DV.
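To make the difference in scale concrete, here is a small Python sketch that takes the coefficients reported earlier in the thread and puts them on a common footing. The variable names are hypothetical, and the numbers are simply copied from the `reg` and `logistic` output, so treat it as an illustration rather than a general recipe.

```python
import math

# Values copied from the Stata output earlier in the thread
odds_ratio_x = 1.781221  # `logistic` row for X
odds_cons = 0.5239852    # `logistic` row for _cons: baseline odds at X = 0
ols_slope = 0.138933     # `reg` row for X


def odds_to_prob(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


p0 = odds_to_prob(odds_cons)                 # Pr(Y = 1 | X = 0)
p1 = odds_to_prob(odds_cons * odds_ratio_x)  # Pr(Y = 1 | X = 1)

logit_slope = math.log(odds_ratio_x)  # logistic slope: change in log-odds
prob_diff = p1 - p0                   # linear slope: change in probability

print("logit slope (log-odds):", round(logit_slope, 4))
print("difference in probabilities:", round(prob_diff, 6))
print("OLS slope from reg:", ols_slope)
```

The logistic slope is a difference in log-odds, while the linear slope is a difference in probabilities. In this two-group case they describe the same pair of fitted probabilities, just on different scales.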

6. ## Re: comparing linear vs logistic regression for a simple model with binary variables

And yeah, note that it really doesn't matter in your case. If both variables are binary, then whichever way you parameterize your model, you're basically saying "let's fit one value when x=0 and another value when x=1".
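A short derivation may help show why the two fitted values coincide. The notation is introduced here only for illustration: $n_x$ is the number of observations with $X = x$, $k_x$ is how many of those have $Y = 1$, $m_x$ is the linear model's fitted value in group $x$ (so $m_0 = B_0$ and $m_1 = B_0 + B_1$), and $\pi_x$ is the logistic model's predicted probability in group $x$.

$$
\begin{aligned}
\text{Least squares:}\quad & \min_{m_0,\, m_1} \sum_{x \in \{0,1\}} \left[ k_x (1 - m_x)^2 + (n_x - k_x)\, m_x^2 \right]
  \;\Rightarrow\; \hat m_x = \frac{k_x}{n_x}, \\
\text{Maximum likelihood:}\quad & \max_{\pi_0,\, \pi_1} \sum_{x \in \{0,1\}} \left[ k_x \log \pi_x + (n_x - k_x) \log (1 - \pi_x) \right]
  \;\Rightarrow\; \hat \pi_x = \frac{k_x}{n_x}.
\end{aligned}
$$

Each model has two free parameters for two groups, so each can reach any pair of group values (for the logistic model, any pair strictly between 0 and 1). Setting the derivative with respect to each group value to zero gives the sample proportion $k_x / n_x$ in both cases. With the counts posted in this thread, that is $142/413 \approx .3438$ for $X = 0$ and $42/87 \approx .4828$ for $X = 1$.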
