# Thread: How can this be true [linear versus logistic regression]

1. ## How can this be true [linear versus logistic regression]

This comes from a government study that was used to generate something of great importance, funding and related factors for a major government program. It contradicts pretty much everything I have read in the last decade (and learned in class).

For simplicity and speed and because of the large number of models estimated, the models were estimated using linear probability models, even when the dependent variable was binary. Logit and probit estimation techniques are generally recommended for estimating equations with zero-one dependent variables. However, the authors of the methodology reported that using logit or probit made it more difficult to interpret the results and created some complexities in calculating adjustments.
Odds ratios and slopes from logistic models are harder to interpret, but fitting a linear model to a binary DV is simply wrong, or so I have always read.

For example, they stated that because logit and probit are non-linear models, the adjustment factor could not be calculated using sample means but rather required calculating probabilities for all observations using the full set of data.
I don't understand what this means. As far as I can tell, they first estimated slopes for the variables, and then applied those slopes to other data for the X to estimate requirements for agencies to meet. That is, in the first part they estimated the slopes, and in the second part they used those slopes with current data on the IV to estimate what the goals of the agency [the DV] should be.
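One way to read the quoted point about sample means: in a linear model, the prediction at the mean of X equals the mean of the individual predictions, so an adjustment can be computed from sample means alone. In a logit model that shortcut fails because the link is non-linear, so each observation's probability has to be computed and then averaged. A minimal sketch, assuming made-up coefficients `a` and `b` and simulated `x` values (none of these come from the study):

```python
import math
import random

def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)

# Hypothetical covariate values and coefficients, for illustration only.
x = [random.gauss(0.0, 2.0) for _ in range(10_000)]
a, b = -1.0, 0.8
x_bar = sum(x) / len(x)

# Linear model: plugging in the sample mean gives the mean prediction.
linear_at_mean = a + b * x_bar
linear_mean_pred = sum(a + b * xi for xi in x) / len(x)

# Logit model: the same shortcut gives a different answer, so every
# observation's probability has to be computed and then averaged.
logit_at_mean = logistic(a + b * x_bar)
logit_mean_pred = sum(logistic(a + b * xi) for xi in x) / len(x)

print(f"linear: at mean X = {linear_at_mean:.4f}, mean of predictions = {linear_mean_pred:.4f}")
print(f"logit:  at mean X = {logit_at_mean:.4f}, mean of predictions = {logit_mean_pred:.4f}")
```

The two linear numbers agree up to rounding, while the two logit numbers do not. That gap is presumably the "complexity in calculating adjustments" the authors had in mind.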

Further, the argument was made that econometricians had shown that the drawbacks of using linear probability models, compared with logit and probit techniques, were minimal.
That is news to me. I have read the exact opposite.

2. ## The Following User Says Thank You to noetsi For This Useful Post:

jbwettergreen (01-10-2017)

3. ## Re: How can this be true [linear versus logistic regression]

They go on to say this (I think this involves estimating the slopes that are used in the second stage, although I am not certain).

In order to test the sensitivity of the estimates to this simplification, both techniques for entered employment and retention performance measures for the WIA Adult program were estimated. The coefficients estimates were found to be quite similar if not virtually identical in most cases.
So why do we do logistic regression if, according to the US government, there is no difference between it and linear regression for a binary DV?
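The excerpt does not say on what scale the coefficients were compared. A raw logit coefficient is on the log-odds scale, so the usual like-for-like comparison is between the linear probability slope and the logit average marginal effect. A rough sketch on simulated data, assuming one predictor and made-up true coefficients (the helper names `fit_lpm` and `fit_logit` are hypothetical, not from any package):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_lpm(x, y):
    """OLS of y on x with an intercept (closed form for one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_logit(x, y, iters=25):
    """Logit with an intercept and one predictor, fitted by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX, inverted by hand since it is 2 x 2.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

random.seed(1)
n = 5000
# Simulated data: probabilities stay mostly in the middle range.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 if random.random() < sigmoid(0.2 + 0.5 * xi) else 0.0 for xi in x]

_, lpm_slope = fit_lpm(x, y)
b0, b1 = fit_logit(x, y)

# Average marginal effect: mean over observations of b1 * p * (1 - p).
ame = sum(b1 * p * (1.0 - p) for p in (sigmoid(b0 + b1 * xi) for xi in x)) / n

print(f"LPM slope:                     {lpm_slope:.4f}")
print(f"logit coefficient (log-odds):  {b1:.4f}")
print(f"logit average marginal effect: {ame:.4f}")
```

In this setup the LPM slope and the logit average marginal effect come out close, while the raw logit coefficient is on a different scale. That is a statement about this simulation only, not about the study's data.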

4. ## Re: How can this be true [linear versus logistic regression]

Try to think it through yourself instead of worrying about what authorities say. So to start:

When you have a binary DV, which assumptions of the linear OLS model are breached?
What properties of the OLS estimator are those assumptions required for?

5. ## Re: How can this be true [linear versus logistic regression]

hi,
from a practical POV, isn't the argument that in the middle range (probabilities relatively far from 0 or 1) OLS will perform well, the problem being that it can predict senseless values at the extremes?
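A small sketch of the "senseless values at the extremes" part, assuming simulated data where the true probabilities run from near 0 to near 1 (the true coefficient of 1.5 and the range of `x` are made up):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(2)
n = 2000
# Simulated data whose true probabilities cover almost the whole 0-1 range.
x = [random.uniform(-4.0, 4.0) for _ in range(n)]
y = [1.0 if random.random() < sigmoid(1.5 * xi) else 0.0 for xi in x]

# Linear probability model: OLS of y on x with an intercept.
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx

for x0 in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"x = {x0:+.1f}  true p = {sigmoid(1.5 * x0):.3f}  LPM fit = {intercept + slope * x0:.3f}")

lpm_low = intercept + slope * -4.0   # fitted "probability" at the low end
lpm_high = intercept + slope * 4.0   # fitted "probability" at the high end
```

At the ends of the range the fitted line goes below 0 and above 1, which cannot be a probability. Near the centre it stays close to the true value.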

regards

6. ## Re: How can this be true [linear versus logistic regression]

Originally Posted by CowboyBear
Try to think it through yourself instead of worrying about what authorities say. So to start:

When you have a binary DV, which assumptions of the linear OLS model are breached?
What properties of the OLS estimator are those assumptions required for?
I don't know all the violations, but two I remember. First, the data will always be heteroscedastic. Second, nonsensical slopes can be found.

Since I don't consider myself particularly good at statistics, what experts say matters to me. And more to the point, this is not just a theoretical matter. It involves the setting of goals that my agency, and most DOL and DOE organizations, will have to meet, or there will be major consequences. So if the metrics were set wrong, presumably by real statisticians, that is sort of important.

7. ## Re: How can this be true [linear versus logistic regression]

They obviously trend in the same way. Deviating from logistic seems sketchy to me, in that you run the risk of model misspecification. They probably made that choice so everyone would be on the same "scale", so to speak, and to make it easy for those who are not familiar with logistic. It seems lazy, and if their staff can't run both, then maybe they aren't the right people. They just need to come up with boilerplate language on how to interpret both for the stats-illiterate people who use the results.

I bet it revolves around the difficulties of conveying results to politicians and them using the results.

8. ## Re: How can this be true [linear versus logistic regression]

The analysis is highly complex; these are clearly expert econometricians.

It appears that econometricians, some of them anyhow, have decided that since results from logit and OLS [linear probability models when predicting binary variables] are often very similar, it's OK to use OLS. First, this apparently depends on the range of probabilities you are estimating: the closer the results are to the extremes, the less well the linear probability model does. But in many cases you are not estimating extreme values, so that is not an issue. Second, they argue that the inherent heteroscedasticity can be dealt with using White SE [not sure that is true, but they believe it]. Finally, they argue that while linear probability models are sometimes wrong, so are logistic models [that is, wrong in predicting binary variables even without nonsensical results, but this may also deal with misspecification].
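On the White SE point: for a binary outcome the error variance is p(1 - p), which changes with X, so the heteroscedasticity is built in. White (heteroscedasticity-robust) standard errors do not remove it and leave the coefficients unchanged. They only replace the usual standard-error formula with one that does not assume a constant error variance. A minimal sketch for one predictor, assuming simulated data in which the linear probability model happens to be true (the 0.05 and 0.9 are made up):

```python
import math
import random

random.seed(3)
n = 5000
# Hypothetical data where P(y = 1 | x) = 0.05 + 0.9 * x really is linear.
x = [random.random() for _ in range(n)]
y = [1.0 if random.random() < 0.05 + 0.9 * xi else 0.0 for xi in x]

# OLS with an intercept (closed form for one predictor).
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Classical SE of the slope: assumes one common error variance.
classical_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

# White (HC0) SE of the slope: each squared residual stands in for
# that observation's own error variance.
white_se = math.sqrt(sum((xi - mx) ** 2 * e * e for xi, e in zip(x, resid))) / sxx

print(f"slope = {slope:.4f}")
print(f"classical SE = {classical_se:.5f}")
print(f"White HC0 SE = {white_se:.5f}")
```

The slope is identical either way; only the standard error, and so the tests and confidence intervals, changes. The robust correction also does nothing about fitted values outside 0 to 1.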

9. ## Re: How can this be true [linear versus logistic regression]

Maybe this is an econometrics thing.

Here's an article discussing the issues: http://statisticalhorizons.com/linear-vs-logistic

10. ## Re: How can this be true [linear versus logistic regression]

noetsi,

Do you have a link to the source of what you are referencing so we can better put it into context?

11. ## Re: How can this be true [linear versus logistic regression]

It is a PDF that was sent to me, for which I have no link. This is the pertinent comment by the authors.

For simplicity and speed and because of the large number of models estimated, the models were estimated using linear probability models, even when the dependent variable was binary. Logit and probit estimation techniques are generally recommended for estimating equations with zero-one dependent variables. However, the authors of the methodology reported that using logit or probit made it more difficult to interpret the results and created some complexities in calculating adjustments. For example, they stated that because logit and probit are non-linear models, the adjustment factor could not be calculated using sample means but rather required calculating probabilities for all observations using the full set of data. Further, the argument was made that econometricians had shown that the drawbacks of using linear probability models, compared with logit and probit techniques, were minimal. In order to test the sensitivity of the estimates to this simplification, both techniques for entered employment and retention performance measures for the WIA Adult program were estimated. The coefficients estimates were found to be quite similar if not virtually identical in most cases.
I do have a link to the econometrics book that, according to the authors, establishes that linear probability models are satisfactory substitutes for logistic regression.

https://pdfs.semanticscholar.org/6bd...e5a0763289.pdf

If you can use linear probability models for binary variables, why ever run logistic regression? Slopes in logistic regression are very difficult to interpret, you get no true R-squared, and many tests that exist for linear models do not exist for logistic regression [including diagnostics].

12. ## Re: How can this be true [linear versus logistic regression]

Don't forget that binary outcomes can also be put on the risk scale and used for relative risks and risk differences. These allow you to calculate relative risk reduction, absolute risk reduction, number needed to harm, and number needed to treat (e.g., how many people you have to intervene on to get one more outcome of interest compared to the other group).
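For a simple two-group comparison, these measures all follow from the two group risks. A minimal sketch with made-up counts (the function name `risk_measures` and the numbers are hypothetical):

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Risk-scale summaries for a two-group comparison of a binary outcome."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    risk_diff = risk_t - risk_c
    return {
        "risk_treated": risk_t,
        "risk_control": risk_c,
        "relative_risk": risk_t / risk_c,
        "risk_difference": risk_diff,
        "relative_risk_reduction": 1.0 - risk_t / risk_c,
        "absolute_risk_reduction": -risk_diff,
        # 1 / |risk difference|: number needed to treat when the treated
        # risk is lower, number needed to harm when it is higher.
        "number_needed": 1.0 / abs(risk_diff) if risk_diff != 0 else float("inf"),
    }

# Made-up counts: 30 of 200 treated and 50 of 200 controls have the outcome.
m = risk_measures(30, 200, 50, 200)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

This only covers the unadjusted 2 x 2 case. Adjusting for covariates would need a regression on the risk scale instead.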

13. ## Re: How can this be true [linear versus logistic regression]

The problem with that is that I have not found a way to calculate relative risk in SAS, and I tried really hard to do so several years ago. Do you know a way to generate relative risk in SAS?

14. ## Re: How can this be true [linear versus logistic regression]

Yes, if I remember I will send links tomorrow. It likely uses the GLM procedure.

17. ## Re: How can this be true [linear versus logistic regression]

Originally Posted by noetsi
I don't know all the violations, but two I remember. First, the data will always be heteroscedastic. Second, nonsensical slopes can be found.
What are the consequences of violation of the assumption of homoscedasticity? What other assumptions are there? Is it true that odds ratios are always harder to interpret than linear slopes? How might the usefulness of logistic vs linear regression differ depending on whether the goal is explanation or prediction?

Since I don't consider myself particularly good at statistics, what experts say matters to me
Basically I'm trying to get you to think things through critically yourself; you're perfectly capable of this. Simply asking what the experts conclude works only when they're all in agreement (i.e., never!). But we can critically evaluate the arguments being put forward by experts and think about when they are and aren't valid. That critical authority-questioning attitude is an essential part of a scientific mindset (regardless of where you're trying to do science).