Linear Probability Model


No cake for spunky
This is when you use linear assumptions for a DV that has only two levels. SAS's senior statistician told me firmly that this only works when you assume a binomial distribution. However, the organization I report to chose to assume normality. As I understand it, that is then not a Linear Probability Model, and I do not know how to interpret the slopes.
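The two setups described above can be sketched side by side. This is only an illustration under assumptions: the dataset name mydata and the variables y (numeric 0/1 response) and x (predictor) are hypothetical and do not come from the original analysis.

```sas
/* Hypothetical dataset WORK.MYDATA: y is a numeric 0/1 response, x is a predictor. */

/* Binomial response with an identity link: the linear probability model
   in the sense used by the SAS statistician. */
proc genmod data=mydata;
  model y(event="1") = x / dist=binomial link=identity;
run;

/* No DIST= option: GENMOD defaults to a normal response with an identity link,
   so the normal log likelihood is maximized instead. */
proc genmod data=mydata;
  model y = x;
run;

/* Ordinary least squares for comparison; the coefficient estimates should
   match the normal GENMOD fit above. */
proc reg data=mydata;
  model y = x;
run;
quit;
```

Comparing the parameter estimates and standard errors from the three fits on the same data is one way to see how much the distributional assumption matters in practice. Note that the binomial fit with an identity link can fail to converge when fitted probabilities approach 0 or 1.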

This is what SAS's senior statistician said.

As noted in replies to your post in the Statistical Procedures Community, the model with DIST=BIN and LINK=IDENTITY is considered a linear probability model. The model without DIST=BIN assumes a normal response and the normal log likelihood is then maximized. The mean of the binomial is a probability, while the mean of a normal is not, so I don't see how the model without DIST=BIN could be called a linear probability model.
PROC GENMOD models the probability of the lowest response level by default. You should explicitly specify the level you want to model by specifying that level in the EVENT= option following the response variable. If you have a binary predictor variable with values 0 or 1, it is best to not specify it in the CLASS statement since, by default, that will cause the estimated parameter for that variable to correspond to the lowest level, 0, rather than 1. If you do specify it in the CLASS statement, use the REF= option following the variable to specify that level 0 is the reference level. For example,
proc genmod;
  class x(ref="0");                             /* level 0 of x is the reference level */
  model response(event="1")=x / dist=binomial;  /* model the probability that response=1 */
run;
David Schlotzhauer
Senior Statistician
SAS Institute Inc.


Less is more. Stay pure. Stay poor.
But is this person an economist? Economists treat OLS with a binary DV as an LPM. The simple ML model with a normal response would kick out the same coefficient estimates as simple OLS, right? And if the OLS process is deemed an LPM by econ folks, aren't they generally the same thing? So what do the coefficients mean? Well, if the DV in OLS is binary, the slope reports the change in the DV for a one-unit change in the IV. And since y-hat is a mean of 0s and 1s, it is the prevalence of 1s in the set.

I get what this person is saying, since my field uses that GLM process to get risk differences, which, crudely speaking, are the difference between two rates.
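To make the link between the OLS slope and a risk difference concrete, here is a short worked derivation. It assumes a single binary predictor x coded 0/1 (an assumption for illustration; the thread does not say what the predictors are). The symbols n_1 and n_0 (group sizes, n = n_1 + n_0) and \hat{p}_1 and \hat{p}_0 (observed proportions of 1s in each group) are introduced here for the sketch.

$$
\hat{\beta}_1
= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
= \frac{n_1 n_0 (\bar{y}_1 - \bar{y}_0)/n}{n_1 n_0 / n}
= \bar{y}_1 - \bar{y}_0 ,
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \bar{y}_0 .
$$

With a 0/1 response, each group mean is the proportion of 1s in that group, so

$$
\hat{\beta}_1 = \hat{p}_1 - \hat{p}_0 ,
$$

which is the crude risk difference. For example, if 30% of the x = 1 group and 20% of the x = 0 group have y = 1, the slope is 0.10, a difference of 10 percentage points. With a continuous predictor the slope is instead the estimated change in the proportion of 1s per one-unit change in x, and fitted values are not guaranteed to stay between 0 and 1.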


No cake for spunky
I don't know whether the person above is an economist or not, only that they work for SAS.

If I understand you correctly, the slope in a simple linear model with a binary DV would be the increase in the chance of a 1 for a one-unit change in the IV (in percentage points once multiplied by 100). What is confusing is that SAS told me the log would say whether you were modeling 1 or 0, and I cannot find it in either. I have to try again.