- Thread starter noetsi

Say someone has a master's degree. They look at graduate degrees (one dummy), BA (a second), AA degrees (a third), some college with no degree (a fourth), and high school (a fifth; I am making this simpler than their system).

Say you earned an MA, a BA, and an AA, and graduated from high school. Their system would code you as one on every one of these dummies.

I changed my code so that if you have an MA you are not also counted under BA and below (and with a BA you are not counted below that level, and so on). So I think my version essentially analyzes the value of the highest degree. But I am not sure what their model is showing.
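To make the two coding schemes concrete, here is a minimal Python sketch. The level names and the two helper functions are hypothetical (they are not taken from the linked brief or from my actual code); they only illustrate the difference between coding every credential held and coding only the highest one.

```python
# Hypothetical degree levels, ordered lowest to highest (names are illustrative).
LEVELS = ["high_school", "some_college", "AA", "BA", "graduate"]

def cumulative_dummies(earned):
    """Their scheme as described above: 1 for every credential the person holds."""
    return {lvl: int(lvl in earned) for lvl in LEVELS}

def highest_only_dummies(earned):
    """Revised scheme: 1 only for the highest credential held, 0 elsewhere."""
    held = [lvl for lvl in LEVELS if lvl in earned]
    top_level = held[-1] if held else None
    return {lvl: int(lvl == top_level) for lvl in LEVELS}

# Someone with an MA, a BA, an AA, and a high school diploma.
person = {"high_school", "AA", "BA", "graduate"}
cum = cumulative_dummies(person)
top = highest_only_dummies(person)
print(cum)
print(top)
```

With the highest-only coding the dummies are mutually exclusive, so each coefficient compares that group with whichever category is left out as the reference. With the cumulative coding each coefficient is, roughly, the added association of holding that credential given the other credentials in the model, which is a different question.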

https://opa.hhs.gov/sites/default/files/2020-07/lpm-tabrief.pdf

Is it true that the criticisms of linear probability models don't apply with binary predictors? All the predictors I care about in my present model are dummy variables. I don't think non-linearity can actually exist with dummy variables, and that is one of the major critiques of linear probability models. (Heteroskedasticity is still an issue, but not a concern to me since I have the population; I don't care about p-values at all.) Some predictors in the model are interval, but I am not analyzing them yet.

Paul Allison's take on this, in part:

• Heteroscedasticity is easily fixed with robust standard errors.
• Non-normality is a trivial problem with moderate to large size samples.
• The most intractable problem has been non-linearity, manifest by predicted probabilities greater than one or less than zero

This third problem, I think, is essentially about non-linearity, which I don't think applies to dummy variables.
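A quick numerical check of that intuition, as a sketch only: the data are simulated, the three categories and their probabilities are made up, and the fit uses plain least squares through NumPy. When the only predictors are a full set of mutually exclusive dummies, the LPM's fitted values should just be the observed proportion in each category, so they cannot leave the 0 to 1 range.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
# Hypothetical three-category predictor (0 = reference) with made-up true probabilities.
group = rng.integers(0, 3, size=n)
true_p = np.array([0.05, 0.10, 0.20])
y = (rng.random(n) < true_p[group]).astype(float)

# Design matrix: intercept plus dummies for categories 1 and 2.
X = np.column_stack([np.ones(n), group == 1, group == 2]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# Observed proportion of y = 1 in each category.
group_means = np.array([y[group == g].mean() for g in range(3)])
# LPM prediction for each category: intercept, intercept + b1, intercept + b2.
lpm_preds = np.array([beta[0], beta[0] + beta[1], beta[0] + beta[2]])

print(group_means)
print(lpm_preds)
```

This only holds when the model is saturated in the dummies. If several dummies enter additively with no interactions, the fitted values are no longer cell proportions and can still fall outside 0 to 1.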

This is a concern because nearly all my effect sizes for the LPM are below .2, which is often the range where concerns about LPMs are raised.
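Low probabilities are where an additive model has the least room, so here is a small sketch of how an LPM with only dummy predictors can still predict below zero. The cell counts are invented for illustration, and `a` and `b` are hypothetical dummies, not variables from my model.

```python
import numpy as np

# Made-up data: 100 cases in each (a, b) cell, with this many successes per cell.
successes = {(0, 0): 20, (1, 0): 2, (0, 1): 2, (1, 1): 0}

rows, outcome = [], []
for (a, b), k in successes.items():
    for i in range(100):
        rows.append([1.0, a, b, a * b])   # intercept, a, b, a*b interaction
        outcome.append(1.0 if i < k else 0.0)
X_full = np.array(rows)
y = np.array(outcome)

# Additive LPM: drop the interaction column.
X_add = X_full[:, :3]
beta_add, *_ = np.linalg.lstsq(X_add, y, rcond=None)
pred_11_add = beta_add.sum()              # fitted probability when a = 1 and b = 1

# Saturated LPM: keep the interaction, so every cell gets its own fitted value.
beta_sat, *_ = np.linalg.lstsq(X_full, y, rcond=None)
pred_11_sat = beta_sat.sum()

print(round(pred_11_add, 3), round(pred_11_sat, 3))
```

In this toy example the additive model predicts roughly -0.04 for the cell where both dummies equal one, while the saturated model reproduces the observed proportion of zero. So the out-of-range problem comes from the additivity restriction across dummies, not from any single dummy being "non-linear".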


I found a different argument:

"For many it may come as a surprise to find that the variable sex, with categories ‘male’ and ‘female’ is not a nominal variable. The simple reason is that it contains only two categories and this makes it formally an interval/ratio variable."

https://arxiv.org/ftp/arxiv/papers/1511/1511.05728.pdf

I always thought of dummy variables as ordinal. This matters because I assumed (wrongly) that while interval predictors can be non-linear, ordinal predictors are inherently linear (except they are not, I now realize).

"Dummy variables meet the assumption of linearity by definition, because they create two data points, and two points define a straight line. There is no such thing as a non-linear relationship for a single variable with only two values."

https://www.researchgate.net/post/Check-linearity-between-the-dependent-and-dummy-coded-variables

So an ordinal variable with more than two levels could be non-linear....
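A small sketch of why two levels are always "linear" but three need not be. The group means are made up, and the ordinal variable is assumed to be entered as a single numeric score coded 0, 1, 2.

```python
import numpy as np

# Made-up outcome means at three ordered levels, coded 0, 1, 2.
levels = np.array([0.0, 1.0, 2.0])
means = np.array([0.10, 0.12, 0.30])

# Two levels only: a straight line through two points reproduces both means exactly.
slope2, intercept2 = np.polyfit(levels[:2], means[:2], 1)
fit2 = intercept2 + slope2 * levels[:2]

# Three levels as one numeric score: a single slope cannot match all three means
# unless they happen to be equally spaced.
slope3, intercept3 = np.polyfit(levels, means, 1)
fit3 = intercept3 + slope3 * levels
resid3 = means - fit3

print(fit2, means[:2])
print(resid3)
```

Entering the three levels as two dummies instead of one score removes the misfit, because each level then gets its own fitted mean.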

But they make a good point: a dummy variable gives you a mean difference, and ordinal variables cannot have that. Of course, that assumes what you predict is interval, I think (or that you're making the assumptions of linear probability models).

