# linear probability models: heteroscedasticity

#### Lazar

##### Phineas Packard
I have to run linear probability models rather than logistic regression models (dichotomous endogenous variable) and I am wanting to look for a way of dealing with heteroscedasticity to get correct standard errors. I wonder whether using robust weighted least squares in the R package lavaan for example would be the answer I am looking for?

#### Dason

It looks like you could just use glm in R and specify family=binomial(link = "identity")
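
A minimal sketch of that suggestion, using simulated data (the variable names and the true probabilities are assumptions for illustration, not from the thread). Starting values are supplied because the identity link does not keep fitted probabilities inside (0, 1) on its own, so the fit can fail on less well-behaved data.

```r
# Simulated data (hypothetical): true P(y = 1) is linear in x and stays in [0.3, 0.7].
set.seed(1)
n <- 500
x <- runif(n)
y <- rbinom(n, size = 1, prob = 0.3 + 0.4 * x)

# Bernoulli response with an identity link: coefficients are on the probability scale.
# start = c(0.5, 0) begins the iterations at a constant fitted probability of 0.5.
fit_id <- glm(y ~ x, family = binomial(link = "identity"), start = c(0.5, 0))

coef(fit_id)
summary(fit_id)$coefficients
```

The slope here is read directly as a change in probability per unit of x, which is the reason for wanting an LPM in the first place.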

#### Lazar

##### Phineas Packard
> It looks like you could just use glm in R and specify family=binomial(link = "identity")

Cool, thanks Dason. Will this deal with the issues in relation to standard errors? I would have thought I would have had to pass the output to the sandwich package at least.

In any case, the complication is that I have a latent variable, so I need to find a solution in lavaan, SEM, or OpenMx.

EDIT: for glm it would seem family=quasi(link="identity", variance = "mu(1-mu)") is the answer http://stats.stackexchange.com/questions/139917/r-binomial-family-with-identity-link. Still does not help in relation to latent variables but cool to know given I often run into this.
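
A rough sketch of the two routes mentioned in this post, on simulated data (the data and all object names are assumptions). The first uses the quasi family from the EDIT. The second fits the LPM by OLS and computes an HC0 (White) sandwich covariance by hand in base R, which is the kind of correction the sandwich package provides.

```r
# Simulated data (hypothetical): true P(y = 1) = 0.2 + 0.6 * x.
set.seed(2)
n <- 500
x <- runif(n)
y <- rbinom(n, size = 1, prob = 0.2 + 0.6 * x)

# Route 1: quasi-likelihood with identity link and Bernoulli variance mu(1 - mu).
# Starting values are needed; the default initial values are not valid here.
fit_quasi <- glm(y ~ x,
                 family = quasi(link = "identity", variance = "mu(1-mu)"),
                 start = c(0.5, 0))

# Route 2: plain OLS plus a hand-computed HC0 sandwich covariance matrix.
fit_ols <- lm(y ~ x)
X <- model.matrix(fit_ols)
e <- residuals(fit_ols)
bread <- solve(crossprod(X))   # (X'X)^{-1}
meat  <- crossprod(X * e)      # X' diag(e^2) X
vcov_hc0 <- bread %*% meat %*% bread

# Compare naive OLS, robust OLS, and quasi-likelihood standard errors.
cbind(
  ols_naive_se  = sqrt(diag(vcov(fit_ols))),
  ols_robust_se = sqrt(diag(vcov_hc0)),
  quasi_se      = summary(fit_quasi)$coefficients[, "Std. Error"]
)
```

If the sandwich package is installed, `sandwich::vcovHC(fit_ols, type = "HC0")` should give the same matrix as `vcov_hc0`.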

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Lazar, what is the idea behind SE issues and latent variables? Is it just trying to adjust for a certain level of unknown?

#### spunky

##### Can't make spagetti
> I have to run linear probability models rather than logistic regression models (dichotomous endogenous variable) and I am wanting to look for a way of dealing with heteroscedasticity to get correct standard errors. I wonder whether using robust weighted least squares in the R package lavaan for example would be the answer I am looking for?

if you fit them this way you're doing a probit regression, not an LPM.

but then it begs the question of whether or not you could get away with doing a probit regression instead of an LPM and keep everything in SEM-land

#### Dason

I don't see why it would be a probit. The difference between a logistic regression, probit regression, and linear probability model just looks to me to be a difference in the link function. They all look to assume bernoulli response though.
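
To illustrate the point, here is a small sketch on simulated data (data and names are assumptions): the same Bernoulli family is fitted three times, changing only the link.

```r
# Simulated data (hypothetical): true P(y = 1) = 0.3 + 0.4 * x.
set.seed(3)
n <- 1000
x <- runif(n)
y <- rbinom(n, size = 1, prob = 0.3 + 0.4 * x)

links <- c("logit", "probit", "identity")
fits <- lapply(links, function(l) {
  # Start at a constant probability of 0.5, expressed on each link's own scale.
  start <- c(binomial(link = l)$linkfun(0.5), 0)
  glm(y ~ x, family = binomial(link = l), start = start)
})
names(fits) <- links

# Same response distribution in every fit; only the link differs.
sapply(fits, function(f) c(family = f$family$family, link = f$family$link))

# Predicted P(y = 1) at x = 0.5 under each link.
preds <- sapply(fits, function(f) {
  unname(predict(f, newdata = data.frame(x = 0.5), type = "response"))
})
preds
```

The coefficients are on different scales (log-odds, probit index, probability), but near the middle of the data the fitted probabilities should be close to one another.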

#### Lazar

##### Phineas Packard
> if you fit them this way you're doing a probit regression, not an LPM.
>
> but then it begs the question of whether or not you could get away with doing a probit regression instead of an LPM and keep everything in SEM-land

Yeah, it is not a probit. Probit is the sensitivity analysis. Linear probability models are typically bad ideas and you should as a rule use probit or logistic regression. However, when comparing groups with probit or logistic regression, any change in parameter estimates could be due to either ‘confounding’ or ‘rescaling’. Hence I am using a linear probability model in this case.

EDIT: I see what you meant but I am keeping the link function the same and only changing the estimator. I get that it is most often the case that whenever WLS is used in a SEM context the link function is also changed.
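
The rescaling issue can be seen in a small simulation (a sketch only; the data-generating model and all names are assumptions). Here z is independent of x, so it cannot confound the effect of x, yet the logit coefficient on x still changes when z is added. The LPM coefficient barely moves.

```r
# Simulated data (hypothetical): latent index x + 2z + logistic noise, cut at zero.
set.seed(4)
n <- 20000
x <- rnorm(n)
z <- rnorm(n)   # generated independently of x, so it is not a confounder
y <- as.integer(x + 2 * z + rlogis(n) > 0)

# Logit: the x coefficient is smaller when z is omitted (rescaling, not confounding).
logit_reduced <- coef(glm(y ~ x,     family = binomial))[["x"]]
logit_full    <- coef(glm(y ~ x + z, family = binomial))[["x"]]

# LPM: the x coefficient is essentially unchanged by adding z.
lpm_reduced <- coef(lm(y ~ x))[["x"]]
lpm_full    <- coef(lm(y ~ x + z))[["x"]]

round(c(logit_reduced = logit_reduced, logit_full = logit_full,
        lpm_reduced = lpm_reduced, lpm_full = lpm_full), 3)
```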

#### Lazar

##### Phineas Packard
> Lazar, what is the idea behind SE issues and latent variables? Is it just trying to adjust for a certain level of unknown?

My preference would be not to use latent variables at all and then all would be well with the world. However, one of the mediator variables of interest in the particular model is 'always' estimated as a latent variable and thus reviewers will beat me around the head if I don't. Ironically, in the PISA database (where all the data comes from) they provide Warm estimates for the mediator of interest and thus there is not really a good reason from a technical perspective to go latent...but play the game I must!!

#### spunky

##### Can't make spagetti
> I don't see why it would be a probit. The difference between a logistic regression, probit regression, and linear probability model just looks to me to be a difference in the link function. They all look to assume bernoulli response though.

true that. sorry for not making the context clear here. it's just that within SEM-land we kinda default to the use of probit models when fitting factor analysis type thingies to binary data. you are correct in the sense that just because you used weighted least squares with binary data doesn't immediately make it a probit regression. but if you use lavaan (or OpenMx) to fit a model with binary data, it will automatically assume that you want to fit a probit regression model if you choose any of the limited-information family of fit functions (e.g. weighted least squares). Mplus defaults to that as well, with the exception that if you select maximum likelihood as your estimator (and not weighted least squares) then it defaults to binary logistic regression.

really it's just about software conventions that tend to rely on the model assumptions that us social sciency types tend to use.

further details here in case you're interested:

#### spunky

##### Can't make spagetti
> EDIT: I see what you meant but I am keeping the link function the same and only changing the estimator. I get that it is most often the case that whenever WLS is used in a SEM context the link function is also changed.

that was kinda my point, particularly when you brought up lavaan. i had read before on their google group that Yves says if you use the weighted least squares estimator for binary data, it immediately assumes it's a probit regression and fits it as such.

#### Lazar

##### Phineas Packard
> that was kinda my point, particularly when you brought up lavaan. i had read before on their google group that Yves says if you use the weighted least squares estimator for binary data, it immediately assumes it's a probit regression and fits it as such.

He might say that but it doesn't. The binary variable has to be declared as.ordered or it is treated as continuous. What is more, lavaan.survey has options for various WLS versions that will only work with continuous data.
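
A rough sketch of that distinction in lavaan, on simulated data (the model, data, and all names are assumptions, and this is not presented as the final answer to the original question). Leaving y as a numeric 0/1 variable treats it as continuous, which gives a linear probability model. Declaring it via `ordered` switches to a probit model. The `estimator = "MLR"` choice is one way to get robust (Huber-White) standard errors in the continuous case.

```r
# Simulated data (hypothetical): x -> latent mediator -> binary y.
set.seed(5)
n <- 1000
x   <- rnorm(n)
eta <- 0.5 * x + rnorm(n)   # latent mediator, observed only through m1..m3
dat <- data.frame(
  x  = x,
  m1 = eta + rnorm(n),
  m2 = eta + rnorm(n),
  m3 = eta + rnorm(n),
  y  = rbinom(n, 1, pmin(pmax(0.5 + 0.1 * eta + 0.05 * x, 0.05), 0.95))
)

model <- "
  med =~ m1 + m2 + m3
  med ~ x
  y   ~ med + x
"

if (requireNamespace("lavaan", quietly = TRUE)) {
  # y left numeric: treated as continuous (LPM), with robust standard errors.
  fit_lpm <- lavaan::sem(model, data = dat, estimator = "MLR")

  # y declared ordered: fitted as a probit model with a WLS-type estimator.
  fit_probit <- lavaan::sem(model, data = dat, ordered = "y", estimator = "WLSMV")

  print(lavaan::parameterEstimates(fit_lpm))
}
```

In the first fit the `y ~ med` coefficient is on the probability scale. In the second it is on the probit scale, so the two are not directly comparable.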