# Misspecified linear model, interpretation of slope

#### fed2

##### Active Member
Suppose you fit a simple linear regression when the data actually follow some sort of sigmoid, such as a 4-parameter logistic or similar.

To what extent is the following statement true/false:
"The estimated slope of the simple linear regression is the average slope of the sigmoid"

Seems intuitively true doesn't it?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
My brain can't even process it. It feels like there are latent, cloaked issues, but maybe not. I picture the sigmoid shape with a single line going through it. So what does "average slope of the sigmoid" even mean: weighting piecewise segments and combining them into a single number? Well, if linear regression can be used to fit binary data, I suppose it could be true, maybe.
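The "weighted piecewise segments" reading can be made precise. A standard integration-by-parts identity gives Cov(X, g(X)) = ∫ g'(t) Cov(X, 1{X > t}) dt, so the population OLS slope Cov(X, g(X))/Var(X) is a weighted average of the local slopes g'(t), with weights Cov(X, 1{X > t})/Var(X) that integrate to 1. The sketch below assumes X ~ Uniform(-1, 1) (the range used in fed2's simulation) and the plain logistic g; the variable names are illustrative. It computes the slope both ways with R's `integrate()` and compares it with the unweighted average slope and the slope at the mean.

```r
# Logistic curve and its slope (same g as in the thread)
g <- function(x) 1 / (1 + exp(-x))
g_dot <- function(x) g(x) * (1 - g(x))

# Assumption: X ~ Uniform(-1, 1), so E[X] = 0 and Var(X) = 1/3
var_x <- 1 / 3

# Population OLS slope Cov(X, g(X)) / Var(X); the density of X is 1/2 on [-1, 1]
cov_xg <- integrate(function(x) x * g(x) * 0.5, -1, 1)$value
beta_ols <- cov_xg / var_x

# Same slope as a weighted average of g'(t): for this uniform X the weights
# Cov(X, 1{X > t}) / Var(X) work out to (3/4) * (1 - t^2), which integrate to 1
beta_weighted <- integrate(function(t) g_dot(t) * 0.75 * (1 - t^2), -1, 1)$value

# Plain (unweighted) average slope of g over [-1, 1]
avg_slope <- (g(1) - g(-1)) / 2

round(c(ols = beta_ols, weighted = beta_weighted,
        plain_average = avg_slope, slope_at_mean = g_dot(0)), 4)
```

Under this X distribution the weights peak at the center of the range, so the OLS slope lands between the plain average slope and the slope at the mean; the statement in the question holds only in this weighted sense.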

#### fed2

I guess the question is pretty vague. I got to thinking about it, though, and realized the following. Take a nonlinear function g(x) and the usual linear regression slope betahat = Sxy/Sxx. Replace Y with its expected value g(X) and expand g(X) in a first-order Taylor series around the mean of X. Then you end up with something like

E(betahat) = slope of g at mean X + bias term

where the bias term is the slope of the linear regression of the Taylor remainder term on X.
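Spelled out, the decomposition looks like this (a sketch, assuming y_i = g(x_i) + ε_i with mean-zero noise and an expansion point a, e.g. the mean of X). Because the OLS slope is linear in the y_i, keeping the remainder R makes it an identity rather than an approximation: a constant contributes slope 0, and g'(a)(x - a) contributes slope g'(a).

$$
g(x) = g(a) + g'(a)(x - a) + R(x), \qquad S_{xy} = \sum_i (x_i - \bar x)\, y_i
$$

$$
\hat\beta = \frac{S_{xy}}{S_{xx}} = g'(a) + \frac{S_{xR}}{S_{xx}} + \frac{S_{x\varepsilon}}{S_{xx}},
\qquad
\mathrm{E}(\hat\beta \mid x_1,\dots,x_n) = g'(a) + \frac{S_{xR}}{S_{xx}}
$$

where $S_{xR} = \sum_i (x_i - \bar x)\, R(x_i)$ and $S_{x\varepsilon} = \sum_i (x_i - \bar x)\, \varepsilon_i$. With a at the mean of X, g'(a) is the "slope of g at mean X" and $S_{xR}/S_{xx}$ is the SLR slope of the remainder on X, i.e. the bias term.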

I swear I've seen this sort of theory before somewhere, but it's hard to find again on Google.

Maybe some R code explains it better:

```r
# Logistic function
g <- function(x) {
  1 / (1 + exp(-x))
}
# First derivative of the logistic
g_dot <- function(x) {
  g(x) * (1 - g(x))
}

# First-order (tangent-line) approximation of g around a
tangent <- function(x, a) {
  g(a) + g_dot(a) * (x - a)
}

# One replicate: fit a straight line to noisy logistic data on [-1, 1]
runSim <- function(j) {
  X  <- runif(10, -1, 1)
  gX <- g(X) + rnorm(10, 0, 0.1)
  betaHat <- coef(lm(gX ~ X))[[2]]  # fit linear regression to logistic data
  # The bias is the slope of the SLR relating the Taylor remainder
  # R(X) = g(X) - tangent(X, 0) to X (sign matters: g minus tangent)
  errs <- g(X) - tangent(X, 0)
  bias <- coef(lm(errs ~ X))[[2]]
  data.frame(betaHat = betaHat, bias = bias)
}

mySims <- do.call("rbind", lapply(1:1000, runSim))

print(sprintf("expected slope at mean X = %f", g_dot(0)))
print(sprintf("slope of slr %f", mean(mySims$betaHat)))
print(sprintf("predicted bias %f", mean(mySims$bias)))

# So fitting a linear regression gives g_dot(0) plus a bias related
# to the error of the first-order Taylor approximation?
```
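A quick per-sample check of the decomposition, as a sketch reusing the thread's g, g_dot and tangent (the seed and the name `d` are arbitrary choices, not from the thread). If the residual from the tangent line is taken as gX - tangent(X, 0), noise included, then betaHat = g_dot(0) + slope(d ~ X) should hold exactly in every sample, not just on average, because the OLS slope is linear in the response and the tangent line itself has slope g_dot(0).

```r
# Logistic function, its derivative, and the tangent line at a (as in the post)
g <- function(x) 1 / (1 + exp(-x))
g_dot <- function(x) g(x) * (1 - g(x))
tangent <- function(x, a) g(a) + g_dot(a) * (x - a)

set.seed(1)  # arbitrary seed, for reproducibility only
X  <- runif(10, -1, 1)
gX <- g(X) + rnorm(10, 0, 0.1)

betaHat <- coef(lm(gX ~ X))[[2]]

# Residual from the tangent line: Taylor remainder plus noise
d <- gX - tangent(X, 0)
resid_slope <- coef(lm(d ~ X))[[2]]

# Both entries agree up to floating-point error
c(betaHat = betaHat, decomposition = g_dot(0) + resid_slope)
```

Using g(X) instead of gX in `d`, as in the simulation, drops the noise term, so the equality then holds in expectation over the noise rather than sample by sample.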

#### hlsmith

Yes, I also wondered how this might play out in a sim. I look forward to reviewing your code when I get back to the office on Monday.