Misspecified linear model, interpretation of slope


Active Member
Suppose you fit a simple linear regression when the data actually follow some sort of sigmoid, such as a 4-parameter logistic or similar.

To what extent is the following statement true/false:
"The estimated slope of the simple linear regression is the average slope of the sigmoid"

Seems intuitively true, doesn't it?


Less is more. Stay pure. Stay poor.
My brain can't even process it. It feels like there are latent, cloaked issues here, but maybe not. I picture the sigmoid shape and a single line going through it. So what does "average slope of the sigmoid" even mean? Would you weight piecewise segments and combine them into a single number? Well, if linear regression can be used to fit binary data, I suppose it could be true, maybe.


Active Member
I guess the question is pretty vague. I got to thinking about it, though, and realized that if you have a nonlinear function g(x), take the expectation of the usual linear regression slope betahat = Sxy/Sxx (replacing Y with its expected value g(x)), and approximate g(x) with a first-order Taylor expansion about the mean of X, then you end up with something like
E(betahat) = slope of g at mean X + bias term

where the bias term is the slope of the linear regression of the remainder term in the Taylor approximation on X.
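
Spelled out, the argument might look like the sketch below. The assumptions are mine, not stated in the post: the x_i are treated as fixed, Y_i = g(x_i) + e_i with E(e_i) = 0, and R(x) denotes the remainder of the first-order Taylor expansion about a point a (a = mean of X in the post, though the algebra goes through for any a).

```latex
\begin{align*}
g(x_i) &= g(a) + g'(a)\,(x_i - a) + R(x_i), \\
E(\hat\beta \mid x)
  &= \frac{\sum_i (x_i - \bar x)\, g(x_i)}{\sum_i (x_i - \bar x)^2}
   = g'(a) + \frac{\sum_i (x_i - \bar x)\, R(x_i)}{\sum_i (x_i - \bar x)^2}.
\end{align*}
```

The constant g(a) drops out because the centered x_i sum to zero, and the linear term contributes exactly g'(a) because the sum of (x_i - xbar)(x_i - a) equals Sxx. What is left over is the least-squares slope of R(x_i) on x_i, which is the bias term.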

I swear I've seen this sort of theory before somewhere, but it is hard to find again on Google.

Maybe some R code explains it better:

#logistic function
g = function(x){
  1/( 1 + exp(-x) )
}

#first derivative of the logistic
g_dot = function(x){
  g(x)*(1 - g(x) )
}

#first order approximation: tangent line to g at the point a
tangent = function(x,a){
    g(a) + g_dot(a)*(x - a)
}

#fit slope to logistic experiment;
runSim <- function(j){
    X <- runif(10,-1,1)
    gX = g(X) + rnorm(10,0,.1)
    betaHat = coef( lm(gX ~ X) )[[2]];  #Fit linear regression to logistic;
    #the bias is slope of SLR relating errs to X;
    #errs = response minus tangent line (Taylor remainder plus noise),
    #so that betaHat = g_dot(0) + bias
    errs =  gX - tangent(X,0);
    bias = coef( lm(errs ~ X) )[[2]];
    data.frame( betaHat=betaHat, bias = bias)
}

mySims =  do.call( 'rbind', lapply(1:1000, runSim)  )

print(  sprintf( 'expected slope at mean X = %f', g_dot(0) )  )

print(   sprintf( 'slope of slr %f', mean(mySims$betaHat  )  )  )

print(   sprintf( 'predicted bias %f', mean(mySims$bias  )  )  )

#so fitting linear reg gives g_dot(0) + a bias related
#to the error of the first order Taylor approx?
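
To connect this back to the original question, here is a rough check of one possible reading of "average slope": the unweighted mean of g'(x) over the range of X, which is just the secant slope. The sketch assumes X ~ Uniform(-1, 1) as in the simulation above, and the names avg_slope, ols_slope and dens are made up for illustration. It compares that average with the population least-squares slope Cov(X, g(X)) / Var(X), computed by numerical integration rather than simulation.

```r
# Logistic curve and its derivative (same definitions as in the thread)
g     <- function(x) 1 / (1 + exp(-x))
g_dot <- function(x) g(x) * (1 - g(x))

# Assumption: X ~ Uniform(lo, hi), matching runif(10, -1, 1) in the simulation
lo <- -1
hi <- 1
dens <- function(x) rep(1 / (hi - lo), length(x))

# Unweighted average of g'(x) over [lo, hi], i.e. the secant slope
avg_slope <- (g(hi) - g(lo)) / (hi - lo)

# Population moments of X and g(X) by numerical integration
EX  <- integrate(function(x) x * dens(x), lo, hi)$value
EX2 <- integrate(function(x) x^2 * dens(x), lo, hi)$value
Eg  <- integrate(function(x) g(x) * dens(x), lo, hi)$value
EXg <- integrate(function(x) x * g(x) * dens(x), lo, hi)$value

# Population least-squares slope: Cov(X, g(X)) / Var(X)
ols_slope <- (EXg - EX * Eg) / (EX2 - EX^2)

print(sprintf('slope of g at mean X     = %f', g_dot(0)))
print(sprintf('population SLR slope     = %f', ols_slope))
print(sprintf('unweighted average slope = %f', avg_slope))
```

Under these assumptions the three numbers come out close but not equal, with the regression slope sitting between the unweighted average slope and g_dot(0). So on this reading the statement is only approximately true, and how close it is presumably depends on the distribution of X and on how much curvature falls inside its range.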


Less is more. Stay pure. Stay poor.
Yes, I also wondered how this might play out in a sim. I look forward to reviewing your code when I get back to the office on Monday.