Variation in inflection point for modeled logistic curve


TS Contributor
Hey all.

Question category: R and regression; Posted in R forum since R code.
Question: How best to quantify variation in an inflection point for a logistic curve?
Background of problem: Birdies begin their birdie lives contained in eggs. Then, they hatch and continue their lives. Bothersome scientists are interested in determining how many days the birdies spend as eggs; that is, how many days until they hatch. We check nests each 2-4 days and record 1=eggs present, 2=hatching. Thus, because we often don't catch the birdies as they're emerging from their shells, there is some uncertainty in observed hatch date.
Current modeling approach:(in line with the way I already have my data for other analyses) Binomial GLM: x=nest age (in days), response: 1=eggs, 0=nestlings. Model the heck out of it.

Generate example and plot:
# generate data
y<-c(rep(1,40), mid.vals, rep(0,40))
mod<-glm(y~x, family=binomial)<-predict(mod,$fit<-predict(mod,$
# Example plot
plot(x,pred.val, type="l",lwd=2, col="white",xlab="",ylab="",ylim=c(0,1))
plot(x,pred.val, type="l",lwd=2,xlab="",ylab="",ylim=c(0,1))
plot(x,pred.upper, type="l",lwd=2,lty=2,xlab="",ylab="",ylim=c(0,1))
plot(x,pred.lower, type="l",lwd=2,lty=2,xlab="nest age (days)",ylab="Pr(unhatched egg)",ylim=c(0,1))
abline(h=0.5, col="red",lwd=1) # horizontal line at y=0.5
abline(v=49.5, col="red",lwd=1) # general line to see x-point of inflection
Estimation: From model equation, use beta and SE.beta to back-calculate x at y=0.5 (point of inflection): logit(0.5)=intercept+beta*x. Draw n=10000 from normal distribution on link scale.
inter.draws<-rnorm(10000, coefficients(mod)[1],sqrt(diag(vcov(mod)))[1])
slope.draws<-rnorm(10000, coefficients(mod)[2],sqrt(diag(vcov(mod)))[2])<-inter.draws/(-1*slope.draws) #to estimate mean x from y=0.5, or logit(0.5)
I've tried other approaches, but I think this may do a good job of both giving me a good estimate of hatch day while also allowing for observer uncertainty.

Thoughts? Anything I'm not considering or approached with less-then-impressive rigor? (I always worry about drawing from the model output distributions, but it always seem to come out close in tests.)


Cookie Scientist
I think the general approach is fine, however your current implementation gives the wrong answer because it ignores the covariance between the intercept and slope. This is particularly a problem here because the covariance is in fact quite pronounced, which we can see by converting the covariance matrix to a correlation matrix:
> cov2cor(vcov(mod))
            (Intercept)          x
(Intercept)   1.0000000 -0.9850965
x            -0.9850965  1.0000000
We can implement the approach correctly using mvrnorm() from the MASS package, which gives a rather different answer, one which to my eyes appears far more consistent with what we would have expected based on a visual inspection of the fitted regression line (i.e., the range of x-values whose predicted values include the inflection point within some reasonable confidence interval):
draws <- mvrnorm(n=10000, mu=coef(mod), Sigma=vcov(mod)) <- draws[,1]/(-1*draws[,2]) 
It should also be possible, and perhaps preferable, to solve for the desired range of x-values analytically. There are techniques I am aware of, but am not too familiar with, for probing interactions by getting the range of x-values demarcating when the simple effects of the predictors are and are not significant. A relevant paper is:
It should be possible to use this same basic logic to find the range of x-values at which the inflection point is within some specified confidence interval in your GLM. I started to work out how it would be done but then decided I was wasting too much time on it, so I will just point out the idea and leave the details to you in case you are interested :p. If you do end up working this out, please post your method here...


TS Contributor
Thanks, Jake. Very good response. I hadn't considered the covariance issue. Thanks for taking the time to delve into this and for spotting the problem. This should be simple to implement. I'll post my progress to this and your suggested second approach ASAP.

I decided to approach this question for a number of reasons. Among those, I'm imagining this might help many biologists (in conservation, etc.) be able to understand timing of breeding cycles in species for which there's often imperfect monitoring.