Distribution of coefficients in regression

jpkelley

TS Contributor
#2
Hi Sak, you're asking about generating a full histogram of the distribution of the parameter estimate. I've been dealing with this myself in the context of the difference between confidence and prediction intervals.

I am interested in this issue as well. My understanding is that if you're only interested in the distribution of the estimated mean, you simply treat the SE as the standard deviation and generate the distribution accordingly. For 95% prediction intervals it's a bit more of a pain (as I understand it), since you also need to add the residual error variance on top of the variance of the fitted mean. Ugh.

Any comments on this from other contributors? Given a fake data set, how would you....

(1) Generate a histogram of the parameter estimates given the mean and standard error provided in the model output?
(2) Generate a histogram of the underlying "prediction" distribution at a given x, given the model output?

To get a conversation started, here's a fake data set:
Code:
set.seed(100)
fake.data <- data.frame(x = 1:20, y = rnorm(20, 10, 2) * (1:20))
model <- lm(y ~ x, data = fake.data)
Does this make sense? My first thought for a histogram of the mean parameter estimate is:
Code:
set.seed(100)
# mean and sd taken from the coefficient table of summary(model)
hist(rnorm(10000, mean = 11.0147, sd = 0.9272), freq = FALSE)
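For (2), my rough understanding is that the prediction variance at a chosen x is the variance of the fitted mean plus the residual variance, so something like the sketch below might work (x = 10 is an arbitrary choice, and I'm using a normal rather than a t distribution, so corrections welcome):
Code:
# prediction distribution at x = 10 (arbitrary choice of x)
new.x <- data.frame(x = 10)
fit <- predict(model, newdata = new.x, se.fit = TRUE)

# prediction variance = variance of the fitted mean + residual variance
pred.sd <- sqrt(fit$se.fit^2 + summary(model)$sigma^2)

set.seed(100)
hist(rnorm(10000, mean = fit$fit, sd = pred.sd), freq = FALSE,
     main = "Prediction distribution at x = 10")

# cross-check against the 95% prediction interval from predict()
predict(model, newdata = new.x, interval = "prediction")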
I would be interested in your thoughts... Sak, I hope this directly relates to your original post.
 

BGM

TS Contributor
#3
I think if you assume the residuals are normally distributed, then the approach in the 2nd post should be fine. You can find the details easily by searching for ordinary least squares.

If you don't want to make any distributional assumption at all, you may try bootstrapping.
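For example, a rough sketch of a nonparametric (case-resampling) bootstrap on the fake data from post #2 might look like this; the 2000 resamples is just an arbitrary choice:
Code:
set.seed(100)
B <- 2000                 # number of bootstrap resamples (arbitrary)
boot.slope <- double(B)

for (i in 1:B) {
    # resample rows of the original data with replacement, then refit
    idx <- sample(nrow(fake.data), replace = TRUE)
    boot.slope[i] <- coef(lm(y ~ x, data = fake.data[idx, ]))[2]
}

hist(boot.slope, breaks = 50, freq = FALSE,
     main = "Bootstrap distribution of the slope")
quantile(boot.slope, c(0.025, 0.975))   # rough 95% percentile interval
The boot package wraps this up more neatly, but the loop shows the idea.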
 

sak

New Member
#4
@BGM, can you please describe this bootstrapping method in detail? The only bootstrapping I am aware of is for getting the yield curve from bond prices, and I can't figure out how that would apply here.
 

spunky

Can't make spagetti
#5
Hi Sak, you're asking about generating a full histogram of the distribution of the parameter estimate.
This is me just wondering whether this would work... it takes a little longer than expected on my clunky, old laptop, so if anyone has any problems it is possible to bring the number of iterations down from 10,000 to 1,000. (By the way, how many iterations do people usually use? I tend to like 10,000 for some reason, but it depends on time and computer power...)

Code:
library(mvtnorm)   # for rmvnorm()

reps <- 10000                # number of simulated data sets
Beta0.est <- double(reps)    # storage for intercept estimates
Beta1.est <- double(reps)    # storage for X2 slope estimates
Beta2.est <- double(reps)    # storage for X3 slope estimates
R2.est <- double(reps)       # storage for R-squared values

# population correlation matrix: three uncorrelated standard normal variables
pop.cor.mat <- matrix(c(1, 0, 0,
                        0, 1, 0,
                        0, 0, 1), nrow = 3, byrow = TRUE)

for (i in 1:reps) {
    # draw a fresh sample of n = 100 and refit the regression each time
    samp <- data.frame(rmvnorm(100, c(0, 0, 0), pop.cor.mat))
    samp.lm <- lm(X1 ~ X2 + X3, data = samp)
    Beta0.est[i] <- samp.lm$coefficients[1]
    Beta1.est[i] <- samp.lm$coefficients[2]
    Beta2.est[i] <- samp.lm$coefficients[3]
    R2.est[i] <- summary(samp.lm)$r.squared
}

Reg.Results <- data.frame(Beta0.est, Beta1.est, Beta2.est, R2.est)

hist(Beta0.est, breaks = 100, col = "grey", las = 1,
     main = "Regression Simulation",
     xlab = "Estimated Regression Coefficient Beta0", ylab = "Frequency",
     cex.lab = 0.7, cex.axis = 0.7, cex.main = 0.8)
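As a quick sanity check (just a habit of mine, nothing official), the empirical SD of the simulated slope estimates should sit close to the SE reported by a single fit:
Code:
# empirical sampling SD of the X2 slope estimates across the simulations
sd(Reg.Results$Beta1.est)

# SE of the same coefficient from one fitted model, for comparison
samp <- data.frame(rmvnorm(100, c(0, 0, 0), pop.cor.mat))
summary(lm(X1 ~ X2 + X3, data = samp))$coefficients[2, 2]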