I know that regression models (well, any linear model, I would assume; correct me if I'm wrong) have a constant of 1 added to the model. I've done a regression with matrix algebra, and you have to put a vector of ones in the predictor matrix.
Why? What is it doing? I also know that if you write -1 in the formula in R it removes this constant, which means there's no intercept. Why would you want a model with no intercept?
I don't have the theory, so I need the explicit version. For the most part I don't really follow mathematical notation as an explanation.
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
I think one of the relatively sound reasons is that you want the line/plane to pass through the origin because of the physical nature of the quantities in the real world (or some other plausible constraint). The general advice is to be cautious with this kind of modelling (it is a sub-model of the general one) and to avoid it unless there is a good reason to do so.
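As a rough illustration of what forcing the fit through the origin looks like in R (the numbers below are made up for the example and are not from this thread): `y ~ x - 1` and `y ~ 0 + x` both drop the intercept, so the fitted line predicts exactly 0 at x = 0.

```r
# Hypothetical data, invented for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit_a <- lm(y ~ x - 1)   # "- 1" removes the intercept
fit_b <- lm(y ~ 0 + x)   # "0 +" is another way to write the same model

# Both formulas give the same single coefficient (the slope)
coef(fit_a)
coef(fit_b)

# With no intercept the prediction at x = 0 is forced to be 0
pred_at_zero <- unname(predict(fit_a, newdata = data.frame(x = 0)))
pred_at_zero

# The slope through the origin has a simple closed form
slope_by_hand <- sum(x * y) / sum(x^2)
slope_by_hand
```

Whether that constraint is sensible is a separate question from whether R will fit it; the code only shows what the constraint does.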
I wrote a statisticspedia article about this, but it seems that SP is dead. My argument is that even if we have reasons to believe that the outcome should be 0 when all of our predictors are 0, unless we're 100% sure that our model is correct we're most likely hurting ourselves if we omit the intercept.
Really most of the time we're trying to get a local approximation to the truth when we do regression. A linear model could provide a good local approximation even if we think the truth is slightly more complicated. We shouldn't extrapolate too far outside of the range of our covariates. So if 0 is outside of that range then why are we adding info about 0 into our model? If 0 is inside the range then why not just let the data speak for itself?
Also if you don't include an intercept you're really allowing the possibility that the model you come up with ends up being worse than just predicting the mean of Y for any input. If we're going to go through the process of building a model and end up doing worse than just saying "predict mean(Y) no matter what the covariates are" then I say we didn't do a good job of building a model.
You can always play around with this stuff too.
Code:
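set.seed(1)  # any seed will do; it just makes the run reproducible
x <- rep(102:106, 10)
# Theoretically when x = 0 then y = 0
y <- -x*(x - 200) + rnorm(length(x), 0, 10)
plot(x, y)
# Model without an intercept
o.without <- lm(y ~ x - 1)
# Oh look x is highly significant.
summary(o.without)
plot(x, resid(o.without))
# Model with an intercept
o.with <- lm(y ~ x)
summary(o.with)
plot(x, resid(o.with))
# Intercept-only model: predicts mean(y) no matter what x is
o.mean <- lm(y ~ 1)
summary(o.mean)
plot(x, resid(o.mean))
# Residual standard error of each model (smaller is better)
summary(o.with)$sigma
summary(o.mean)$sigma
summary(o.without)$sigma
# All three fitted lines on one plot
plot(x, y)
abline(o.without, col = "red")
abline(o.with, col = "purple")
abline(o.mean, col = "black")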
Just look at the many ways the no-intercept model is horrible, even in this situation where y should be 0 when x = 0.
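Part of why the no-intercept summary looks so impressive: per the `summary.lm` help page, when the intercept is dropped R-squared is measured against a baseline of predicting 0 rather than predicting mean(y). A quick sketch on the same kind of simulated data (the seed and the names `rss`, `r2_reported`, `r2_vs_zero` and `r2_vs_mean` are my own additions):

```r
set.seed(42)  # arbitrary seed, only so the numbers are reproducible
x <- rep(102:106, 10)
y <- -x * (x - 200) + rnorm(length(x), 0, 10)

o.without <- lm(y ~ x - 1)
rss <- sum(resid(o.without)^2)   # residual sum of squares

# What summary() reports for a no-intercept fit: baseline is "predict 0"
r2_reported <- summary(o.without)$r.squared
r2_vs_zero  <- 1 - rss / sum(y^2)

# The usual R-squared: baseline is "predict mean(y)"
r2_vs_mean  <- 1 - rss / sum((y - mean(y))^2)

c(reported = r2_reported, vs_zero = r2_vs_zero, vs_mean = r2_vs_mean)
```

The first two should agree and sit very close to 1, while the third should come out negative, which is another way of saying the fit does worse than just predicting mean(y).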
"His programming is malfunctioning. It begins! Get your weapons, he's going to become a killbot!!!" - bryangoodrich
Nice demo that makes sense. I'm getting the feeling I shouldn't ever do this with the stuff I do. Maybe a statistician would but I can't think of a time when this would be useful. Further, explaining the results, even if the model were good, would seem difficult at best.
What is the 1 actually doing algebraically? Where is it in Y = mX + b? I know the outcome: it makes the line pass through the intercept (in simple regression this is the mean score). But what is it doing in that good old 9th grade linear equation?
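One way to see it, sketched with made-up numbers (not from this thread): in Y = mX + b the b is really b × 1. The column of ones is the "1" that b gets multiplied by, just as X is what m gets multiplied by.

```r
# Made-up data that lie exactly on the line y = 3x + 2
x <- c(1, 2, 3, 4)
m <- 3   # slope
b <- 2   # intercept
y <- m * x + b

# The 9th grade version, written so the hidden 1 is visible
y_algebra <- b * 1 + m * x

# The matrix version: a column of ones next to x, times c(b, m)
X <- cbind(1, x)
y_matrix <- as.vector(X %*% c(b, m))

y
y_algebra
y_matrix

# lm() builds the same column of ones itself; it is labelled "(Intercept)"
model.matrix(y ~ x)
```

All three versions of y should print the same four numbers. Take the ones column away and there is nothing left for b to multiply, which is why the intercept disappears.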
Well, not the beta weight, but it only returns one value. Here's a script in case anyone has never done simple regression with matrix algebra and wants to give it a try:
Code:
##############################################
# FORMULA FOR REGRESSION PARAMETERS #
##############################################
# b = (X'X)^-1 (X'y) #
##############################################
#DATA
midterm <- c(5,7,7,7,9)
final <- c(4,5,6,8,10)
(SUM <- summary(lm(final~midterm)))
#==============================================
#ASSIGN DATA TO LETTERS TO FIT MATRIX NOTATION
x <- midterm
y <- final
#==============================================
#CONVERT VECTOR x TO MATRIX X BY ADDING A COLUMN OF ONES (FOR THE INTERCEPT)
X <- cbind(1, x)
X
#==============================================
#DOING THE (X'X) PORTION
M <- t(X) %*% X
M2 <- crossprod(X) #Fast way to do same as M
#==============================================
#DOING THE (M)^-1 PORTION (THE INVERSE)
Min <- solve(M)
#==============================================
#DOING THE (X'y)
Xp <- t(X) %*% y
Xp2 <- crossprod(X,y)
#==============================================
#DOING THE MATRIX MULTIPLICATION
(b <- Min %*% Xp)
#The upper is the intercept and the lower is the slope
#..............................................
SUM #compare to b
#======================================
# WHAT IT ALL BOILS DOWN TO
#======================================
#FASTEST WAY WITH MATRIX MULTIPLICATION
(b <- solve(crossprod(X))%*%crossprod(X,y))
#========================================
#PREDICTING
#========================================
#HOW TO PREDICT FOR A MIDTERM SCORE OF 7
xi <- c(1,7) #THE ONE MULTIPLIES THE INTERCEPT
xi %*% b
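To see what the column of ones buys you, here is a sketch that runs the same formula on the midterm/final data with and without that column (the names `X_with`, `X_without`, `b_with` and `b_without` are mine). Without the ones there is nothing for an intercept to multiply, so only a slope comes back, and it should match `lm(final ~ midterm - 1)`.

```r
midterm <- c(5, 7, 7, 7, 9)
final   <- c(4, 5, 6, 8, 10)

X_with    <- cbind(1, midterm)   # ones column + predictor
X_without <- cbind(midterm)      # predictor only, no ones column

# b = (X'X)^-1 (X'y) in both cases
b_with    <- solve(crossprod(X_with),    crossprod(X_with, final))
b_without <- solve(crossprod(X_without), crossprod(X_without, final))

b_with      # two numbers: intercept, then slope
b_without   # one number: slope of the line forced through (0, 0)

# Compare with lm()
coef(lm(final ~ midterm))
coef(lm(final ~ midterm - 1))
```

Passing the right-hand side to `solve()` directly avoids forming the inverse as a separate step; it should give the same answer as the script above.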