How to calculate the intercept in multiple regression

joshuaAU

New Member
I am familiar with calculating the intercept, etc for a regression equation,
but I am confused as to how to calculate the intercept when using multiple
regression.

e.g., in regression with one independent variable the formula is:

(y) = a + bx

where a, the intercept, = (ΣY - b(ΣX)) / N.

with multiple regression, the formula is:

Y=a + b1X1 + b2X2 + b3X3, etc

but I cannot find any equation for calculating the intercept in this case.

I know that a, the intercept, is meant to equal Y if all three independent
variables were equal to zero,
but that doesn't really help me calculate a.

I don't actually care what a is, as to me it's pretty irrelevant outside of
actually calculating the multiple regression.

It cannot be the same equation for a, as that only references X1, and not the
other independent variables...

That's pretty much it, but in case you need further info as to what I am
trying to do....

I am trying to calculate a predicted Y from a given number of independent
variables.
I am aware of using a reasonable number for N... the below is just an example:

X1     | X2  | X3
Height | Age | Activity
170    | 21  | 3
172    | 23  | 4
169    | 21  | 2
174    | 25  | 3
172    | 24  | 4

to predict Y, weight.

spunky

King of all Drama
how much do you know or how comfortable do you feel around matrix algebra? it's not complicated to do so, but the answer may end up sounding like gibberish if you're not familiar with matrix inversions, transposes, matrix multiplication, vectors, etc...

joshuaAU

New Member

Well, I know nothing about, and am therefore horrified of, matrix algebra!
That said, I didn't know anything about regression either until I worked with it for a while.
So, at the risk of it going over my head... I'd love it if you could provide some info, and I'll try to decipher your potentially "gibberish" reply. I always like a challenge.

Thanks spunky.
josh

antonitsin

New Member
For practical purposes:

use the R command
yfit <- lm(y~x1+x2,data=yourdatafile)

Dragan

Super Moderator
I'm not sure if this helps at this point but the intercept can be computed in general as:

Intercept = Ybar - b1*XBar1 - b2*XBar2 - b3*XBar3 - ....
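
The formula above only needs the means and the already-estimated slopes. A minimal sketch in Python, assuming the slopes b1, b2, ... have been computed elsewhere; the function names and the data are made up for illustration:

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def intercept(y, predictors, slopes):
    """Intercept a = Ybar - b1*X1bar - b2*X2bar - ...

    y          : list of observed Y values
    predictors : list of lists, one list of observations per X variable
    slopes     : list of already-estimated slopes b1, b2, ...
    """
    a = mean(y)
    for x, b in zip(predictors, slopes):
        a -= b * mean(x)
    return a


# Made-up data generated from Y = 1 + 2*X1 + 3*X2, so the slopes are known.
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a = intercept(y, [x1, x2], [2, 3])
print(a)  # recovers the intercept used to build the data (1.0)
```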

joshuaAU

New Member
I originally attempted to submit this post in reply about 20 hours ago, and then again about 3 hours ago, and it never appeared.
I removed the hyperlink to another page, and it worked, so I have left out the hyperlink...

Thank you Antonitsin, however I am trying to work out the stats to modify my Visual Basic program. It needs to run independently of any other program. Thank you anyway though.

Thanks Dragan for your help too.
Sorry for my ignorance, but are YBar and XBar1 simply the mean of Y and the mean of X1? I assume so, but thought I'd check.
That looks quite straightforward... is that the case for calculating b1, b2, b3, etc?

I read through that link you posted and the page on the bottom re simple linear regression, but find it extremely confusing.

The way I have been doing linear regression is via the following formula...

Y' = YMean + r(Sy/Sx)(X-Xmean)

where r = the Pearson correlation coefficient,
and Sy and Sx are the standard deviations of Y and X.

or using the raw score equivalent:

Y' = YMean + ((NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²))(X - XMean)

(sorry for the messiness of that btw)

This is from an old book I've got - "Fundamentals of Behavioral Statistics, sixth edition".
These equations I am very familiar and comfortable with.
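
The two forms above give the same slope, which can be checked numerically. A small sketch in Python; the function names and the data are invented for illustration and are not from the book:

```python
import math


def slope_from_correlation(x, y):
    """Slope b = r * (Sy / Sx), using Pearson r and standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    # The n (or n - 1) divisor cancels in the ratio Sy / Sx.
    return r * math.sqrt(syy / sxx)


def slope_from_raw_scores(x, y):
    """Slope b = (N*SumXY - SumX*SumY) / (N*SumX^2 - (SumX)^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def predict(x, y, new_x):
    """Predicted Y' = YMean + b * (new_x - XMean)."""
    n = len(x)
    b = slope_from_raw_scores(x, y)
    return sum(y) / n + b * (new_x - sum(x) / n)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_r = slope_from_correlation(x, y)
b_raw = slope_from_raw_scores(x, y)
y_hat = predict(x, y, 6)
print(b_r, b_raw, y_hat)  # both slopes are 0.6 (up to rounding); Y' is 5.8
```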

made some sense, except for the ' character... which I then read "denotes the transpose, so that xi′β is the inner product between vectors xi and β."

Hmm, my brain's hurting at this stage... especially as it's after midnight here...
I was hoping it would be more straightforward to understand, but I suspect I'm off to learn, as Spunky implied, matrix algebra...

I did find one page that dealt with 2 independent variables.
As stated elsewhere, I think in your link, Dason, the equation is UGLY.

The page gives the following information:

To find b1, the equation is:

to find b2, the equation is:

and to find a, the equation is:
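
For reference, the standard least-squares solution for two predictors, written in deviation scores (lowercase x = X - XMean, y = Y - YMean), is:

b1 = ((Σx2²)(Σx1y) - (Σx1x2)(Σx2y)) / ((Σx1²)(Σx2²) - (Σx1x2)²)
b2 = ((Σx1²)(Σx2y) - (Σx1x2)(Σx1y)) / ((Σx1²)(Σx2²) - (Σx1x2)²)
a = YMean - b1*X1Mean - b2*X2Mean

That the page used exactly this notation is an assumption. A minimal Python sketch of these three formulas, with a hypothetical function name and made-up data:

```python
def two_predictor_fit(x1, x2, y):
    """Return (a, b1, b2) for Y = a + b1*X1 + b2*X2 by least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Deviation scores: lowercase x = X - XMean, y = Y - YMean.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)                # sum of x1 squared
    s22 = sum(u * u for u in d2)                # sum of x2 squared
    s12 = sum(u * v for u, v in zip(d1, d2))    # sum of x1*x2
    s1y = sum(u * v for u, v in zip(d1, dy))    # sum of x1*y
    s2y = sum(u * v for u, v in zip(d2, dy))    # sum of x2*y
    den = s11 * s22 - s12 ** 2
    if den == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / den
    b2 = (s11 * s2y - s12 * s1y) / den
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Made-up data generated from Y = 1 + 2*X1 + 3*X2.
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
a, b1, b2 = two_predictor_fit(x1, x2, y)
print(a, b1, b2)
```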

can I extend these equations to 3 or more?
If so, is it difficult?

a would be easy, I assume, i.e. for 3 independent variables, it should be:

a = YMean - b1XMean1 - b2XMean2 - b3XMean3

b1, b2, b3 etc looks more complicated but I'd guess something along the lines of:

b1 = ((Σx3²)(Σx2²)(Σx1y)-(Σx1x2x3)(Σx2y)(Σx3y)) / ((Σx1²)(Σx2²)(Σx3²)-(Σx1x2x3)²)

That is however, just a guess looking at it, late at night, and could be completely wrong...

Would it be easier to learn matrix algebra than trying to extend these equations, given I may need to use up to 5 independent variables?

Josh


Dason

Would it be easier to learn matrix algebra than trying to extend these equations, given I may need to use up to 5 independent variables?
Yes. Dear God yes.

But at the same time why do you need to do this? Are you telling me you actually have to perform this by hand? All stats packages will fit multiple regressions for you (at least any respectable package will).

joshuaAU

New Member
Yes. Dear God yes.
Ha ha ha!

Unfortunately, I am designing an estimation program using Visual Basic with a SQL database. The program needs to run independently of any other program, so I need to code all of the calculations within the program itself. I currently have the program working well calculating simple regression, but I need to expand it to multiple regression.

Given I already have most of the info stored in variables, no, I don't, thank God, need to do this by hand... so I really only need to add a few variables, and/or modify some others, and then calculate the equation from that information...

Am I correct in what I was assuming earlier? I.e.:

For 3 independent variables,

a = YMean - b1XMean1 - b2XMean2 - b3XMean3

b1 = ((Σx3²)(Σx2²)(Σx1y)-(Σx1x2x3)(Σx2y)(Σx3y)) / ((Σx1²)(Σx2²)(Σx3²)-(Σx1x2x3)²)

If so, despite the unwieldiness, for want of a better term, of the equations, especially when extended to 5 independent variables, it is something I can code relatively easily.

However, I suspect my formula b1 above, which I modified from the page I was looking at that was for 2 variables, may not be correct. Do you think b1 above is correct?

Again, should I just dive in and get my head around matrix algebra?

Thanks for any advice you can give.
Josh

Dason

I think so. It becomes a lot easier to code if you go the matrix algebra route as well. And you have the added advantage that if you use matrix algebra it doesn't matter how many predictors you have - your solution will work.
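
The matrix route can be sketched without any external package. The Python below is one possible implementation, not anyone's posted method: it builds the normal equations (X'X)b = X'y with a column of ones for the intercept and solves them by Gaussian elimination. Function names, the singularity threshold and the data are assumptions made for illustration. Note that only pairwise sums such as ΣX1X2 enter X'X; triple products such as ΣX1X2X3 never appear.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("X'X is singular; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Return [a, b1, b2, ...] for Y = a + b1*X1 + b2*X2 + ...

    rows : list of observations, each a list [X1, X2, ...]
    y    : list of observed Y values
    """
    # Design matrix: a leading 1 in every row carries the intercept.
    design = [[1.0] + list(map(float, row)) for row in rows]
    k = len(design[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated from Y = 5 + 2*X1 - 1*X2 + 0.5*X3.
rows = [[1, 2, 3], [2, 1, 0], [3, 5, 1], [4, 3, 6], [5, 4, 2], [6, 8, 4]]
y = [5 + 2 * r[0] - r[1] + 0.5 * r[2] for r in rows]
coefficients = fit_multiple_regression(rows, y)
print(coefficients)  # first entry is the intercept a, then b1, b2, b3
```

The same two functions work unchanged for 2, 3 or 5 predictors, since only the width of each row changes.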

joshuaAU

New Member
Thanks Dason. I appreciate your help.

OK, I'm off to look into matrix algebra...

Thanks again, and thanks to everyone else for their input as well.

Josh

joshAU

New Member
OMG, I just drafted a long txt to post here... and then I had to login again, and it disappeared... even with using the back button... :-(

Oh well, it probably encourages some clarity.

OK, take 2...

I've spent hours each night since my last post 5 days ago looking up matrix algebra, and, while I now know how to add, subtract, multiply, etc. using matrices, I cannot for the life of me understand how to do multiple regression with matrices...

The (to me) most relevant page I could find was:
luna.cas.usf.edu/~mbrannic/files/regression/Reg2IV.html

which I won't turn into a hyperlink as my post never seems to appear.

In it, under the section "A numerical example", it gives some data for a Y and two x's.

It then gives a few formulas:

which confuse me.
Is there a difference between x and X? To me, Σx1y should equal ΣX1Y, NOT:

Σx1y = ΣX1Y-((ΣX1)(ΣY)/N), to my understanding...

what am I missing there?

I went through and reproduced the formulas in Excel as a test and successfully got the same b1, b2 and intercept for two variables, so, despite my confusion re upper and lower case X's, I worked that out.

I then tried to convert this to 3 variables, by extending the matrix, but without much success.
ie...

the original matrices had the following layout:

   : y    : x1    : x2
y  : Σy²  : Σx1y  : Σx2y
x1 : ryx1 : Σx1²  : Σx1x2
x2 : ryx2 : rx1x2 : Σx2²

so, I'm guessing that to extend it to three variables, I should change it to:

   : y    : x1    : x2    : x3
y  : Σy²  : Σx1y  : Σx2y  : Σx3y
x1 : ryx1 : Σx1²  : Σx1x2 : Σx1x3
x2 : ryx2 : rx1x2 : Σx2²  : Σx2x3
x3 : ryx3 : rx1x3 : rx2x3 : Σx3²

But, I guess, I'm confused as to how to process that information, ie, how to convert it into one of the formulas, as it is used for two variables...

I apologize if this isn't very coherent, my last post, which vanished, was slightly more coherent, but, as you have probably guessed, I'm somewhat confused...

If anyone has any links to a web page that has worked examples of multiple regression using matrices with 3+ variables, or could put me straight some other way, or has any other ideas or suggestions, I'd greatly appreciate it.

Thank you.


Dragan

Super Moderator
Is there a difference between x and X?
what am I missing there?
Josh: The lowercase x notation denotes deviation scores, i.e. x = X - Xbar (and likewise y = Y - Ybar).
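
With that definition, Σxy computed from deviations equals the raw-score shortcut ΣXY - (ΣX)(ΣY)/N. A quick numerical check in Python; the heights are the ones from the first post, while the weights and the function names are made up for illustration:

```python
def sum_of_cross_deviations(x, y):
    """Sum of (X - Xbar) * (Y - Ybar), i.e. lowercase 'sum of xy'."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))


def raw_score_shortcut(x, y):
    """Sum of XY minus (Sum of X)(Sum of Y) / N, from raw scores only."""
    n = len(x)
    return sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n


heights = [170, 172, 169, 174, 172]   # from the example in the first post
weights = [65, 70, 62, 78, 72]        # hypothetical Y values

by_deviation = sum_of_cross_deviations(heights, weights)
by_shortcut = raw_score_shortcut(heights, weights)
print(by_deviation, by_shortcut)  # the two agree up to rounding error
```

The shortcut is convenient when running totals such as ΣX, ΣY and ΣXY are already stored, since no second pass over the data is needed.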

joshAU

New Member
Ah, thank you Dragan, that helps and makes sense, much appreciated.
I was going to reply with (yet) another question... but I'll hold off as I think I'm beginning to make some sense of it...

Thank you again Dragan, and everyone else.
Josh

joshAU

New Member
OK, sorry people, but a follow up...
I've spent the last 2 weeks looking into multiple regression using matrix formulas.
And, surprise, surprise, I have (yet) another question.
Sorry!
I looked through one page that dealt with matrix regression using two x variables, and have been trying to expand it to three variables, without much luck...

I had assumed that the matrix should look as follows:

b =
  Σx1²    Σx1x2   Σx1x3  |  Σx1y
  Σx2x1   Σx2²    Σx2x3  |  Σx2y
  Σx3x1   Σx3x2   Σx3²   |  Σx3y
  (X'X)                  |  (X'y)

However, I am having trouble working out how to subtract the off-diagonal elements to get the determinant for X'X

Given that for 2 x variables it would be:

(Σx1²)(Σx2²) - (Σx1x2)²

I assume the start of that equation would become:

(Σx1²)(Σx2²)(Σx3²)

But I don’t understand how to subtract the off-diagonal elements, namely:

(Σx1x2)², (Σx1x3)² and (Σx2x3)²

If anyone has any information on how to calculate this, I'd appreciate any help you can give.
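
For a symmetric 3x3 X'X the determinant is not the diagonal product minus the squared off-diagonals. Expanding by cofactors gives:

det = (Σx1²)(Σx2²)(Σx3²) + 2(Σx1x2)(Σx1x3)(Σx2x3) - (Σx1²)(Σx2x3)² - (Σx2²)(Σx1x3)² - (Σx3²)(Σx1x2)²

The Python sketch below checks that expansion and then uses Cramer's rule (swapping in Cramer's rule here is a choice made for illustration, not something from the quoted page) to get the three slopes from deviation-score sums. All names and data are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def det3_symmetric(s11, s22, s33, s12, s13, s23):
    """Same determinant written out for a symmetric matrix."""
    return (s11 * s22 * s33 + 2 * s12 * s13 * s23
            - s11 * s23 ** 2 - s22 * s13 ** 2 - s33 * s12 ** 2)


def deviation_sums(columns, y):
    """Build X'X and X'y from deviation scores (x = X - XMean)."""
    n = len(y)
    my = sum(y) / n
    dev = [[v - sum(col) / n for v in col] for col in columns]
    dy = [v - my for v in y]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in dev] for ci in dev]
    xty = [sum(p * q for p, q in zip(ci, dy)) for ci in dev]
    return xtx, xty


def slopes_by_cramer(xtx, xty):
    """Solve (X'X) b = X'y for three slopes using Cramer's rule."""
    d = det3(xtx)
    if abs(d) < 1e-12:  # assumed tolerance
        raise ValueError("X'X is singular; predictors may be collinear")
    slopes = []
    for j in range(3):
        # Replace column j of X'X with X'y, then take the ratio of determinants.
        replaced = [row[:j] + [xty[i]] + row[j + 1:]
                    for i, row in enumerate(xtx)]
        slopes.append(det3(replaced) / d)
    return slopes


# Made-up data generated from Y = 5 + 2*X1 - 1*X2 + 0.5*X3.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 5, 3, 4, 8]
x3 = [3, 0, 1, 6, 2, 4]
y = [5 + 2 * p - q + 0.5 * r for p, q, r in zip(x1, x2, x3)]

xtx, xty = deviation_sums([x1, x2, x3], y)
b1, b2, b3 = slopes_by_cramer(xtx, xty)
a = sum(y) / 6 - b1 * sum(x1) / 6 - b2 * sum(x2) / 6 - b3 * sum(x3) / 6
print(a, b1, b2, b3)
```

Cramer's rule is readable for 3 predictors but grows quickly in cost; for 5 predictors, solving the same equations by Gaussian elimination is usually the more practical choice.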

"The determinant of larger matrices is very tedious to compute; it is not a simple extension of the 2 variable case." and that "beyond about 3 variables, it becomes nearly impossible without a matrix program".

Given that the programs in question, including Excel, can compute these figures for n variables, surely it is possible to calculate them without resorting to a program like SAS, R, Excel... or the like?

I have been trying to create my own program, using SQL and Visual Basic, as a hobby over many years, mainly as a learning experience, and I'd rather not have to buy and/or incorporate access to such a program to do such a calculation.

I realise I may be biting off more than I can chew, but I'm getting used to doing that.

Thank you.
Josh

Martja1

New Member
Well I know this is super late, but I am dealing with the EXACT same problem here, Josh!
I was wondering if you would be so kind as to dig out your old program, see how exactly you did it, and tell us all?

If you're wondering, I too am coding a multiple-regression package and have simple linear regression working, but I don't understand how to calculate the intercept when there are multiple variables involved.
Thanks a lot,
James