# Multi linear regression

#### JohnLock

##### New Member
Hello everyone, I am new here (and to stats in general).

Anyway I am working on a project where I want to predict the value of an outcome and I have a few variables known beforehand that might be related. This is how I planned to go about finding a formula that will more or less predict that outcome, is it correct?

1. List potentially influential variables
2. Gather samples (I can do as many as I like)
3. Calculate the Pearson correlation between each variable and the outcome
4. Run a linear regression on the first variable, then regress the residual from that fit on the next variable, etc.
5. Compare the full regression against a regression excluding each variable to test importance
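
The last step, comparing the full regression against one that leaves a variable out, is commonly done with a partial F-test on the two R squared values. Here is a minimal sketch in Python, assuming both models were fitted to the same samples and both include an intercept. The function name `partial_f` and the numbers in the usage line are made up for illustration.

```python
def partial_f(r2_full, r2_reduced, n, p_full, q):
    """Partial F statistic for dropping q predictors from a full model.

    r2_full    -- R squared of the model with all p_full predictors
    r2_reduced -- R squared of the model with q predictors removed
    n          -- number of samples used to fit both models
    """
    df_resid = n - p_full - 1  # residual degrees of freedom of the full model
    if df_resid <= 0:
        raise ValueError("need more samples than parameters in the full model")
    explained_gain = (r2_full - r2_reduced) / q
    unexplained = (1.0 - r2_full) / df_resid
    return explained_gain / unexplained


# Hypothetical numbers: 25 samples, 4 predictors, one predictor dropped.
f_stat = partial_f(r2_full=0.75, r2_reduced=0.5, n=25, p_full=4, q=1)
print(f_stat)
```

A large statistic relative to an F distribution with `q` and `n - p_full - 1` degrees of freedom suggests the dropped variable adds explanatory power when the others are present.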

Is this how I should go about doing it? How do I know that a multi linear regression is the way to go? And is there any free software out there that would make my life easier?

#### trinker

##### ggplot2orBust
JohnLock said:
> Is this how I should go about doing it? How do I know that a multi linear regression is the way to go? And is there any free software out there that would make my life easier?

You seem to be describing hierarchical multiple regression and looking at changes in R squared when eliminating previous variables. This is an approach I often use, but as for whether this is the way to go... You haven't really provided any information about your data or research questions, which are what dictate the choice of test. The best way to know if you're making a sound choice about the test you use is to become educated on the purpose, strengths and weaknesses of each test. This is a researcher decision that must be justified.

Now onto the free software that will make life easier. One letter: R. We have a thread for R resources HERE. Initially the program/language takes some time to learn, but once you get it you'll wonder what you did before it.

#### JohnLock

##### New Member
When I studied for my bachelor's in economics I didn't pay attention to statistics; the way it was explained, it seemed more like an academic exercise than a useful tool.
Now wiser, and soon to begin my master's, I really regret not knowing statistics. Since I only really feel I have learned something once I have done it, I have made it a personal project to model European-style options. I don't expect there is any very reliable formula that can do it, but the exercise is worth a try.

Anyway, you can of course sell your option to others, and the price of your option is determined by the chance of it being profitable at the expiration date. It is these fluctuations I would like to try to model. I just realized that I might need another model, since the price will move more and more towards one of two extremes: either 0 when the option looks to be worthless, or the share price minus the exercise price when it looks like a profit will be made.
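
The two extremes described above are what a call option is worth at expiration. Here is a minimal sketch in Python, assuming a call option (the post does not say call or put). The function name `call_payoff_at_expiry` and the prices are illustrative, not taken from the post.

```python
def call_payoff_at_expiry(share_price, exercise_price):
    """Value of a European call at expiration: never below zero."""
    return max(share_price - exercise_price, 0.0)


# Hypothetical prices with an exercise price of 100.
for share_price in (90.0, 100.0, 110.0):
    print(share_price, call_payoff_at_expiry(share_price, 100.0))
```

The payoff is flat at zero below the exercise price and rises one-for-one above it, so a single straight line in the share price cannot follow it over the whole range. That may be part of why another model feels necessary here.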

Did it make any sense?

#### JohnLock

##### New Member
Is there no one who can point me in the right direction as to what I should do?
My thinking right now is to continue with the linear regression approach, but having one of the variables be time left.

What do you think?

#### Dragan

##### Super Moderator
> Hello everyone, I am new here (and to stats in general).

It's obvious that you're new to stats, based on your (full) original post. I would suggest that you do further study in the area of multiple regression before asking questions.

#### JohnLock

##### New Member
You are right, I have never really done anything stats or regression related before.
I have read up on multi linear regression as best I could and I am pretty sure I would be able to do it.
I just needed to know if it was indeed the way to go in my particular situation.
Anyway, I will try it out the way I initially wanted to, and see how it turns out.
Maybe I will have another question along the way.

#### Dason

I'm not a huge fan of your original plan.

> If your ultimate interest is in real scientific progress, I'd suggest that you ignore that sentence (and any conclusion drawn subsequent to it).
> -- Andy Liaw (in response to a question on the meaning of the sentence: 'Independent variables whose correlation with the response variable was not significant at 5% level were removed')
> R-help (March 2010)

That quote summarizes my thoughts on the reason I'm not a fan of the original plan.

Note that it's quite possible for variables that are "obviously not related to Y" to end up being important predictors in the presence of other variables.
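
A contrived example may make this concrete. In the sketch below (plain Python, no libraries), `x2` has exactly zero correlation with `y`, yet adding it to a regression that already contains `x1` doubles R squared. The data are hand-built from three mutually orthogonal patterns purely for illustration, and the helper names `pearson` and `ols` are my own, not from any package.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Three zero-mean patterns that are orthogonal to each other (made-up data).
signal = [1, 1, 1, 1, -1, -1, -1, -1]
nuisance = [1, 1, -1, -1, 1, 1, -1, -1]
error = [1, -1, 1, -1, 1, -1, 1, -1]

y = [s + 0.5 * e for s, e in zip(signal, error)]   # outcome
x1 = [s + m for s, m in zip(signal, nuisance)]     # signal contaminated by nuisance
x2 = list(nuisance)                                # the nuisance alone

r_x1_y = pearson(x1, y)
r_x2_y = pearson(x2, y)                # zero: x2 looks unrelated to y
beta_x1, r2_x1 = ols([x1], y)          # y on x1 only
beta_full, r2_full = ols([x1, x2], y)  # y on x1 and x2

print("corr(x1, y) =", r_x1_y)
print("corr(x2, y) =", r_x2_y)
print("R^2 with x1 only   =", r2_x1)
print("R^2 with x1 and x2 =", r2_full)
print("coefficients (intercept, x1, x2) =", beta_full)
```

Screening on the correlation with `y` would throw `x2` away, even though it is exactly what lets the regression clean the nuisance out of `x1`.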

#### noetsi

##### No cake for spunky
Especially if multicollinearity or interaction is occurring.