Multiple Regression

#1
When we fit a multiple linear regression, how can I know which explanatory variables I should include and which ones I should exclude from the model? How do I identify the ones that could affect the model negatively?
Thanks in advance
 
#2
It is based on your subject-matter knowledge. If you know little or nothing about your subject, your model might turn out "strange" (or crazy). But we must all start somewhere.
 
#3
Research studies often involve numerous explanatory variables for different reasons (mediating, exploratory, theoretical, etc.), and it is not easy to decide which to include in and which to exclude from a final model.
To select important variables, you should include the theoretically relevant ones needed to address your hypotheses, and enough variables to give the model good predictive power. At the same time, the model must be kept simple, so we must find a balance between these two goals. If we add too many variables, we run the risk of multicollinearity.
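To make the multicollinearity point concrete: one common diagnostic (not mentioned above, so take this as my own illustration) is the variance inflation factor (VIF). You regress each predictor on all the others and compute 1/(1 − R²); a VIF well above a rule-of-thumb cutoff of about 10 is usually taken as a warning sign. A minimal sketch with numpy, where `vif` is a name I made up:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of an (n x p) design matrix.

    VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing
    column j on all the other columns (with an intercept).
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + remaining IVs
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        ss_res = resid @ resid
        ss_tot = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        out[j] = 1.0 / (1.0 - r2)
    return out
```

A predictor that is (nearly) a linear combination of the others gets a huge VIF, while an independent predictor sits near 1 — which is exactly the "too many overlapping variables" situation described above.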
 
#4
Research studies often involve numerous explanatory variables for different reasons (mediating, exploratory, theoretical, etc.), and it is not easy to decide which to include in and which to exclude from a final model.
To select important variables, you should include the theoretically relevant ones needed to address your hypotheses, and enough variables to give the model good predictive power. At the same time, the model must be kept simple, so we must find a balance between these two goals. If we add too many variables, we run the risk of multicollinearity.
Exactly. I want to know how to identify the variables that are collinear and/or are confounders.
 
#5
I am really not sure, but I would say that in this case the solution rests on using software. When we do not know enough about the theory and the collinearity structure, we can apply a hierarchical analysis: perform a sequence of regressions step by step, adding/removing (sets of) IVs. You can also let the software choose automatically. There are three common automatic methods: forward selection, backward elimination, and stepwise regression. You can also do it manually.
If you decide to apply forward selection:
Start with the "empty model" Ŷ = a. After that, follow these steps:
1. Add each unused IV, one at a time, to the model, and compute its sr² (squared semipartial correlation).
2. Take the largest sr² and test its significance, i.e. test whether adding the associated predictor significantly increases the Y variance accounted for. If the p-value is below the cutoff, add Xᵢ to the model and return to Step 1. If the p-value is above the cutoff, stop the algorithm.
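The two steps above could be sketched roughly like this in Python. This is a minimal illustration with numpy/scipy, not a reference implementation: `fit_r2` and `forward_select` are names I made up, sr² is computed as the increase in R², and the significance test is the usual partial F-test for one added predictor.

```python
import numpy as np
from scipy import stats

def fit_r2(X, y):
    """R^2 of an OLS fit of y on the columns of X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def forward_select(X, y, alpha=0.05):
    """Forward selection: start from the empty model Y-hat = a,
    repeatedly add the IV with the largest sr^2 while it is significant."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    r2_current = 0.0
    while remaining:
        best = None
        for j in remaining:
            r2_new = fit_r2(X[:, selected + [j]], y)
            sr2 = r2_new - r2_current          # squared semipartial correlation
            if best is None or sr2 > best[1]:
                best = (j, sr2, r2_new)
        j, sr2, r2_new = best
        k = len(selected) + 1                  # predictors in candidate model
        # partial F-test for the single added predictor (1 numerator df)
        F = sr2 / ((1 - r2_new) / (n - k - 1))
        pval = stats.f.sf(F, 1, n - k - 1)
        if pval >= alpha:                      # p-value above the cutoff: stop
            break
        selected.append(j)                     # p-value below the cutoff: keep Xj
        remaining.remove(j)
        r2_current = r2_new
    return selected
```

On simulated data where only some columns truly drive y, this picks those columns and stops once the remaining candidates no longer pass the F-test. The same caveat applies as above: automatic selection knows nothing about theory or confounding, so it should complement, not replace, substantive judgment.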