im a newbie and i need some help with regression :(


masters student here :wave: im so lost in my statistical analysis and my supervisor isn't really helpful :(

for my model i have 3 independent variables (they sort of measure the same thing, so it was expected that they would be very highly correlated, which i was able to confirm in spss once i finally got my data) and 352572 dependent ones. okay, not that many, but a lot.. plus a ton of covariates that my supervisor told me to check to see which one has the greatest effect...

im really confused as to how to start. we had a huge discussion where she said that apparently i cannot enter all my independent variables into the independent variables box in spss (analyze-->regression-->linear), because spss adjusts for everything after the first one on the list. she also told me that she doesn't know whether there would be a linear relationship at all between my independent variables and my dependent ones. so where the hell do i start and what should my first steps be? i thought i was doing multiple linear regression, and i studied that for months, and now i feel like that isn't exactly what i am doing? she wants me to enter one independent variable and one dependent variable at a time, and then what? what sort of model am i using for this? i am really confused :( and when do i check my assumptions?

would i be correct if I:
1. enter 1 dependent variable and 1 independent one, enter all my covariates in the same box, and then remove the covariates one by one and look at my Betas? if a Beta changes by more than 10%, is that covariate a confounder?
2. look at the interaction between the ..? the what? the covariate and the dependent variable, or the covariate and the independent variable?

im very confused and youtube isnt helping much at this point :(


Omega Contributor
You probably need some content knowledge in the field of research to guide you, so you are not just flying blind running countless models. By using the 0.05 level of significance, you are accepting that some of these combinations are going to be significant by chance alone, even though that is not the true relationship. Given the sheer number of things you need to test, I would see if you can whittle down the DV list based on content knowledge, because if you just run over a million models you would have to correct your cut-off value for significance to prevent false discoveries.
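
To give a feel for what "correcting the cut-off" means, here is a minimal Python sketch of two common corrections: Bonferroni (a stricter per-test cut-off) and Benjamini-Hochberg (controls the false discovery rate). The function names and the example p-values are made up for illustration, and which correction suits your study is a judgment call I can't make for you. With 3 IVs and 352572 DVs, the Bonferroni cut-off comes out at roughly 4.7e-08 instead of 0.05.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where a test counts as a discovery
    while controlling the false discovery rate at level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Every test ranked at or below max_rank is declared a discovery.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# 3 independent variables x 352572 dependent variables, as in the question.
n_models = 3 * 352572
print(n_models)
print(bonferroni_threshold(0.05, n_models))

# Made-up p-values, purely for illustration (not real results).
example_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(example_p, q=0.05))
```

In the made-up example, four p-values are below 0.05, but only the two smallest survive the Benjamini-Hochberg step.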

Perhaps you can see what the genetics folks do when examining SNPs, and follow their approaches. This seems like an automation job, and if you have to look at interaction terms as well, running even simple models one at a time would take you years.
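
As a rough idea of what "automation" could look like, here is a small Python sketch that loops over every IV-DV pair, fits a one-predictor least-squares line, and collects the results in one table. It is only a sketch under simplifying assumptions: the variable names and data are invented, there are no covariates or interaction terms, and it reports slopes and correlations rather than p-values (for those you would want a proper statistics package).

```python
from itertools import product


def simple_fit(x, y):
    """Least-squares slope, intercept and Pearson r for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def screen_all_pairs(ivs, dvs):
    """Fit every IV-DV pair and return one result row per pair."""
    rows = []
    for (iv_name, x), (dv_name, y) in product(ivs.items(), dvs.items()):
        slope, intercept, r = simple_fit(x, y)
        rows.append({"iv": iv_name, "dv": dv_name,
                     "slope": slope, "intercept": intercept, "r": r})
    return rows


# Tiny made-up data set: 2 hypothetical IVs and 2 hypothetical DVs.
ivs = {"iv_a": [1, 2, 3, 4, 5], "iv_b": [2, 1, 4, 3, 5]}
dvs = {"dv_1": [3, 5, 7, 9, 11], "dv_2": [5, 4, 3, 2, 1]}

results = screen_all_pairs(ivs, dvs)
# Sort so the strongest associations (largest |r|) come first.
results.sort(key=lambda row: abs(row["r"]), reverse=True)
for row in results:
    print(row["iv"], row["dv"], round(row["slope"], 3), round(row["r"], 3))
```

The point is the structure: one function that fits a single model, one loop that applies it to every combination, and one table of results that you then filter with a corrected cut-off.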