# Can I use multivariate analysis without univariate analysis?

#### celinetang

##### New Member
Dear all,

I have a problem in my data analysis and paper writing.
Some variables were not statistically significantly correlated with the outcome in the univariate correlation analysis, but were significant in the multiple linear regression analysis.
So, can I use only the multiple linear regression, without the univariate analysis, in the paper to point out the predictive ability of these variables?

Thanks.

#### GretaGarbo

##### Human
> Thus, can I use the multiple linear regression only without the univariate analysis in the paper to point out the predictive ability of these variables?

Yes!


#### noetsi

##### No cake for spunky
The marginal effects estimated by multiple regression can be completely different from the univariate results, so univariate analysis can lead you astray. In journal articles it's rare to see univariate analysis reported when multivariate analysis is being done (which it almost always is).
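The original poster's situation (no bivariate correlation, yet a significant multiple-regression coefficient) is a classic suppression effect, and it is easy to reproduce in a small simulation. This is a hypothetical sketch using only NumPy; the variable names and numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two strongly correlated predictors (r ~ 0.9)
z1, z2 = rng.standard_normal((2, n))
x1 = z1
x2 = 0.9 * z1 + np.sqrt(1 - 0.9 ** 2) * z2

# The outcome truly depends on BOTH predictors, with opposite signs
y = x1 - x2 + 0.5 * rng.standard_normal(n)

# "Univariate" (bivariate) correlation of x1 with y is tiny ...
r_biv = np.corrcoef(x1, y)[0, 1]

# ... but the multiple-regression coefficient of x1 is large
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"bivariate corr(x1, y):        {r_biv:.3f}")   # close to zero
print(f"multiple-regression beta(x1): {beta[1]:.3f}")  # close to 1
```

A univariate screen at any conventional alpha would have dropped x1 here, even though its true effect on y is as large as x2's.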

#### celinetang

##### New Member
> The marginal effects generated by multiple regression can be completely different than univariate results. Thus univariate analysis can lead one astray. In journal articles it's rare to see univariate analysis when multivariate analysis is being done (which it almost always is).

Thank you, noetsi.
But in many clinical papers, a common practice is to include in the multivariate analysis only those variables that were statistically significant in the univariate analysis. That is to say, univariate analysis is used as a screening tool for the subsequent multiple linear regression. So I am worried about being challenged on how the predictors (potential risk factors) were selected for the regression model.

#### noetsi

##### No cake for spunky
A basic problem with that approach is that a univariate analysis (which really means a bivariate comparison between a DV and an IV) can show a significant effect where none exists (or where only a weak relationship exists), because you have failed to control for other variables, or because the relationship is spurious. A silly example used to demonstrate this: if you run a univariate comparison between ice cream sales and murder rates, you will find that the murder rate goes up strongly with ice cream sales. On this basis you might conclude that ice cream consumption leads to murder.

In fact, of course, ice cream sales go up in summer, and summer for various reasons increases murders. This is an (admittedly silly) example of why univariate analysis is doubtful. Ultimately there are two solutions to your concern. One possibility is to build (based on your own views and the literature) a theory that you can then test with multivariate analysis. Once you have a theory to test, you don't need to do any empirical univariate analysis. (You could also argue that univariate screening, while commonly done, is wrong, but that would annoy the people who do it, rarely a good thing if you are trying to publish.) You could also cite articles, which are common in my field of public administration, that go straight to multivariate analysis without any univariate analysis.
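The ice cream example can be sketched as a simulation (hypothetical numbers, NumPy only): a confounder drives both variables, producing a strong bivariate correlation that vanishes once the confounder is controlled for.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# The confounder: temperature drives BOTH variables, which never touch each other
temp = rng.standard_normal(n)
ice_cream = 2.0 * temp + rng.standard_normal(n)  # sales rise with heat
murders = 1.5 * temp + rng.standard_normal(n)    # so do murders

# The naive bivariate correlation looks dramatic
r_raw = np.corrcoef(ice_cream, murders)[0, 1]

def residuals(y, x):
    """Residuals of y after regressing out x (with an intercept)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation: correlate what is left of each variable
# once temperature has been controlled for
r_partial = np.corrcoef(residuals(ice_cream, temp),
                        residuals(murders, temp))[0, 1]

print(f"raw correlation:     {r_raw:.3f}")      # strong
print(f"partial correlation: {r_partial:.3f}")  # vanishes
```

A univariate screen sees only the first number; the multivariate model sees the second, which is why the two can give opposite answers.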

#### threestars

##### New Member
+1 for noetsi's answer. In observational research you should almost never (barring extreme cases) rely on bivariate methods.

#### noetsi

##### No cake for spunky
This thread reflects a pet peeve of mine. Although in no way an "expert" in statistics like others here, it's appalling to me how often academics (which I was long ago) get basic things wrong, even in journals: using stepwise regression, an overemphasis on normality in many methods, confusing relative risk with odds ratios... it's a long list. I once read an article on just how common basic methodological errors are in medical journals, a truly scary thought. I ran into this on my dissertation, when committee members unfamiliar with qualitative approaches wanted the research done in a way that fundamentally violated best practices in that area (something a qualitative advisor of mine later took them to task for).

But if methods are commonly done wrong, as in the original poster's example, it's hard to win that dispute if you want to get published.

#### victorxstc

##### Pirate
> Although in no way an "expert" in statistics as others here

Another very humble poster.

noetsi, I can't agree more about the numerous widespread basic statistical errors I personally see every day, even in the top journals of my own field (let alone in local journals or theses). Once I had set alpha at 0.01 and the reviewer told me, "Why haven't you determined the correct P value, which is 0.05?" lol, he/she thought that 0.05 is the "correct" one. There are many other stories too, but I'm too lazy to recall them right now!

#### GretaGarbo

##### Human
> But during the course of data analysis in many clinical analysis paper, a common practice is to include in multivariate analysis only those variables that are statistically significant in univariate analysis. That is to say, univariate analysis is used as a screening tool for the subsequent multiple linear regression analysis.

This sounds completely wrong. Can it really be this bad among journals?

#### noetsi

##### No cake for spunky
> Can it really be this bad among journals?

Well, it is common to see stepwise regression, and apparently even elite medical journals confuse relative risk with odds ratios (which are not the same thing except when the event is very rare).

A personal favorite of mine: one of my statistics professors had her article accepted a few years ago only after agreeing to turn her interval variable into a categorical one, because that was the accepted way (and the accepted method, probably logistic rather than linear regression) that journal did things. Once a method or way of doing things gets accepted, it's not easy to break (well, at least until you get a new editor, I would imagine).

> Once I had set the alpha at 0.01 and the reviewer told me "Why haven't you determined the correct P value, which is 0.05" lol he/she thought that 0.05 is the "correct" one.

Alpha isn't a p-value at all; as far as I understand it, it's your willingness to accept a Type I error. And regardless of that, there is no real statistical reason that you should reject the null at .05 but not reject it if p = .052. There is nothing magical at all about .05. It grew out of the history of research and was (for a while, but rarely today) hotly debated. But it's treated as magic today: if you have a p-value lower than .05 (or .01 or whatever the journal uses), the null should be rejected; otherwise it should not. Rarely does anyone ask why that is the case.

#### victorxstc

##### Pirate
> This sounds completely wrong. Can it really be this bad among journals?

But that is a well-accepted method (unfortunately, maybe). I have seen it too, and asked about the same thing before:

> I have seen that previous studies have first run bivariate chi-square tests and then entered only those variables with chi-square P values < 0.1 (or < 0.15) into the model... I don't know if it is a valid method.

The conversation in that thread, and especially this nice post of ledzep, might be very useful for celinetang's question.

-------------

> Alpha isn't a p-value at all; as far as I understand it, it's your willingness to accept a Type I error. And regardless of that, there is no real statistical reason that you should reject the null at .05 but not reject it if p = .052.

Yeah, I know. It was the reviewer who did not know the difference between alpha and the P value or the reasoning behind choosing alpha (power calculations), and who thought there is a "correct" alpha, which he called the P value... I remember he told me, "Why not set the P value at 0.05 'like all other studies'?" lol

I wanted to highlight the level of statistical knowledge of a reviewer at one of the good international journals in my field.

> A personal favorite of mine is that one of my statistical professors had her article accepted a few years ago only after agreeing to turn her interval variable into a categorical one, because that was the accepted way that journal did things.

Sometimes the bias is even stronger. A journal editor/statistician I know in person filters out nearly all articles not prepared under his supervision (unless they were submitted from another institute). So not only is he biased toward his own [usually incorrect] methods, he is also biased about his "name".

#### Englund

##### TS Contributor
> And regardless of that, there is no real statistical reason that you should reject the null at .05 but not reject it if p = .052.

I don't really agree with that (given that the chosen alpha level is set to 0.05). If we set alpha to 0.05 and get a p-value of 0.052, and we then reject the null hypothesis, we are doing something very wrong. Let me try to explain why.

Silly but thoughtful example: suppose we select alpha after we have calculated the p-value, and we set alpha slightly above this observed p-value. Voila, we reject the null hypothesis, since the p-value is below the alpha level. What is the probability in cases like this that we reject the null hypothesis? It's 1.

This also works the other way around. If we select alpha slightly below our observed p-value, we fail to reject the null hypothesis with probability 1.

Edit: But yes, I agree that a p-value of 0.052 is, in practice, no different from 0.05.
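Englund's thought experiment is easy to verify by simulation. This is a hypothetical sketch (sample sizes and seeds invented for illustration): under a true null, choosing alpha just above the observed p-value "rejects" every single time, while a pre-specified alpha of 0.05 rejects at about the nominal 5% rate.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n_sims, n = 2_000, 50

reject_fixed = 0    # alpha = 0.05, fixed before looking at the data
reject_posthoc = 0  # "alpha" chosen just above the observed p-value

for _ in range(n_sims):
    # Two samples from the SAME distribution, so the null is true
    a, b = rng.standard_normal((2, n))
    z = (a.mean() - b.mean()) / math.sqrt(2 / n)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p for a z-test

    reject_fixed += p < 0.05
    reject_posthoc += p < p + 1e-12  # always true: we "reject" every time

print(f"rejection rate, fixed alpha = 0.05: {reject_fixed / n_sims:.3f}")
print(f"rejection rate, post-hoc alpha:     {reject_posthoc / n_sims:.3f}")
```

The post-hoc rule has a Type I error rate of 100%, which is why alpha must be fixed before the data are seen.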


#### victorxstc

##### Pirate
I think noetsi's point was not to change alpha "after" calculating the P value. He was saying that at alpha = 0.05, a P of 0.052 can still be considered significant. The point is not to move the line (alpha) after calculating P, but that the line is not holy and we should not rely on it too much.

It is a little robotic to rely on a blind cutoff. Why 0.05 and not 0.05212541254? Only because 0.05 is a round value.

Suppose two similar studies with exactly the same methods find P values of 0.049 and 0.051, respectively. According to the P-value rule, one is significant and the other is not. That could create a nonsensical controversy if people strongly believe in that 0.05 thing.

#### noetsi

##### No cake for spunky
What I meant, Englund, is that there is no real reason to set alpha originally at .05 as compared to .051 or .048, not that once you have set it you may still reject a null with a p above it (although I have seen authors suggest that p values very close to alpha should be looked at again, or treated as uncertain). In practice, however, I am not sure it's really valid (given the limits on data in the real world) to say that a p of .051 means you can conclusively not reject the null but at .049 you can. I don't think our methods or data are strong enough to do so, even if there were any theoretical reason that .050 was the right alpha level to pick in the first place. And .050 was picked not through theory but through historical convenience.

But in terms of your comments, I think it's fundamentally invalid to choose an alpha level after you know what your p value is. There is not much point in doing a statistical test if you analyze data that way. (In ANOVA, a less extreme version of this issue is the greater power of planned contrasts compared to post hoc tests, where you choose your hypothesis after seeing the results.)

#### RosaCuba

##### New Member
Hello, this thread is from two years ago, but someone might still want to comment on it, like me.
With regard to using univariate analysis (or better, bivariate analysis) as a filter before a multivariate analysis, I can say it's a very common practice, but I don't see why bivariate analysis should be useful for this. If the reason is that some variables can be significantly associated with the response bivariately and yet not associated once you control for other variables, the answer is that the contrary can also happen: a variable can be unassociated with the response in the bivariate case and become significantly associated in the multivariate case. So this is not the point. Think about it.