# Thread: Fishing with data: Use of multiple regressions on the same data set

1. ## Fishing with data: Use of multiple regressions on the same data set

I have usually approached this issue, running multiple models on the same data set, in the context of familywise (FW) error [although I have never found a treatment in the literature or a textbook that addresses it]. It is the habit of running multiple tests, using different variables, on the same data set.
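To put a rough number on the familywise concern, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true, which is rarely the case for models fitted to one data set, so treat it as an illustration rather than an exact figure. The function names are made up for this example.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    for m in (1, 5, 10, 20):
        raw = familywise_error(0.05, m)
        adjusted = familywise_error(bonferroni_alpha(0.05, m), m)
        print(f"m={m:2d}  unadjusted FW error={raw:.3f}  with Bonferroni={adjusted:.3f}")
```

Under these assumptions, ten tests at the 0.05 level already give roughly a 40% chance of at least one spurious "significant" result, which is the core of the fishing worry.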

Some question the validity of doing this [cwb posted this on another thread]:

http://www.stat.columbia.edu/~gelman.../p_hacking.pdf

Here is one example of where you might want to do exactly what they seem to condemn [Box-Tidwell is another, in logistic regression]. Note that in this case, although you are conducting multiple tests, the issue is artificial to some extent, since the variables created are designed only to test assumptions [but what about specifying a quadratic term to test for non-linearity and then taking it out if it does not work? Is that running the model multiple times?].

A statistical test for linearity can be constructed by adding powers of fitted values to the regression model, and then testing the hypothesis of linearity by testing the hypothesis that the added parameters have values equal to zero. This is known as the RESET test (Ramsey).
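The RESET idea quoted above can be sketched in a few lines. This is only an illustration on simulated data, not code from the linked notes: the helper names, the choice of squared and cubed fitted values, and the simulated coefficients are all assumptions. It returns the F statistic and its degrees of freedom; a p-value would come from the upper tail of the F distribution with those degrees of freedom.

```python
import numpy as np


def ols_rss(X, y):
    """Fit OLS by least squares; return residual sum of squares and fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    return float(resid @ resid), fitted


def reset_f_stat(x, y, max_power=3):
    """RESET-style F statistic for a simple regression of y on x.

    Restricted model:   y ~ 1 + x
    Unrestricted model: y ~ 1 + x + fitted^2 + ... + fitted^max_power
    The null hypothesis is that the added coefficients are all zero.
    """
    n = len(y)
    X0 = np.column_stack([np.ones(n), x])
    rss0, fitted = ols_rss(X0, y)

    # Powers of the fitted values from the restricted model.
    powers = [fitted ** p for p in range(2, max_power + 1)]
    X1 = np.column_stack([X0] + powers)
    rss1, _ = ols_rss(X1, y)

    q = len(powers)               # number of added terms
    df_resid = n - X1.shape[1]    # residual df of the unrestricted model
    f_stat = ((rss0 - rss1) / q) / (rss1 / df_resid)
    return f_stat, q, df_resid


# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
noise = rng.normal(0, 1, 200)
y_linear = 1 + 2 * x + noise                  # truly linear in x
y_curved = 1 + 2 * x + 1.5 * x ** 2 + noise   # curvature the linear model misses

f_linear, q, df_resid = reset_f_stat(x, y_linear)
f_curved, _, _ = reset_f_stat(x, y_curved)
print(f"linear data: F({q}, {df_resid}) = {f_linear:.2f}")
print(f"curved data: F({q}, {df_resid}) = {f_curved:.2f}")
```

Note that the added terms exist only to check the linearity assumption, so the test refits the model once with auxiliary variables and does not search over substantive predictors.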

The problem I have with the argument that it is not valid to use the same data set to test multiple models [it seems the only valid approach would be planned contrasts] is that you have limited choices. Academics cannot re-collect data every time they want to test a new model. And commonly in the real world there is no more data to retest with.

http://www.albany.edu/~po467/EPI553/...ssumptions.pdf

They seem almost to be questioning the validity of p values to me...

2. ## Re: Fishing with data: Use of multiple regressions on the same data set

There is always Lasso regression for when you have big data and the analysis is a little exploratory.

3. ## Re: Fishing with data: Use of multiple regressions on the same data set

But this really is a philosophical/methods discussion at its core. I think most people use the same data set to run multiple models [think post hoc tests]. The question is how valid that is and what you can do to limit the threat.

More and more I think people are moving away from p values. But that creates a variety of issues on its own.
