
Thread: Fishing with data: Use of multiple regressions on the same data set

  1. #1
    noetsi

    I have usually approached this issue, running multiple models on the same data set, in the context of familywise (FW) error [although I have never found a treatment in the literature or in a text that addresses it]. It is the habit of running multiple tests, using different variables, on the same data set.
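
    To put rough numbers on the familywise error concern, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true; tests run on the same data set are usually correlated, so the exact figure will differ, although the Bonferroni bound holds regardless of dependence. The function names are just for illustration.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / m


# How quickly the familywise rate grows as more tests are run at 0.05
for m in (1, 5, 10, 20):
    print(m, round(familywise_error_rate(0.05, m), 3), bonferroni_alpha(0.05, m))
```

    With ten tests at the 0.05 level the chance of at least one spurious "significant" result is already about 40% under these assumptions.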

    Some question the validity of doing this [cwb posted this link on another thread]:

    http://www.stat.columbia.edu/~gelman.../p_hacking.pdf

    Here is one example where you might want to do exactly what they seem to condemn [Box-Tidwell is another, in logistic regression]. Note that in this case, although you are conducting multiple tests, the issue is artificial to some extent, since the variables created are designed only to test assumptions [but what about specifying a quadratic term to test for non-linearity and then taking it out if it does not work? Does that count as running the model multiple times?].

    A statistical test for linearity can be constructed by adding powers of fitted values to the regression model, and then testing the hypothesis of linearity by testing the hypothesis that the added parameters have values equal to zero. This is known as the RESET test (Ramsey).
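
    A rough sketch of the idea in that quote, using only the Python standard library. The simulated data, the helper names (ols, reset_f), and the choice of squared and cubed fitted values are all assumptions made for illustration. The code only computes the F statistic, which would then be compared against an F distribution with q and n - k - q degrees of freedom (q added terms, k original parameters).

```python
import random


def ols(X, y):
    """Least squares via the normal equations; returns (fitted values, RSS)."""
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[i][k] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, rss


def reset_f(x, y, powers=(2, 3)):
    """F statistic for adding powers of the fitted values to y ~ 1 + x."""
    n = len(y)
    X0 = [[1.0, xi] for xi in x]
    fitted, rss0 = ols(X0, y)
    # Standardise the fitted values for numerical stability; with consecutive
    # powers starting at 2 this spans the same columns as the raw powers
    mean = sum(fitted) / n
    sd = (sum((f - mean) ** 2 for f in fitted) / n) ** 0.5
    z = [(f - mean) / sd for f in fitted]
    X1 = [row + [zi ** p for p in powers] for row, zi in zip(X0, z)]
    _, rss1 = ols(X1, y)
    q = len(powers)
    df = n - len(X1[0])
    return ((rss0 - rss1) / q) / (rss1 / df)


# Simulated data: one truly linear relationship, one with curvature
random.seed(0)
x = [5 * i / 99 for i in range(100)]
y_lin = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
y_quad = [1 + 2 * xi + 1.5 * xi ** 2 + random.gauss(0, 1) for xi in x]
f_lin = reset_f(x, y_lin)
f_quad = reset_f(x, y_quad)
print("F, linear data:", f_lin)
print("F, curved data:", f_quad)
```

    The curved data should give a far larger F statistic than the linear data, which is the signal the test looks for.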
    The problem I have with the argument that it is not valid to use the same data set to test multiple models [it seems the only valid approach would be planned contrasts] is that you have limited choice. Academics cannot re-collect data every time they want to test a new model. And commonly in the real world there is no more data to retest with.

    http://www.albany.edu/~po467/EPI553/...ssumptions.pdf

    To me, they almost seem to be questioning the validity of p-values...
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  2. #2
    hlsmith


    There is always lasso regression for when you have a large data set and the analysis is somewhat exploratory.
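
    For anyone unfamiliar with what the lasso does, here is a bare-bones sketch of coordinate descent with soft-thresholding on a tiny made-up design. It assumes no intercept and the objective (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1; the data and the function names are hypothetical. In practice you would use a tested library routine (for example scikit-learn's Lasso) rather than this.

```python
def soft_threshold(v: float, t: float) -> float:
    """Shrink v towards zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1."""
    n, k = len(X), len(X[0])
    b = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # Partial residual: leave predictor j out of the current fit
            r = [yi - sum(b[m] * row[m] for m in range(k) if m != j)
                 for row, yi in zip(X, y)]
            rho = sum(row[j] * ri for row, ri in zip(X, r)) / n
            scale = sum(row[j] ** 2 for row in X) / n
            b[j] = soft_threshold(rho, lam) / scale
    return b


# Made-up orthogonal design: strong effect on column 0, weak on column 1,
# none on column 2
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3 * row[0] + 0.1 * row[1] for row in X]
coef = lasso_cd(X, y, lam=0.5)
print(coef)
```

    The penalty shrinks the strong coefficient and sets the weak and null ones to exactly zero, so variable selection happens inside a single fit instead of across many separately tested models.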

  3. #3
    noetsi

    Another method to read about

    But this really is a philosophical/methods discussion at its core. I think most people use the same data set to run multiple models [think post hoc tests]. The question is how valid that is and what you can do to limit the threat.
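
    One common way to limit the threat when several tests come from the same data is to adjust the p-values. Below is a small sketch of the Holm step-down adjustment; the four p-values are hypothetical, and this is only one of several possible corrections.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p-value by the number of tests
        # not yet handled, capping the result at 1
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Adjusted values must not decrease as the raw p-values increase
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from four models fitted to the same data
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = holm_adjust(raw)
print(adjusted)
```

    Here only the smallest raw p-value stays below 0.05 after adjustment, even though three of the four were below it before.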

    More and more I think people are moving away from p-values. But that creates a variety of issues of its own.
