
Thread: Significant

  1. #1

    Significant




    So I have this data set, and when looking for a relationship between two variables I created the attached plot.

    Then I started to fit a model to the data. I was surprised that when I fitted a simple linear model the regression coefficients were significant:

    Code: 
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
    (Intercept)    5.1904     0.1347   38.53   <2e-16 ***
    nucsomeI$med  -2.1237     0.1090  -19.48   <2e-16 ***
    Just looking at the plot satisfies me that there is no linear relationship between the variables, so how do I get such significant regression coefficients? Is it related to the fact there are > 2,500,000 data points?


    Eyeballing the plot, I thought an exponential model might be a better choice, so I tried:

    Code: 
    Formula: nucsomeI.insert_count ~ exp(a + b * nucsomeI.med)
    
    Parameters:
      Estimate Std. Error t value Pr(>|t|)    
    a  2.19786    0.02950   74.50   <2e-16 ***
    b -1.89507    0.09435  -20.08   <2e-16 ***
    Then how to interpret these significance levels? If everything I try is 'significant', surely I can no longer trust this measure to fit models.

    That said, I'm probably doing something stupid, so any help is much appreciated.

  2. #2

    Re: Significant

    The PDF was too large, so here's the plot in Word.

    P.S: Forgot to complete the thread title - any way of altering that now?

  3. #3
    Karabiner

    Re: Significant

    Is it related to the fact there are > 2,500,000 data points?
    Yes.

    Then how to interpret these significance levels? If everything I try is 'significant', surely I can no longer trust this measure to fit models.
    AFAIK, p-values are not indicators of model fit.
    They are about the probability of the data (or of
    more extreme data), given the null hypothesis.

    With kind regards

    K.

  4. #4
    hlsmith

    Re: Significant

    So you have count data for the dependent variable that approximates a normal distribution?


    If you wonder whether the model is just "over-powered", ask yourself if the interpretation of the coefficient is meaningful given the study context.


    Can you tell us more about how these variables are formatted?

  5. #5

    Re: Significant

    Cheers guys.

    The dependent variable is count data. I am not sure how it is distributed: about 93% of the data has a count of zero, but the maximum count is close to 100,000. I've been looking at zero-inflated Poisson/negative binomial models, but I'm not really sure.

    The independent variable is a continuous variable that possibly influences the count data. Whether it does is the most important question I wish to answer, but I also thought it might be good to try to describe the relationship between the two variables via a GLM.
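
    A quick way to see whether a plain Poisson model could produce that many zeros is to compare the observed share of zeros with exp(-mean), the share a Poisson with the same mean would imply. The sketch below is in Python rather than R and uses made-up data (93% forced zeros, the rest Poisson with mean 20); the mixing proportion, the mean and the helper names are assumptions for illustration, not the poster's data.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def zero_fraction_check(counts):
    """Return (observed share of zeros, share of zeros a Poisson with the same mean implies)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)
    return observed, expected

rng = random.Random(1)
# Hypothetical data: about 93% structural zeros, the rest Poisson(20)
counts = [0 if rng.random() < 0.93 else poisson_sample(20.0, rng) for _ in range(20000)]
observed, expected = zero_fraction_check(counts)
print(f"observed zeros: {observed:.3f}, Poisson-implied zeros: {expected:.3f}")
```

    If the observed share is far above the Poisson-implied share, that is at least a hint that a zero-inflated or overdispersed model deserves a look. It is a rough diagnostic, not a formal test.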

  6. #6
    CowboyBear

    Re: Significant

    I think you might have the wrong idea about what statistical significance means. (See my FAQ post here).

    A small p value simply means that, if the true value of the parameter in the population was exactly zero, then it'd be unlikely that you'd observe a test statistic as large or larger than the one observed in your sample. It does not mean that the relationship you've observed is large or important or "significant" in a common-language sense.
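
    To make the sample-size point concrete, here is a minimal simulation sketch in Python (standard library only, not the R the original poster used). It fits a straight line to made-up data with a deliberately weak slope at several sample sizes. The true slope of 0.1, the noise level and the function name slope_test are assumptions chosen for illustration, and the p-value uses a normal approximation to the t distribution, which is reasonable only for large n.

```python
import math
import random

def slope_test(x, y):
    """Least squares for y = a + b*x; returns (slope, t statistic, two-sided p, R^2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    rss = syy - slope * sxy                 # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)     # standard error of the slope
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))    # normal approximation, fine for large n
    r2 = 1 - rss / syy
    return slope, t, p, r2

rng = random.Random(0)
results = {}
for n in (100, 10_000, 200_000):
    x = [rng.random() for _ in range(n)]
    y = [0.1 * xi + rng.gauss(0, 1) for xi in x]   # hypothetical weak effect
    results[n] = slope_test(x, y)
    slope, t, p, r2 = results[n]
    print(f"n={n:>7}  slope={slope:+.3f}  t={t:6.2f}  p={p:.2e}  R^2={r2:.5f}")
```

    At the largest n the p-value should be tiny while R^2 stays close to zero: the test is detecting that the slope is not exactly zero, not that the line describes the data well.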

  7. #7

    Re: Significant

    So in the first case, if we fit a linear model to the data, the p-value gives the chance of seeing those regression coefficients assuming the null hypothesis that the slope is 0? (What is the null hypothesis for the intercept - that it goes through the origin?)

    Then when fitting the exponential model similar tests of significance are performed (the null being the parameters are equal to zero?).

    How then to select the 'best' model? R only returned an AIC for the linear model. Looking at the plot I'd have thought an exponential curve fits better. There are presumably loads of models I could try fitting but would never get round to, so I wouldn't ever know if I missed a 'better' model.

  8. #8
    CowboyBear

    Re: Significant

    Originally posted by Prometheus:
    So in the first case, if we fit a linear model to the data, the p-value gives the chance of seeing those regression coefficients assuming the null hypothesis that the slope is 0?
    The null hypothesis for a particular parameter (e.g., a specific slope) is that the true value of that parameter in the population is exactly zero. E.g., if we got the full population of data and fit the model, the specific slope parameter we're looking at would be exactly zero. The p value is the probability of observing an estimate of that parameter at least as far from zero as the one we've seen, if the null hypothesis were true.
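
    As a worked check, the t values in the linear-model output from post #1 are just Estimate divided by Std. Error, and the p-value follows from the t value. The Python sketch below reproduces them; it uses a normal approximation in place of the t distribution, which is an assumption that is harmless here given the millions of data points, and the function name two_sided_p is made up for this example.

```python
import math

def two_sided_p(estimate, std_error):
    """t value and two-sided p-value for H0: parameter = 0 (normal approximation)."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2))   # equals 2 * (1 - Phi(|t|))
    return t, p

# Estimates and standard errors copied from the linear-model output in post #1
rows = [("(Intercept)", 5.1904, 0.1347), ("nucsomeI$med", -2.1237, 0.1090)]
for name, est, se in rows:
    t, p = two_sided_p(est, se)
    print(f"{name:13s} t = {t:7.2f}  p = {p:.3g}")
```

    Both p-values come out far below 2e-16, which is simply the smallest value R's summary prints, so "<2e-16" says nothing about how large or how useful the effect is.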

    (what is the null hypothesis for the intercept - that it goes through the origin?).
    Yes

    Then when fitting the exponential model similar tests of significance are performed (the null being the parameters are equal to zero?).
    Each null is generally for a specific parameter (aside from omnibus tests and so on).

    How then to select the 'best' model?
    Definitely not via significance testing! Too big a topic for a casual reply, but try a search for regression model selection.
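
    One common starting point is an information criterion such as AIC, which trades goodness of fit against the number of parameters. The Python sketch below is a toy illustration under stated assumptions: made-up data, Gaussian errors, and only two candidate models (intercept-only versus a straight line). It is not a recipe for the count data in this thread, where a likelihood suited to counts would be needed, and AIC values are only comparable between models fitted to the same response.

```python
import math
import random

def gaussian_aic(rss, n, n_coef):
    """AIC for a least-squares fit with Gaussian errors (error variance counted as a parameter)."""
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * (n_coef + 1) - 2 * log_lik

def rss_mean_only(y):
    """Residual sum of squares for the intercept-only model y = a."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def rss_linear(x, y):
    """Residual sum of squares for the straight-line model y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

rng = random.Random(42)
x = [rng.random() for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]   # made-up data with a real slope

aic_mean = gaussian_aic(rss_mean_only(y), len(y), n_coef=1)
aic_line = gaussian_aic(rss_linear(x, y), len(y), n_coef=2)
print(f"AIC intercept-only: {aic_mean:.1f}   AIC straight line: {aic_line:.1f}")
```

    The model with the lower AIC is preferred among the candidates tried. It says nothing about models that were never fitted, so it does not remove the worry about missing a 'better' one.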

  9. #9

    Re: Significant


    Originally posted by CowboyBear:
    Definitely not via significance testing! Too big a topic for a casual reply, but try a search for regression model selection.
    Woe betide me, Pandora's jar unleashed.

    This may take a while...
