Thread: What to do if sample size is too big? (ttest)

  1. #1

    Location
    Texas
    Posts
    12
    Thanks
    2
    Thanked 0 Times in 0 Posts

    What to do if sample size is too big? (ttest)




    So I'm running t-tests for a descriptive statistics table and all my results are significant, but I'm concerned that the sample size is too big: 3,880 obs for each of the two groups. (This is bank data by quarter, appended into one file.) I understand that the large sample size is probably making everything significant and that I should be looking at the difference between means for more information. From the results, I think the differences appear to be large enough to justify the results. My question is: what do I do next so that if (read: when) someone notices that the sample sizes are big, I have already run the next test to show that the results are valid, even with such a large sample size?
    My next step now is to look at p-plots for each variable and keep them in my back pocket.

    Any advice or discussion would be most welcome.

    vikingo
    Phd student finance
    Stata

  2. #2
    Fortran must die
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: What to do if sample size is too big? (ttest)

    I think the "next step" is simply to point out in the article that large sample size may be artificially inflating your statistical results. There is no agreement on what a large sample is so there is no test of that nor is there (as far as I know) a way to show what the p value would be if you had a smaller sample size. You could of course take a sample of your sample and show the change in p value associated with this, but I have never seen this done and I doubt it would be a valid approach.

    Going on to talk about the effect size, assuming you have a good basis to determine what a reasonable effect size is, would be the logical place to go. Strangely I have never seen that done in the social science literature - but it should be done.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  3. The Following User Says Thank You to noetsi For This Useful Post:

    vikingo (01-23-2014)

  4. #3
    Devorador de queso
    Dason's Avatar
    Location
    Tampa, FL
    Posts
    12,930
    Thanks
    307
    Thanked 2,629 Times in 2,245 Posts

    Re: What to do if sample size is too big? (ttest)

    Look at effect size and present confidence intervals for the quantities/differences of interest. There is nothing wrong with doing significance tests with large sample sizes. You just have to keep in mind that all a significance test tells you is what you can or cannot reasonably say about the null hypothesis.
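
To make the suggestion above concrete, here is a minimal Python sketch that reports the difference in means with a 95% confidence interval and Cohen's d alongside the p-value. It is only an illustration: the function name `compare_groups`, the synthetic data, and the normal approximation to the t distribution (reasonable with thousands of observations per group) are assumptions, not something from the thread or from Stata's output.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def compare_groups(a, b, level=0.95):
    """Difference in means, its CI, Cohen's d, and a two-sided p-value
    for two independent samples (hypothetical helper, normal approximation)."""
    n1, n2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)
    diff = mean(a) - mean(b)

    # Welch-style standard error: no equal-variance assumption.
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

    # Normal critical value; close to the t critical value for large n.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Cohen's d based on the pooled standard deviation.
    pooled_sd = math.sqrt(
        ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    )
    cohens_d = diff / pooled_sd

    # Two-sided p-value from the normal approximation.
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return {"diff": diff, "ci": ci, "cohens_d": cohens_d, "p_value": p_value}


# Synthetic data only: two groups of 3,880 draws with slightly different means.
random.seed(0)
group_a = [random.gauss(10.0, 2.0) for _ in range(3880)]
group_b = [random.gauss(10.2, 2.0) for _ in range(3880)]
print(compare_groups(group_a, group_b))
```

Reporting the interval and d next to the p-value lets a reader judge whether a "significant" difference is also large enough to matter.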
    I don't have emotions and sometimes that makes me very sad.

  5. #4
    Omega Contributor
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    6,991
    Thanks
    397
    Thanked 1,185 Times in 1,146 Posts

    Re: What to do if sample size is too big? (ttest)

    I agree with Dason's post.

    I recently reviewed an article that also presented effect sizes in the table along with p-values, and I found it very beneficial. Also, as a matter of good etiquette, place the n-value in the title of the table and/or in the table itself. Savvy readers will always register the sample size when reviewing the results.
    Stop cowardice, ban guns!

  6. #5
    Fortran must die
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: What to do if sample size is too big? (ttest)

    Dason said pretty much what I was trying to. The only problem to me is that while you can say that larger sample size influences the test of statistical significance (it reduces SE and thus ultimately p values), there is no way to say how much exactly. Or at least I have never seen this done. Dason might have.
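
One rough way to put numbers on the point above is to hold the standardized difference fixed and see how the p-value moves with n. The sketch below is an illustration under stated assumptions: equal group sizes, a normal approximation, and an effect size d treated as known, in which case the test statistic is about d * sqrt(n / 2). The function name `p_value_for_n` and the value d = 0.1 are hypothetical and not taken from the original data.

```python
import math
from statistics import NormalDist


def p_value_for_n(d, n_per_group):
    """Approximate two-sided p-value for a two-sample comparison with
    standardized difference d and n_per_group observations in each group."""
    # With equal group sizes, SE of the standardized difference is sqrt(2 / n).
    z = abs(d) * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))


# Same hypothetical effect size, different sample sizes per group.
for n in (50, 500, 1000, 2000, 3880):
    print(n, p_value_for_n(0.1, n))
```

This does not replace a real analysis of the data, but it shows that the same small effect goes from clearly non-significant to highly significant purely through n.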

    It is not common in the social science literature (or was not when I was actively reading it more than a decade ago) to talk about effect size in articles - the stress is on tests of significance. Part of that is the historical legacy of stressing formal tests of significance - inertia is always strong in academics. But in addition, I don't think researchers in a given field have a good sense of what a large, moderate, or small effect size is. So talking about it is difficult because you have no context (meta-analysis might be useful here, but in most cases there won't be enough articles to support one because too little research has been done in that area).

    This is why Cohen's rule of thumb for small, medium, and large effects is so widely taught, even though it makes little sense in a specific area.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  7. #6
    ted00's Avatar
    Location
    USA
    Posts
    237
    Thanks
    21
    Thanked 29 Times in 25 Posts

    Re: What to do if sample size is too big? (ttest)


    These are good suggestions. Including a confidence interval with the p-value is fine, it's done all the time ... just make sure they agree!! I've actually seen someone report CIs and p-values that *didn't* agree (in a large table of model effects), and when asked they had no idea why. It turned out they were using different tests for the CIs and the p-values from default SAS output; I think it was LRT vs. Wald. Also, if you really want to visualize how your power changes for different sample sizes, i.e. if you want to see what would happen if you had only 2000, 1000, 500, etc. rather than 3880, try this: http://powerandsamplesize.com/Calculators/ . There are some default graphs there, and there's R code you can use to investigate further if it tickles your fancy.
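
For a quick offline look at the same idea, the Python sketch below approximates the power of a two-sided two-sample test at several sample sizes. It is not the code from the linked site: the function name `power_two_sample`, the effect size d = 0.1, and the normal approximation with equal group sizes are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test for a standardized
    difference d with n_per_group observations per group (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = abs(d) * math.sqrt(n_per_group / 2)
    # Probability of rejecting in either tail.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Hypothetical small effect evaluated at the sample sizes mentioned above.
for n in (500, 1000, 2000, 3880):
    print(n, round(power_two_sample(0.1, n), 3))
```

Swapping in an effect size that is meaningful for the variables in question shows how much of the "everything is significant" result comes from n alone.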
