
Thread: Should splitting data improve the standard error?

  1. #1

    I am simulating random samples of 20 from a normal distribution with mean 20 and SD 5. The SE of the mean should be SD/sqrt(20) = 5/sqrt(20) = 1.12. If I estimate the SE with this formula, the estimate varies from sample to sample, but over several thousand iterations it settles down to an average value of 1.12. All good so far.
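    A minimal Python sketch of this first simulation, assuming normal data with mean 20 and SD 5, samples of 20, and a hypothetical 20,000 iterations. It averages the per-sample estimate SD/sqrt(20) and also reports how much that estimate varies between samples. The average typically lands slightly below 5/sqrt(20) ≈ 1.118, because at n = 20 the sample SD underestimates σ by roughly 1% on average.

```python
import math
import random
import statistics

# Assumed settings taken from the post; REPS is a hypothetical choice.
MU, SIGMA, N = 20.0, 5.0, 20
REPS = 20_000


def direct_se(sample):
    """Estimated SE of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


rng = random.Random(1)  # fixed seed so the run is reproducible
estimates = [direct_se([rng.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(REPS)]

true_se = SIGMA / math.sqrt(N)        # 5 / sqrt(20), about 1.118
avg_se = statistics.fmean(estimates)  # average of the per-sample SE estimates
spread = statistics.stdev(estimates)  # sample-to-sample variability of the estimate

print(f"true SE           {true_se:.3f}")
print(f"average estimate  {avg_se:.3f}")
print(f"SD of estimates   {spread:.3f}")
```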

    I can also split the sample of 20 into two samples of 10 and get two "submeans". The theoretical SD of each submean is 5/sqrt(10). Now, if I find the SD of these two submeans and estimate the SE using the formula SD of the means/sqrt(2), I should get another estimate of the SE of the overall mean, because (5/sqrt(10))/sqrt(2) = 5/sqrt(20) as before. Each sample produces its own pair of submeans and its own estimate of the SE of the overall mean. However, when I do this several thousand times, the estimated SE averages out at 0.82. Apparently, splitting the data and using the formula SD of the means/sqrt(2) has made the SE smaller on average.
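    A sketch of the split version under the same assumptions (two halves of 10, a hypothetical 20,000 iterations). Besides the average SE estimate, it prints the average of the squared estimates and a theoretical benchmark. The squared estimate (a variance) is unbiased for 5²/20 = 1.25, but the SD of only two submeans has a single degree of freedom, so its square root is pulled down on average by the factor c4 = sqrt(2/π) ≈ 0.80. The helper `c4` is written here purely for illustration.

```python
import math
import random
import statistics

# Assumed settings from the post: n = 20 split into K = 2 halves of 10.
MU, SIGMA, N, K = 20.0, 5.0, 20, 2
REPS = 20_000  # hypothetical number of iterations


def split_se(sample, k):
    """SE of the overall mean from k equal submeans: SD(submeans) / sqrt(k)."""
    size = len(sample) // k
    submeans = [statistics.fmean(sample[i * size:(i + 1) * size]) for i in range(k)]
    return statistics.stdev(submeans) / math.sqrt(k)


def c4(df):
    """E[s] / sigma for the SD of normal data with df degrees of freedom."""
    return math.sqrt(2 / df) * math.gamma((df + 1) / 2) / math.gamma(df / 2)


rng = random.Random(2)  # fixed seed so the run is reproducible
estimates = [split_se([rng.gauss(MU, SIGMA) for _ in range(N)], K) for _ in range(REPS)]

true_se = SIGMA / math.sqrt(N)
avg_se = statistics.fmean(estimates)
avg_var = statistics.fmean([e * e for e in estimates])  # average squared SE estimate
benchmark = c4(K - 1) * true_se  # expected average of the split SE estimate

print(f"true SE {true_se:.3f}, average split estimate {avg_se:.3f}, c4 benchmark {benchmark:.3f}")
print(f"true SE^2 {true_se ** 2:.3f}, average squared estimate {avg_var:.3f}")
```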

    This doesn’t seem right. Any thoughts?

  2. #2
    hlsmith


    So are you now taking new samples of 10, or are you randomly taking 10 values from your sample of 20?

  3. #3


    Thanks for looking at this. In the simulation I tried both: taking two samples of 10 and combining them into one, and putting the first 10 of the 20 values into one sample and the other 10 into the second. It makes no difference to the results.

    The situation is this: I'm planning several estimates of average tree size in a (hopefully) uniform forest, say 15 of them with 30 trees each. The traditional method is to average the 15 estimates and use the SD of the means and the number of samples to get SE = SD/sqrt(15). However, with the data I've got, it is also possible to pool the data and find a mean and SE from the pooled data using SE = SD of all the data/sqrt(450). So what I put in my first post was just a mini version of the real situation.

    The problem is that the SE from the traditional method is lower on average than the SE from the pooled method, but the sampling distribution of the traditional SE is much wider than that of the pooled SE, so the pooled SE estimate is more precise. I think the pooled method is correct (or better, anyway) even though its SE is higher, but I have to convince other people. Cheers
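    A sketch comparing the two SE estimates for the tree survey, assuming (hypothetically) that tree sizes are independent normal values with mean 20 and SD 5 and that there is no real difference between plots. It uses 15 plots of 30 trees and a hypothetical 4,000 iterations, and reports the average and the sample-to-sample SD of each estimate. Under these assumptions the plot-means estimate rests on only 14 degrees of freedom against 449 for the pooled estimate, which is why its spread comes out several times larger.

```python
import math
import random

# Hypothetical settings: 15 plots of 30 trees, sizes iid normal (no plot effect).
MU, SIGMA, PLOTS, TREES = 20.0, 5.0, 15, 30
REPS = 4_000


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


rng = random.Random(3)  # fixed seed so the run is reproducible
traditional, pooled = [], []
for _ in range(REPS):
    plots = [[rng.gauss(MU, SIGMA) for _ in range(TREES)] for _ in range(PLOTS)]
    plot_means = [mean(p) for p in plots]
    all_trees = [x for p in plots for x in p]
    traditional.append(sd(plot_means) / math.sqrt(PLOTS))     # SD of means / sqrt(15)
    pooled.append(sd(all_trees) / math.sqrt(len(all_trees)))  # SD of all data / sqrt(450)

true_se = SIGMA / math.sqrt(PLOTS * TREES)
for name, est in (("traditional", traditional), ("pooled", pooled)):
    print(f"{name:12s} average {mean(est):.4f}  SD {sd(est):.4f}  (true {true_se:.4f})")
```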

  4. #4


    Quote Originally Posted by katxt
    Thanks for looking at this. In the simulation, either two samples of 10, and then combining them into one. or putting the first 10 out of 20 into one sample and the other half in the second - it makes no difference to the results. The situation is this - I'm planning several estimates of average tree size in a (hopefully) uniform forest, say 15 of them of 30 trees each. The traditional method is to average the 15 estimates and use the SD of the means and the number of samples to get SE = SD/sqrt(15). However, with the data I've got it is also possible to pool the data and find a mean and SE from the pooled data using SD of all the data/sqrt(450). So what I put in my first post was just a mini version of the real situation. The problem is that the SE from the traditional method is lower on average than from the pooled method, but the sampling distribution of the SE from the traditional method is much larger than from the pooled method so the pooled estimate is more precise. I think the pooled method is correct (or better, anyway) even if it is higher, but I have to convince other people. Cheers
    I'm convinced. The pooled method is better.

  5. The Following User Says Thank You to janessa642 For This Useful Post:

    katxt (10-13-2016)

  6. #5
    rogojel


    Hi,
    Maybe your simulation has an error? I also simulated these scenarios, and when I split the original group in two, the variance increased by a factor of 2. In general, if I split an original group into k subgroups, the variance of the mean increases by a factor of k and the standard error by a factor of sqrt(k). So it is definitely a bad idea to split the group.
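    One way to check this numerically is to track two quantities separately: the variance of a single subgroup mean and the variance of the average of the k subgroup means. The sketch below assumes, hypothetically, N = 20 normal values with SD 5 (true mean 0 for convenience), k of 1, 2 and 4, and 20,000 iterations, and prints each variance as a multiple of σ²/N. The helper `variance_ratios` is illustrative only.

```python
import random

# Hypothetical settings: N = 20 normal values with SD 5 and true mean 0.
SIGMA, N, REPS = 5.0, 20, 20_000
rng = random.Random(4)  # fixed seed so the run is reproducible


def variance_ratios(k):
    """Return (Var(one subgroup mean), Var(average of k subgroup means)) / (sigma^2 / N)."""
    size = N // k
    one_sub, avg_sub = [], []
    for _ in range(REPS):
        x = [rng.gauss(0.0, SIGMA) for _ in range(N)]
        subs = [sum(x[i * size:(i + 1) * size]) / size for i in range(k)]
        one_sub.append(subs[0])        # mean of the first subgroup only
        avg_sub.append(sum(subs) / k)  # average of all k subgroup means
    base = SIGMA ** 2 / N
    # The true mean is 0, so the mean square estimates the variance.
    return (sum(v * v for v in one_sub) / REPS / base,
            sum(v * v for v in avg_sub) / REPS / base)


results = {k: variance_ratios(k) for k in (1, 2, 4)}
for k, (one, avg) in results.items():
    print(f"k={k}: one subgroup mean {one:.2f}, average of subgroup means {avg:.2f}")
```

    With this setup the first ratio should come out close to k and the second close to 1.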

    regards

  7. The Following User Says Thank You to rogojel For This Useful Post:

    katxt (10-13-2016)

  8. #6
    rogojel

    BTW, it should be simple to demonstrate this mathematically. Could anyone give it a try?

    It seems that V(N/k) = k*V(N), where V(N) is the variance of the mean estimated from a single sample of size N, and V(N/k) is the variance of the mean estimate obtained by first estimating the mean from each of k samples of size N/k and then averaging the k estimates?
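    A short derivation, assuming N independent observations with common variance σ², split into k equal groups G_1, …, G_k of size N/k, with V(N) = σ²/N. The first line gives the variance of a single group mean; the second gives the variance of the average of the k group means.

```latex
\begin{aligned}
\bar{x}_j &= \frac{k}{N}\sum_{i \in G_j} x_i,
  & \operatorname{Var}(\bar{x}_j) &= \frac{\sigma^2}{N/k} = k\,\frac{\sigma^2}{N} = k\,V(N),\\
\frac{1}{k}\sum_{j=1}^{k}\bar{x}_j &= \frac{1}{N}\sum_{i=1}^{N} x_i = \bar{x},
  & \operatorname{Var}(\bar{x}) &= \frac{1}{k^2}\sum_{j=1}^{k}\operatorname{Var}(\bar{x}_j)
  = \frac{1}{k^2}\cdot k\cdot k\,V(N) = V(N).
\end{aligned}
```

    Under these assumptions the factor k applies to the variance of any single group mean, while averaging the k group means (which, for equal groups, is exactly the overall mean) brings the variance back to V(N).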
