
Thread: Test for multiple means in a dataset

  1. #1
    Points: 773, Level: 14
    Level completed: 73%, Points required for next Level: 27

    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Test for multiple means in a dataset



    In a dataset of arbitrary size, what type of test could I apply to say with confidence that it is actually made up of several smaller datasets?

    For example, suppose the data come from N distributions, each with its own mean and standard deviation. How could I determine with confidence what N is?

    Does this question make sense?

    Thanks

  2. #2
    Wrokin' for the Raptors
    Points: 6,456, Level: 52
    Level completed: 53%, Points required for next Level: 94
    spunky's Avatar
    Location
    vancouver, canada
    Posts
    953
    Thanks
    43
    Thanked 184 Times in 142 Posts

    Re: Test for multiple means in a dataset

    there are several methods depending on what mixture of distributions you choose to impose on your data, but my favourite one has always been latent class analysis...

  3. #3
    Points: 773, Level: 14
    Level completed: 73%, Points required for next Level: 27

    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Test for multiple means in a dataset

    Thank you very much for responding. After posting my question yesterday, I had been reading about choosing unequal bin widths in histograms to see if that would help me sort my data. Many of the latent class analysis examples seem quite complex. At some level, does latent class analysis essentially do the same thing?

  4. #4
    Wrokin' for the Raptors
    Points: 6,456, Level: 52
    Level completed: 53%, Points required for next Level: 94
    spunky's Avatar
    Location
    vancouver, canada
    Posts
    953
    Thanks
    43
    Thanked 184 Times in 142 Posts

    Re: Test for multiple means in a dataset

    Quote: Originally posted by astickel
    I had been reading about choosing unequal bin widths in histograms to see if that would help me sort my data.

    hello there. before i post a more complete answer to your question i would like to know exactly what you're doing in this case.... so are you basically altering the limits of each class to have different frequency sizes? if this is the case, based on what evidence are you or would you be altering the limits so you get more elements in one class rather than in another one?

  5. #5
    TS Contributor
    Points: 4,629, Level: 43
    Level completed: 40%, Points required for next Level: 121
    jpkelley's Avatar
    Location
    Vancouver, BC, Canada
    Posts
    439
    Thanks
    17
    Thanked 88 Times in 83 Posts

    Re: Test for multiple means in a dataset

    Just a quick post without thinking too much about this. If you're not wanting to get too complex, you could try any sort of simple cluster analysis. Here's a website with a couple of options:

    http://www.statmethods.net/advstats/cluster.html

    And an example I just threw together:

    Code: 
    
    
    library(fpc)  # provides pamk()
    set.seed(42)  # make the simulation reproducible
    ## A simulated dataset with overlapping "clusters"
    sim.dat <- c(rgamma(250, 8, 4), rnorm(125, 16, 2), rnorm(200, 28, 3), rnorm(50, 50, 3))
    hist(sim.dat, breaks = 40)
    ## Specify a range of hypothesized cluster counts; the wider the range, the longer it takes.
    clusters <- pamk(sim.dat, krange = 2:7)
    clusters

    Again, I suggest this without knowing any more information than you originally posted. Others may weigh in about my premature response!

  6. #6
    Wrokin' for the Raptors
    Points: 6,456, Level: 52
    Level completed: 53%, Points required for next Level: 94
    spunky's Avatar
    Location
    vancouver, canada
    Posts
    953
    Thanks
    43
    Thanked 184 Times in 142 Posts

    Re: Test for multiple means in a dataset

    well... cluster analysis is a specific instance of latent class analysis... k-means cluster analysis, hierarchical cluster analysis, etc. are all specific instances of the more general latent class analysis method, depending on which parameterisation you assume for your data... that's why i said latent class analysis first

  7. #7
    Points: 773, Level: 14
    Level completed: 73%, Points required for next Level: 27

    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Test for multiple means in a dataset

    Thank-you again.

    The following are a few of my data points, which in this example clearly don't overlap, so this should be easy. What I am hoping to have is an algorithm which, like you say, will determine how many clusters are most likely in the dataset (in this case, two). I would also like the algorithm to calculate the mean value of each of these clusters. Most likely there will be between 1 and 4 clusters in my data, if that helps too. Thank you both again for your thoughts!

    4.801472667
    3.473225533
    -0.425527926
    4.759339301
    3.993423134
    26.24527325
    27.81263542
    27.68116643
    27.4480586
    27.17240416
    26.66044742
    27.8599481
    28.69223408
    28.08911254
    29.03902105
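
    For readers following along, here is a rough sketch of what that could look like on the 15 values above. It is not the code from post #5: it uses pam() from the cluster package instead of fpc's pamk(), and it picks the number of clusters by average silhouette width. The names k_range, avg_sil, best_k and cluster_means are made up for this illustration. Silhouette width is not defined for a single cluster, so the search below starts at 2 even though the question allows for 1.

```r
library(cluster)  # recommended package shipped with standard R installs

# The 15 values posted above
x <- c(4.801472667, 3.473225533, -0.425527926, 4.759339301, 3.993423134,
       26.24527325, 27.81263542, 27.68116643, 27.4480586, 27.17240416,
       26.66044742, 27.8599481, 28.69223408, 28.08911254, 29.03902105)

# Fit PAM for each candidate k and record the average silhouette width.
# Silhouette width is undefined for k = 1, so the search starts at 2.
k_range <- 2:4
fits <- lapply(k_range, function(k) pam(matrix(x, ncol = 1), k))
avg_sil <- sapply(fits, function(f) f$silinfo$avg.width)

# Keep the k with the largest average silhouette width
best <- which.max(avg_sil)
best_k <- k_range[best]
best_fit <- fits[[best]]

# Mean of the observations assigned to each cluster
cluster_means <- tapply(x, best_fit$clustering, mean)

print(data.frame(k = k_range, avg_silhouette = avg_sil))
print(best_k)
print(cluster_means)
```

    With groups this well separated, the two-cluster solution should come out on top, and cluster_means then holds one mean per group. Deciding between one cluster and several needs a different criterion than the silhouette.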

  8. #8
    Points: 773, Level: 14
    Level completed: 73%, Points required for next Level: 27

    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Test for multiple means in a dataset

    I installed R with the fpc package and executed the sample code from jpkelley, thanks for that. Yes, it looks like this is what I am looking for. I will have to do some reading to make sure I understand it. Thank you both again for your help!!!

  9. #9
    Wrokin' for the Raptors
    Points: 6,456, Level: 52
    Level completed: 53%, Points required for next Level: 94
    spunky's Avatar
    Location
    vancouver, canada
    Posts
    953
    Thanks
    43
    Thanked 184 Times in 142 Posts

    Re: Test for multiple means in a dataset

    ha... i was about to suggest to use jpkelley's code which (s)he was so kind to share with us because you're right, latent class analysis can get very tricky and sometimes clustering analysis does just fine... anyways, if you ever need to look at stuff like the probability of belonging to one group or another you can use the lca command in the e1071 package or the flexmix package and it works similarly to the example posted... give the function a set of data and tell it a range of groups for it to find and it'll find the best number of groups that account for the most patterns of differences in your data...
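
    To make the "probability of belonging to one group or another" idea concrete, here is a minimal base-R sketch of a one-dimensional Gaussian mixture fitted by EM, with the number of groups chosen by BIC. It does not use lca or flexmix; fit_mixture() is a hypothetical helper written for this illustration, and the simulated data, the quantile-based starting values and the fixed 200 iterations are all assumptions rather than anything from the thread.

```r
set.seed(1)
# Simulated data: two well-separated normal groups (an assumption for illustration)
x <- c(rnorm(60, mean = 4, sd = 1.5), rnorm(90, mean = 27, sd = 1))

# EM for a k-component 1-D Gaussian mixture. Returns the fitted parameters,
# the BIC, and the posterior membership probabilities (one row per observation).
fit_mixture <- function(x, k, iters = 200) {
  n <- length(x)
  # Start the means at evenly spaced quantiles, with a common sd and equal weights
  mu <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x), k)
  w <- rep(1 / k, k)
  weighted_dens <- function() {
    matrix(sapply(seq_len(k), function(j) w[j] * dnorm(x, mu[j], sigma[j])), nrow = n)
  }
  for (i in seq_len(iters)) {
    # E-step: posterior probability of each component for each observation
    dens <- weighted_dens()
    post <- dens / rowSums(dens)
    # M-step: update weights, means and standard deviations
    nk <- colSums(post)
    w <- nk / n
    mu <- colSums(post * x) / nk
    sigma <- sqrt(colSums(post * outer(x, mu, "-")^2) / nk)
    sigma <- pmax(sigma, 1e-3)  # guard against a collapsing component
  }
  dens <- weighted_dens()
  loglik <- sum(log(rowSums(dens)))
  n_par <- 3 * k - 1  # k means, k sds, k - 1 free weights
  list(k = k, weights = w, means = mu, sds = sigma,
       posterior = dens / rowSums(dens),
       bic = -2 * loglik + n_par * log(n))
}

# Fit 1 to 4 components and keep the fit with the smallest BIC
fits <- lapply(1:4, function(k) fit_mixture(x, k))
bic <- sapply(fits, function(f) f$bic)
best <- fits[[which.min(bic)]]

print(round(bic, 1))
print(best$k)
print(round(best$means, 2))
print(round(head(best$posterior), 3))  # membership probabilities, first few rows
```

    Unlike a hard clustering, each row of best$posterior gives the estimated probability that an observation belongs to each group, and the BIC comparison can also favour a single group.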

  10. #10
    TS Contributor
    Points: 4,629, Level: 43
    Level completed: 40%, Points required for next Level: 121
    jpkelley's Avatar
    Location
    Vancouver, BC, Canada
    Posts
    439
    Thanks
    17
    Thanked 88 Times in 83 Posts

    Re: Test for multiple means in a dataset


    Whew, I'm glad it was useful. I wasn't sure if it would be. Once you understand how the PAM (k-medoids) analysis behind pamk works to find the medoids, I think you'll be quite pleased with it. Again, it's simple...that might be good or bad for your purposes.
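
    A small aside on what a medoid is, since it differs from the cluster mean asked about earlier: a medoid is an actual observation, the one with the smallest total distance to the other members of its cluster. The sketch below computes both for the five lower values from post #7, in base R only; the variable names are just for illustration.

```r
# The five lower values from post #7, treated here as one cluster
x <- c(4.801472667, 3.473225533, -0.425527926, 4.759339301, 3.993423134)

# Pairwise absolute distances between the observations
d <- as.matrix(dist(x))

# Medoid: the observation whose summed distance to all others is smallest
medoid <- x[which.min(rowSums(d))]

# Centroid: the ordinary arithmetic mean, as k-means would use
centroid <- mean(x)

print(medoid)
print(centroid)
```

    The medoid is always one of the data points, so it is pulled around less by a stray value such as -0.43 than the mean is.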

    (I'm realizing I should have used a different user name to reduce ambiguity...and to sound cooler. I'll introduce myself in the new user thread soon.)
