Thread: How to deal with huge differences in group sample sizes

  1. #1

    How to deal with huge differences in group sample sizes

    I'm working with a very large dataset of individuals from 6 different groups/samples. The sizes of these groups differ radically. The largest of the 6 groups makes up 83.7% of the data, while the smallest is 0.2% of the entire sample. (That group still has 6,537 records, so it's not negligible.)

    I have a lot of analysis I need to do, but right now it seems that the large group is swamping the other groups. What alternatives exist for normalizing the data so that no single group overwhelms the analysis?

  2. #2
    Dason

    What exactly is the problem? What type of analysis are you doing?

  3. #3

    To start with, I was running a chi-square test until I realized that it was essentially comparing all the other groups to the distribution of the 83.7% group. I intend to run a logistic regression on the data, but because the one group is so much larger, I worry that in essence I will just be fitting a logistic regression to that large group. Is there a way to weight the data so the samples are more even? Or would I be better off taking a sample of the large group and working with that?
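
To make the two options in this post concrete (weighting versus subsampling), here is a minimal sketch in plain Python. It is an illustration under assumptions, not a recommendation from the thread: the helper names `balanced_group_weights` and `downsample_groups` and the toy group labels are hypothetical, and the group shares are made up rather than taken from the real data. The weights use the inverse-frequency form N / (K * n_g), so that every group contributes the same total weight.

```python
import random
from collections import Counter


def balanced_group_weights(groups):
    """Return one weight per record so that every group has the same total weight.

    Uses weight = N / (K * n_g), where N is the number of records, K the number
    of groups, and n_g the size of the record's group. The weights sum to N.
    """
    counts = Counter(groups)
    n_total = len(groups)
    n_groups = len(counts)
    return [n_total / (n_groups * counts[g]) for g in groups]


def downsample_groups(groups, cap, seed=0):
    """Return sorted record indices, keeping at most `cap` random records per group."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    kept = []
    for indices in by_group.values():
        if len(indices) <= cap:
            kept.extend(indices)  # small groups are kept whole
        else:
            kept.extend(rng.sample(indices, cap))  # large groups are thinned
    return sorted(kept)


# Toy data (hypothetical): one dominant group and two small ones.
groups = ["A"] * 84 + ["B"] * 10 + ["C"] * 6

weights = balanced_group_weights(groups)
kept = downsample_groups(groups, cap=10, seed=0)
```

Many model-fitting routines accept per-record weights like these (scikit-learn's `LogisticRegression.fit`, for instance, takes a `sample_weight` argument). Either approach changes the question being answered from "what holds for the average record" to something closer to "what holds for the average group", and reweighted fits generally need care when interpreting standard errors, so whether either is appropriate depends on the research question.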

  4. #4
    Karabiner

    What is this all about? What are your research questions?
    What are the variables which you want to analyse?

    Kind regards

    K.

  5. #5

    The data are records of individual students in courses at several different institutions. One institution is MUCH larger than the others. At this point I'm doing exploratory data analysis, trying to understand what influences whether students pass or fail their courses, and whether they stay in or drop out of school. I have all types of variables available. Because I am still at the exploratory stage, I don't yet have a good sense of all the analyses I want to do, but that also leaves me open to trying different techniques.
