# Sample Size Selection

#### Bird1990

Hi all, I am having trouble with sample size selection. I would like to run a study on the following applications to a program and record how many passed/failed. I have no control group. The numbers are as follows.

N=364

| | Year 1 | Year 2 |
|---|---|---|
| Group 1 | 124 | 30 |
| Group 2 | 200 | 10 |

I would like a confidence level of 95%. I am unsure which equation to use and how I should go about selecting a sample from each group/year in order to review applications and calculate an error rate. If anyone out there could walk me through the steps for calculating how many applications I should pull from each group/year, I would very much appreciate it.

#### statsanon

Can you clarify? Is this a repeated measure, with measures at year 1 and year 2, or simply a measure at year 2 after people have been in a program since year 1? Given your extremely high attrition rate, if you want to analyse your year 2 data you will have to perform many analyses to find the reasons for the dropout rate, and check the characteristics of the year 2 sample against the year 1 sample to ensure that they are equivalent.

#### Bird1990

Thank you for your reply. I am not interested in comparing between years or even between groups. The requirements for an application differ depending on which group you applied under and which year you applied in. I assumed this meant I needed to pull a certain number from each group/year, treating each as its own population. I am really just interested in how you would pick a sample size in this case. Because the rules change across group and year, I really don't think any analysis across them would make sense, due to confounders.

#### statsanon

Thanks for clarifying. So it's purely a sampling question for those 4 cohorts, treating each as a separate population, with no need to compare between populations. In that case you treat each group and year as a sampling frame and use the appropriate sampling method to give you a sample that is representative of that population and of sufficient size for your analysis.

For example, a simple random sample, stratified samples (e.g. across gender), etc. You also have no real need for the samples to be the same size, unless by any chance you did want to compare effects and significance. You are rather restricted by the small population sizes in two of the group/year combinations, in which case you don't really need a random sample. You could analyse the whole population in those cases.
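To make that concrete, here is a rough sketch for the simplest case, where all you want is an estimated pass/fail error rate per cohort. It uses Cochran's sample size formula for a proportion with a finite population correction, then draws a simple random sample from each cohort. The ±5% margin of error and the worst-case proportion p = 0.5 are my assumptions (only the 95% confidence level was given), and the application IDs 1..N are hypothetical placeholders for your real list of applications.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    margin and p are assumed values, not taken from the original question.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for an effectively infinite population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction shrinks n0 when the population is small.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


# Cohort sizes from the table in the question.
cohorts = {
    ("Group 1", "Year 1"): 124,
    ("Group 1", "Year 2"): 30,
    ("Group 2", "Year 1"): 200,
    ("Group 2", "Year 2"): 10,
}

sizes = {key: sample_size(n_apps) for key, n_apps in cohorts.items()}

rng = random.Random(2024)  # fixed seed so the draw can be reproduced
for (group, year), n_apps in cohorts.items():
    n = sizes[(group, year)]
    application_ids = range(1, n_apps + 1)  # hypothetical IDs
    picked = sorted(rng.sample(application_ids, n))  # sampling without replacement
    print(f"{group} {year}: review {n} of {n_apps}, e.g. IDs {picked[:5]} ...")
```

Under those assumptions you would review 94 of 124, 28 of 30, 132 of 200 and 10 of 10, which is why a full review of the two small cohorts is the practical choice. A wider margin of error would bring the numbers for the two larger cohorts down.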

Do you know what types of analyses you are doing, so that you can estimate a sample size? Also, what type of measures are you using? That affects the costs and practicalities of your sampling strategy. Different analyses have different sample size requirements to give you adequate power. Also, do you know the distributions of your test? Are they normal or approximately normal?

