Asking multiple research questions of the same data set; when is the n "large enough"

#1
Hi everyone,

Context:
I was recently told that I may not use the experimental data I collected from 315 participants for future studies because "the n was not large enough." It seems that many journals in my field discourage authors from revisiting data they examined in prior research (with some exception for NCES-collected data sets). However, I know this is standard practice in sociology, psychology, economics, and other sciences.

My questions are:

1. What constitutes robust enough data for multiple studies?
2. Are the n and the representativeness of the sample all one considers when asking additional research questions at a later date?
3. Are there any reputable citations or sources that could shed some light on this question?

Thank you so much for your help! I really appreciate your time reading this and thinking about this question.

noetsi

No cake for spunky
#2
Re: Asking multiple research questions of the same data set; when is the n "large enough"

If they said the n was not large enough, this might mean several very different things. First, they may mean your statistical power is too low to meet limits set by the federal government for grants, which I believe is increasingly influential in academic research. Typically a power below .8 won't be accepted by the federal government, and a small sample size commonly results in that. My guess is that this is what is actually occurring here.
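To make the power point concrete, here is a minimal sketch in Python using statsmodels. The design details are my assumptions, not anything from the original post: a two-sample t-test, a medium effect size (d = .5), alpha = .05, and the 315 participants split roughly evenly across two groups.

```python
# Sketch: power for a two-sample t-test (assumed design, illustrative numbers).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power achieved with ~157 participants per group (315 total, even split assumed)
achieved = analysis.solve_power(effect_size=0.5, nobs1=157,
                                alpha=0.05, alternative='two-sided')
print(f"Achieved power: {achieved:.3f}")

# Sample size per group needed to reach the .8 threshold
needed = analysis.solve_power(effect_size=0.5, power=0.8,
                              alpha=0.05, alternative='two-sided')
print(f"n per group for power = .8: {needed:.1f}")
```

Under those particular assumptions, 315 participants would be comfortably powered; with a small assumed effect (d around .2) they would not be, which is why the effect size you expect matters so much here.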

A second issue is that, formally, an analysis assumes you are using the data to test a single null hypothesis (one of the most violated rules in research). When you use the data for more than one test, your nominal alpha level understates the true chance of making a type I error in at least one of the tests. Bonferroni or other corrections can be used to address this, which is why I suspect this is not the issue here.
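Here is a quick illustration of that inflation and the Bonferroni fix, again in Python with statsmodels. The five tests, their independence, and the p-values are all made up for the example:

```python
# Sketch: familywise error inflation across k tests and a Bonferroni correction.
from statsmodels.stats.multitest import multipletests

alpha, k = 0.05, 5
fwer = 1 - (1 - alpha) ** k  # chance of >=1 type I error across k independent tests
print(f"True chance of >=1 type I error across {k} tests: {fwer:.3f}")  # ~0.226

pvals = [0.012, 0.030, 0.001, 0.047, 0.200]  # hypothetical p-values
reject, p_adj, _, alpha_bonf = multipletests(pvals, alpha=alpha,
                                             method='bonferroni')
print(f"Per-test alpha after Bonferroni: {alpha_bonf:.3f}")  # 0.010
print("Rejected after correction:", list(reject))  # only p = 0.001 survives
```

Note how three of the four nominally significant p-values no longer pass once the correction is applied.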

A third issue, which has nothing to do with statistics, is that some journals object to your reusing data you have already published on (or so I suspect; this used to be common in social science research). I think this objection comes up particularly when the data is older.

1) If power is not the issue (nor my third point, which is a purely subjective criterion), then the answer is whatever quantity of data is needed to keep adequate power at an acceptable alpha (usually .05) after the correction. The more tests you do, the more data you will need to meet this requirement. I don't know offhand how to calculate what this number will be in your case, but you might find it in discussions of Bonferroni corrections; a rough sketch of the arithmetic follows below.
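As a rough sketch of that arithmetic, here is how the required n grows with the number of planned tests, under the same assumed design as the power example above (two-sample t-test, d = .5, power = .8):

```python
# Sketch: required n per group once alpha is Bonferroni-corrected for k tests.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for k in (1, 2, 5, 10):  # number of planned tests
    n = analysis.solve_power(effect_size=0.5, power=0.8,
                             alpha=0.05 / k, alternative='two-sided')
    print(f"{k:>2} tests -> alpha {0.05 / k:.4f} -> n per group ~ {n:.0f}")
```

The corrected alpha shrinks with each added test, so the n needed to hold power at .8 climbs accordingly.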

2) I don't think there is an agreed-on answer to this. It depends on the journal in question and, to a large extent, on accepted practice in your field. It also depends on whether your data is pertinent to the new question you are asking and how old it is. That answer is inherently subjective; ultimately the peer reviewers decide it. All I can suggest is talking to faculty in your area of research and asking them. I don't think there is an absolute, validated answer to this question (nor do I think fields or journals would agree on one even if it existed). :p

3) This is again subjective. I suspect you will rarely find a journal article where the author says "I used this data N times before, but it is still valid because of...". They might cite previous research they did with the data, but that is the best you are going to find. This ends up being a judgment by the field (by your peers) on what is acceptable, not any well-accepted rule. The best you could do is talk to other researchers in your area.

This is to a large extent my opinion, based on reading analyses in academic journals for many years. Others might have clearer statistical perspectives on it. :)