I have a sample of 80 people, some ill and some healthy. For each person I have, say, 300 different blood tests. For each blood test I want to check whether there is a significant difference between the healthy and the ill group. That means 300 t-tests (or their non-parametric equivalent). The problem is multiple testing (a kind of "overfitting"): at the 5% significance level, even if no blood test truly differs between the groups, I would expect about 15 p-values (300 × 0.05) below 0.05, all of them type I errors.
I am trying to use FDR control to overcome this problem, working with the fdrtool package in R. One of its outputs is a vector of "q-values". Are these the values I should use instead of the p-values?
Any other ideas on how to handle such a situation?
Thanks
> fdrtool(p_value_t,statistic="pvalue",plot=TRUE,color.figure=TRUE)
Step 1... determine cutoff point
Step 2... estimate parameters of null distribution and eta0
Step 3... compute p-values and estimate empirical PDF/CDF
Step 4... compute q-values and local fdr
Step 5... prepare for plotting
In a way. You call a test significant if its q-value is below the predetermined level at which you want to control the FDR. Beyond that decision, be careful with interpreting individual q-values: a q-value is not the probability that a particular test is a false positive; it describes the expected false discovery rate among all tests called significant at that threshold. There are more sophisticated ways to estimate q-values, but I can't find any links at the moment. They work by estimating the number of true null hypotheses and using that estimate to make the penalty on the q-values less harsh.
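To make the idea of estimating the share of true nulls concrete, here is a minimal sketch in base R. It is not what fdrtool does internally; it uses Benjamini-Hochberg adjusted p-values from `p.adjust` and a simplified Storey-type estimate of the proportion of true nulls. The simulated p-values, the fixed `lambda = 0.5`, and the names `pi0`, `q_bh` and `q_storey` are assumptions made for illustration only.

```r
set.seed(1)

m     <- 300   # number of tests, as in the question
n_alt <- 30    # assumed number of real effects (illustration only)

# Simulated p-values: true nulls are uniform on [0, 1],
# real effects are concentrated near 0.
p <- c(runif(m - n_alt), rbeta(n_alt, 0.1, 10))

# Benjamini-Hochberg adjusted p-values (implicitly assumes all nulls are true).
q_bh <- p.adjust(p, method = "BH")

# Simplified Storey-type estimate of the proportion of true nulls:
# p-values above lambda are assumed to come almost only from true nulls.
lambda <- 0.5
pi0 <- min(1, mean(p > lambda) / (1 - lambda))

# Scaling by pi0 gives a less conservative q-value estimate.
q_storey <- pi0 * q_bh

# Compare the smallest values under each approach.
head(cbind(p = sort(p), q_bh = sort(q_bh), q_storey = sort(q_storey)))
```

Because `pi0` is at most 1, the scaled q-values are never larger than the BH ones, which is the "not so harsh penalty" mentioned above.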
The q-values are higher than the p-values, so would it make sense to choose a predetermined level above 0.05 (assuming my p-value cutoff is 0.05)?
I know there is no rule for choosing the predetermined level, but still, how do you do it?
You choose your cutoff according to the FDR you want to control. Whatever level you want to control the FDR at is your cutoff, and it is applied to the q-values. So if you want an expected FDR of at most 0.05, set the cutoff to 0.05.
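As a rough sketch of how that looks in practice, the snippet below applies a 0.05 cutoff to q-values. The vector `p_value_t` here is simulated as a stand-in for the real p-values, and the q-values come from base R's `p.adjust` with the BH method rather than from fdrtool; with fdrtool you would, as far as I know, apply the same cutoff to the `qval` component of the returned list.

```r
set.seed(42)

# Stand-in for the 300 real p-values (assumption: 270 nulls, 30 real effects).
p_value_t <- c(runif(270), rbeta(30, 0.1, 10))

alpha <- 0.05  # the FDR level you want to control

# BH adjusted p-values, used here as q-values.
q <- p.adjust(p_value_t, method = "BH")

# Tests called significant while controlling the FDR at alpha.
significant <- which(q <= alpha)

# For comparison: tests with a raw p-value below 0.05.
naive <- which(p_value_t <= alpha)

cat("significant at q <= 0.05:", length(significant), "\n")
cat("raw p <= 0.05:           ", length(naive), "\n")
```

Since each q-value is at least as large as its p-value, the q-based list is always a subset of the raw p-value list; the cutoff stays at 0.05 and only the quantity it is applied to changes.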