Looping over all variables

#1
Hello,

I am using Stata for my analysis, but I am not yet a good Stata programmer.

I am dealing with a multiple hypothesis testing problem.

I have a dataset of, let's say, n observations and (p+1) variables: 1 dependent and p independent (n<<p). The dependent variable is nominal.

I would like to run a series of tests in a loop (obviously not manually): a t-test if an independent variable is continuous, and a chi-square test if it is nominal. From every test I need to keep the p-value (I need a variable containing the p-values), so that I can apply the FDR method using the Stata package smileplot, which I have already installed.

How do I do that? I have no idea where to start. If anyone has done something like this and can help me with code, it would be more than appreciated. The alternative is to work with R, which is less friendly.

Thanks
 

bukharin

RoboStataRaptor
#2
A t-test would be appropriate for a binary independent variable and continuous outcome, not the other way around. For a continuous predictor and binary categorical outcome I'd suggest logistic regression.

To help get you started, after logistic regression the p-value for the overall regression is returned as e(p). After -tabulate, chi2- it's returned as r(p). You can loop over variables using:
Code:
foreach var of varlist a b c { // a, b & c are your continuous independent variables
    logistic outcomevar `var'
    // do something with e(p), the p-value of the overall model test
}

foreach var of varlist d e f { // d, e & f are your categorical independent variables
    tabulate outcomevar `var', chi2
    // do something with r(p), the p-value of the Pearson chi-squared test
}
The "do something" is a bit tricky and it depends on what you're after. If you just want the p-values you could store them in a matrix. Otherwise you may need to create a temporary dataset, which is slightly irritating in this situation (but quite do-able).
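A rough sketch of the matrix option, for illustration only. It assumes the three placeholder variables a, b and c from the loop above; the matrix name P and the counter i are arbitrary names chosen here, not anything Stata requires.
Code:
local vars a b c                    // placeholder names for the continuous predictors
local k : word count `vars'         // number of tests to run
matrix P = J(`k', 1, .)             // k x 1 matrix, initially all missing
matrix rownames P = `vars'          // label each row with its variable name
matrix colnames P = p
local i = 0
foreach var of local vars {
    local ++i
    quietly logistic outcomevar `var'
    matrix P[`i', 1] = e(p)         // store the model p-value in row i
}
matrix list P
If the p-values are needed as a variable rather than a matrix, -svmat P, names(col)- should copy the column into the dataset in memory as a variable named p, although the row names are not carried over.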
 
#3
Thank you for the quick reply!

I have more than 3 variables, perhaps 500 or more. Is there a way to tell the loop to run over var1-var500 without listing them?

The "do something" is very tricky, and I have no idea how to handle it. I do need to store the p-values somehow, but I don't know whether a matrix or a dataset is better. I need them in order to apply the false discovery rate (package smileplot), so that I can estimate how many type I errors I have (with so many tests I will certainly have some).
 

bukharin

RoboStataRaptor
#4
Well, here's a way of doing it using a temporary dataset. I've added "quietly" in front of each calculation to save time and screen real estate...
Code:
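tempfile pvalues
foreach var of varlist a b c { // a, b & c are your continuous independent variables
    quietly logistic outcomevar `var'
    preserve                        // keep a copy of the analysis data
    clear
    set obs 1                       // one-row dataset holding this test's result
    gen var="`var'"
    gen double p=e(p)               // double avoids rounding very small p-values
    capture append using `pvalues'  // on the first pass the file does not exist yet
    save `pvalues', replace
    restore                         // bring the analysis data back
}

foreach var of varlist d e f { // d, e & f are your categorical independent variables
    quietly tabulate outcomevar `var', chi2
    preserve
    clear
    set obs 1
    gen var="`var'"
    gen double p=r(p)
    capture append using `pvalues'
    save `pvalues', replace
    restore
}

use `pvalues', clear                // one row per test: variable name and p-value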
As for extending it to run over var1-var500, that's well documented in -help foreach-; you may also want to look at -help varlist- and, just for good measure, -help numlist-.
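As a hedged sketch of the range syntax, and not the only way to do it: in a varlist, var1-var500 means every variable stored between var1 and var500 in dataset order. The example assumes all of those variables are continuous predictors, and it swaps the preserve/append steps for -postfile-, which writes one row per test to a results file. The handle name memhold and the result variable names varname and p are arbitrary choices made here.
Code:
tempname memhold
tempfile pvalues
postfile `memhold' str32 varname double p using `pvalues'
foreach var of varlist var1-var500 {    // assumes var1 ... var500 are adjacent in the dataset
    quietly logistic outcomevar `var'
    post `memhold' ("`var'") (e(p))     // one row: variable name and model p-value
}
postclose `memhold'
use `pvalues', clear
For the categorical predictors the same pattern would apply with -tabulate outcomevar `var', chi2- and r(p) in place of e(p).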