# Thread: Looping over all variables

1. ## Looping over all variables

Hello,

I am using Stata for my analysis, but I am not yet a good Stata programmer.

I am dealing with a multiple hypothesis testing problem.

I have a dataset of, let's say, n observations and p+1 variables: one dependent and p independent (n << p). The dependent variable is nominal.

I would like to run a series of tests in a loop (obviously not by hand): a t-test if an independent variable is continuous, and a chi-square test if it is nominal. From every test I need to keep the p-value (I need a variable containing the p-values) so that I can apply the FDR (false discovery rate) method using the Stata package smileplot, which I have already installed.

How do I do that? I have no idea where to start. If anyone has done something like this and can help me with code, it would be more than appreciated. The alternative is to work in R, which is less friendly.

thanks

2. ## Re: Looping over all variables

A t-test would be appropriate for a binary independent variable and a continuous outcome, not the other way around. For a continuous predictor and a binary categorical outcome, I'd suggest logistic regression.

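To help get you started: after -logistic-, the p-value for the overall model test (a likelihood-ratio test, unless you ask for robust standard errors) is returned in e(p); with a single predictor, that is the test of that predictor. After -tabulate, chi2-, it's returned in r(p). You can loop over variables like this: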
Code:
```stata
foreach var of varlist a b c {   // a, b & c are your continuous independent variables
    logistic outcomevar `var'
    * do something with e(p); for now, just display it
    display "`var': p = " e(p)
}

foreach var of varlist d e f {   // d, e & f are your categorical independent variables
    tabulate outcomevar `var', chi2
    * do something with r(p); for now, just display it
    display "`var': p = " r(p)
}
```
The "do something" is a bit tricky, and it depends on what you're after. If you just want the p-values, you could store them in a matrix. Otherwise you may need to create a temporary dataset, which is slightly irritating in this situation (but quite doable).
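A minimal sketch of the matrix route, assuming Stata's shipped `auto` dataset as a stand-in for your data: `foreign` (0/1) plays the outcome and `price`, `mpg` and `weight` the continuous predictors. The matrix name `P` is arbitrary. The last step converts the matrix into a variable, which is what smileplot needs.

```stata
* Stand-in data: foreign (0/1) is the outcome; price mpg weight are predictors
sysuse auto, clear

local vars price mpg weight
local k : word count `vars'

* One row per predictor, one column for its p-value; start filled with missing
matrix P = J(`k', 1, .)
matrix rownames P = `vars'
matrix colnames P = p

local i = 0
foreach var of local vars {
    local ++i
    quietly logistic foreign `var'
    matrix P[`i', 1] = e(p)    // p-value of the overall model test
}
matrix list P

* Turn the matrix into a dataset with a variable p.
* This replaces the data in memory, so save your own data first.
clear
svmat double P, names(col)
list
```

Note that -svmat- does not carry the matrix row names over, so if you need to know which p-value belongs to which variable, the temporary-dataset approach (which stores the name alongside the p-value) is more convenient.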

3. ## Re: Looping over all variables

thank you for the quick reply!

I have more than three variables, perhaps 500 or more. Is there a way to tell the loop to run over var1-var500 without listing them all?

The "do something" part is what I find very tricky; I have no idea how to handle it. I do need to store the p-values somehow, and I don't know whether a matrix or a dataset is better. I need them for the false discovery rate procedure (package smileplot), so that I can control the proportion of type I errors among my significant results (with this many tests, I will certainly have some).
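As a cross-check on whatever smileplot reports, the Benjamini-Hochberg step-up rule (the classic FDR procedure) can be computed by hand once the p-values sit in one variable: sort them, compare the k-th smallest with (k/m)q, and reject every hypothesis up to the largest k that passes. This is a minimal sketch: the five p-values are made-up illustration values, q = 0.05 is an assumed FDR level, and whether this matches the method you choose inside smileplot is something to confirm in its help file.

```stata
* Made-up p-values, for illustration only
clear
input double p
0.036
0.300
0.004
0.028
0.025
end

local q = 0.05                       // assumed FDR level
sort p
gen rank = _n
local m = _N
gen double crit = rank / `m' * `q'   // BH threshold (k/m)*q for the k-th smallest p

* Largest rank whose p-value is at or below its own threshold
egen kmax = max(cond(p <= crit, rank, .))

* Reject every hypothesis up to that rank. If none passes, kmax is missing,
* and missing compares as larger than any number in Stata, hence the guard.
gen byte reject = rank <= kmax & !missing(kmax)
list
```

With these values, 0.025 (rank 2) misses its own threshold of 0.02 but is still rejected, because 0.036 (rank 4) clears 0.04. That step-up behaviour is what distinguishes the procedure from checking each threshold separately; here ranks 1 to 4 are rejected.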

4. ## Re: Looping over all variables

Well, here's a way of doing it using a temporary dataset. I've added -quietly- in front of each calculation to save time and screen real estate:
Code:
```stata
tempfile pvalues
foreach var of varlist a b c {   // a, b & c are your continuous independent variables
    quietly logistic outcomevar `var'
    local p = e(p)                 // grab the p-value before the data are cleared
    preserve
    clear
    set obs 1
    gen var = "`var'"
    gen double p = `p'             // double rather than the default float
    capture append using `pvalues' // fails harmlessly on the first pass, before the file exists
    save `pvalues', replace
    restore
}

foreach var of varlist d e f {   // d, e & f are your categorical independent variables
    quietly tabulate outcomevar `var', chi2
    local p = r(p)
    preserve
    clear
    set obs 1
    gen var = "`var'"
    gen double p = `p'
    capture append using `pvalues'
    save `pvalues', replace
    restore
}

use `pvalues', clear
```
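With hundreds of variables, the -preserve-/-restore- round trip on every iteration can get slow. A swapped-in alternative is Stata's -postfile-/-post- commands, which append one row per test directly to a results file. This is a minimal sketch that again uses the shipped `auto` data as a stand-in (`foreign` as the outcome, `price mpg weight` as continuous predictors, `rep78` as categorical); the names `memhold`, `results`, `varname` and `test` are arbitrary.

```stata
* Stand-in data: foreign (0/1) outcome, price mpg weight continuous, rep78 categorical
sysuse auto, clear

tempname memhold
tempfile results
* One row per test: variable name, which test was run, and its p-value
postfile `memhold' str32 varname str8 test double p using `results'

foreach var of varlist price mpg weight {
    quietly logistic foreign `var'
    post `memhold' ("`var'") ("logistic") (e(p))
}

foreach var of varlist rep78 {
    quietly tabulate foreign `var', chi2
    post `memhold' ("`var'") ("chi2") (r(p))
}

postclose `memhold'
use `results', clear
list
```

The data in memory are never touched until the final -use-, so nothing has to be preserved or restored inside the loops.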
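As to extending it to run over var1-var500: a varlist such as var1-var500 means every variable from var1 to var500 in dataset order, so `foreach var of varlist var1-var500` works as long as they are stored together. It's well documented in -help foreach-; you may also want to look at -help varlist- and, just for good measure, -help numlist-.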
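With a range like var1-var500 you also need to decide, variable by variable, which test to run. One possible heuristic, which is purely an assumption and not something Stata decides for you: treat a variable with at most `maxcat` distinct values as categorical and anything else as continuous. The sketch uses the `auto` range `price-weight` (price mpg rep78 headroom trunk weight, adjacent in that dataset) in place of var1-var500, writes the results with -postfile-/-post-, and the cut-off of 10 is arbitrary.

```stata
* Stand-in data: foreign (0/1) outcome; price-weight plays the role of var1-var500
sysuse auto, clear

local maxcat 10      // hypothetical cut-off: <= 10 distinct values => categorical

tempname memhold
tempfile results
postfile `memhold' str32 varname str8 test double p using `results'

foreach var of varlist price-weight {
    * A one-way tabulate leaves r(r) = number of distinct non-missing values
    quietly tabulate `var'
    if r(r) <= `maxcat' {
        quietly tabulate foreign `var', chi2
        post `memhold' ("`var'") ("chi2") (r(p))
    }
    else {
        quietly logistic foreign `var'
        post `memhold' ("`var'") ("logistic") (e(p))
    }
}

postclose `memhold'
use `results', clear
list
```

Check the `test` column afterwards: a count-based rule will misclassify, for example, a continuous variable recorded in a few coarse steps, so an explicit list of categorical variables is safer when you have one.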

5. ## The Following User Says Thank You to bukharin For This Useful Post:

NN_STAT (11-24-2011)
