How to run regressions by industry and year in Stata

byuus2011
12-14-2011, 04:02 PM
Hello,

I am new to Stata and want to run regressions and predict residuals by industry and year, e.g. 20 industries over 10 years, so 20x10 regressions. I wrote the following code using a loop:

levelsof sic, local(x)
foreach x of numlist 10/87 {
foreach z of numlist 2005/2010 {
reg cfo at s sch if sic==`x' & fyear==`z', noconstant
predict pcfo if e(sample) & sic==`x' & fyear==`z'
replace pcfo="`pcfo'"
predict rcfo if e(sample) & sic==`x' & fyear==`z', residuals
replace rcfo="`rcfo'"

}
}
end

But this code generates predicted values for the first year only. Does anyone know what bug exists in my code?
Any help is appreciated. Thanks.

Mike

bukharin
12-14-2011, 05:31 PM
I think there are a few problems with the code. The fundamental problem is that -predict- needs to create a new variable, so after you've run -predict- once it won't allow you to generate new predictions with the same variable name. One way around this is to create a variable for your predictions with missing values, use -predict- to place predictions in a temporary variable, then -replace- your variable with the values from the temporary variable (see code below).
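
To see the error for yourself, here is a minimal sketch on the auto dataset that ships with Stata (run it from a do-file; the variable name yhat is made up for illustration and is not from the original code):

```stata
sysuse auto, clear
quietly regress price mpg
predict yhat            // first call creates the new variable yhat
capture predict yhat    // second call fails because yhat already exists
display _rc             // non-zero return code (110, "already defined")
```

The -capture- prefix is only there so the do-file keeps running after the failed second call.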

levelsof sic, local(x)
foreach x of numlist 10/87 {

This -levelsof- command creates a macro called `x' containing all values of sic; but then the -foreach- command immediately overwrites `x' with the number 10. It doesn't look like you need the -levelsof- command at all.

predict pcfo if e(sample) & sic==`x' & fyear==`z'

You should just be able to say -if e(sample)- because the other conditions were already specified when you ran the regression. e(sample) is true if the observations were included in the regression.
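
A quick way to check this is sketched below on the auto dataset shipped with Stata (yhat is a hypothetical variable name used only for this illustration):

```stata
sysuse auto, clear
quietly regress price mpg if foreign==1
count if e(sample)              // observations actually used in the regression
assert r(N) == e(N)             // same N that -regress- stored
predict yhat if e(sample)       // predictions only for the estimation sample
count if missing(yhat)          // observations outside the sample stay missing
```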

replace pcfo="`pcfo'"

I'm not sure what you're trying to achieve here. In your code pcfo is a variable name, not a macro name. Using the single quotes `pcfo' is the way to reference a local macro, and putting it in double quotes as "`pcfo'" means you're getting the value of the local macro `pcfo' and converting it to a string - surely this is not what you're trying to do.
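
For comparison, this is roughly how a local macro is defined and referenced. It is only a sketch on the auto dataset; the macro name v and the variable p2 are invented for the example:

```stata
sysuse auto, clear
local v price                   // `v' is a local macro holding the text: price
summarize `v'                   // expands to: summarize price
display "`v'"                   // in double quotes it is just the string price
generate double p2 = `v' * 2    // here the macro expands to a variable name
```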

end

You don't need to use -end- here; you use it to end a program (see -help program-) or after manually entering data using the -input- command.
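
For reference, a legitimate use of -end- looks something like this (sayhello is a hypothetical program name, used only for illustration):

```stata
capture program drop sayhello   // remove any earlier definition
program define sayhello
    display "hello from a program"
end

sayhello
```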

Suggested fix:
gen pcfo=. // empty variable for predictions
gen rcfo=. // empty variable for residuals
tempvar pcfo rcfo // temporary variables for each set of predictions
foreach x of numlist 10/87 {
    foreach z of numlist 2005/2010 {
        reg cfo at s sch if sic==`x' & fyear==`z', noconstant
        predict `pcfo' // predictions are now in temporary variable
        replace pcfo=`pcfo' if e(sample) // transfer predictions from temp variable
        predict `rcfo', residuals // residuals are now in temporary variable
        replace rcfo=`rcfo' if e(sample) // transfer residuals from temp variable
        drop `pcfo' `rcfo' // drop temporary variables in preparation for next regression
    }
}
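
If you want to try the pattern without your own data, here is a sketch on the auto dataset shipped with Stata. It assumes price, mpg and foreign as stand-ins for cfo, the regressors and sic; the names yhat, resid, t_yhat and t_resid are made up for illustration:

```stata
sysuse auto, clear
generate yhat  = .              // collects fitted values from all groups
generate resid = .              // collects residuals from all groups
tempvar t_yhat t_resid          // scratch variables reused on every pass
foreach g of numlist 0/1 {
    quietly regress price mpg if foreign==`g'
    predict `t_yhat' if e(sample)
    predict `t_resid' if e(sample), residuals
    quietly replace yhat  = `t_yhat'  if e(sample)
    quietly replace resid = `t_resid' if e(sample)
    drop `t_yhat' `t_resid'     // free the names for the next regression
}
```

After the loop, every observation that took part in one of the regressions has its fitted value in yhat and its residual in resid.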

byuus2011
12-14-2011, 08:10 PM
bukharin,

I ran the code you wrote, but it says "no observations", maybe because my sic codes are not continuous. I have sic 10 12 13 .... 78 82 85 87. Then I adjusted the code to:

gen pcfo=.
gen rcfo=.
tempvar pcfo rcfo
levelsof sic, local (x)
foreach x of local x {
foreach z of numlist 2005/2010 {
.....

Now the program can be run but generates all missing values for pcfo and rcfo.

Thank you!

bukharin
12-14-2011, 10:02 PM
Not sure why that's happened; sometimes a problem like that is due to a subtle syntax error. Can you please post a log of the code and output? Suggest changing the "reg cfo ..." to "quietly reg cfo..." so that the log isn't ridiculously long.

byuus2011
12-14-2011, 10:47 PM
I doubt the loop really works, because no output shows up in the log. So I removed -levelsof- and switched back to -foreach x of numlist 10/87-. Since my sic codes look like 12-17, 20-40, 44-59, and 67-87, I ran the program several times, splitting the -foreach- values into something like -foreach x of numlist 12/17-, run once, then -foreach x of numlist 20/40-, run again. It works but seems stupid. I assume there is a more efficient way to handle such discrete numbers in -foreach ... of numlist-.

Thank you!

bukharin
12-14-2011, 10:56 PM
levelsof sic, local (x)
foreach x of local x

Just looked at this again - you shouldn't have a space between local and (x), and you should use a new name (not x) - what you're effectively saying is "let's go through values of x and call the current value x" - there may be a problem with overwriting x.
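
Something like the following shows the corrected form, sketched on the auto dataset shipped with Stata (the macro names levels and r are arbitrary; any two distinct names would do):

```stata
sysuse auto, clear
levelsof rep78, local(levels)   // `levels' holds the distinct non-missing values of rep78
foreach r of local levels {     // the loop macro `r' has its own name
    count if rep78==`r'
}
```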

In any case you can specify more complicated numlists than a/b, for example:
. foreach x of numlist 1/5 6/10 12 15/18 20 {
2. display `x'
3. }
1
2
3
4
5
6
7
8
9
10
12
15
16
17
18
20

See -help numlist-

byuus2011
12-15-2011, 09:27 PM
Now the program works perfectly. Thank you so much!

byuus2011
12-16-2011, 05:18 PM
Now I have another problem. The loop runs well till it encounters a year in which there are no obs or insufficient obs, then it stops. I don't know how to solve this problem. Any help is appreciated! Thanks.

bukharin
12-17-2011, 07:03 AM
In that case you can use -capture- to prevent the loop from exiting with an error. -capture- "captures" the output, so for example if you have this error message:
no observations
r(2000);

... then -capture- prevents your loop from exiting but retains the error code (2000) in a system variable called _rc (rc=return code).

In Stata every command produces a return code: 0 if it completes successfully, a non-zero error code otherwise. -capture- stores that code in _rc. This is useful to know because you can -capture- a command and then decide whether or not to run further code depending on whether _rc is 0 (command ran successfully) or non-zero (command failed).

Silly example:

. sysuse auto, clear
(1978 Automobile Data)

. tab foreign, nolab

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         52       70.27       70.27
          1 |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

. regress price mpg

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   20.26
       Model |   139449474     1   139449474           Prob > F      =  0.0000
    Residual |   495615923    72  6883554.48           R-squared     =  0.2196
       Total |   635065396    73  8699525.97           Root MSE      =  2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. di _rc
0

. regress price mpg if foreign==2
no observations
r(2000);

. capture regress price mpg if foreign==2

. di _rc
2000

So running with your example, I would implement something like this:
gen pcfo=. // empty variable for predictions
gen rcfo=. // empty variable for residuals
tempvar pcfo rcfo // temporary variables for each set of predictions
levelsof sic, local(levels)
foreach x of local levels {
    foreach z of numlist 2005/2010 {
        capture reg cfo at s sch if sic==`x' & fyear==`z', noconstant
        if !_rc {
            predict `pcfo' // predictions are now in temporary variable
            replace pcfo=`pcfo' if e(sample) // transfer predictions from temp variable
            predict `rcfo', residuals // residuals are now in temporary variable
            replace rcfo=`rcfo' if e(sample) // transfer residuals from temp variable
            drop `pcfo' `rcfo' // drop temporary variables in preparation for next regression
        }
    }
}

Note that you don't need to use -quietly- with -capture-, because -capture- suppresses all of the output from the command. If you still want to see the output you can use:
capture noisily command ...
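
For instance, reusing the failing regression from the silly example above (a sketch on the auto dataset; the message text in the -display- line is invented for illustration):

```stata
sysuse auto, clear
capture noisily regress price mpg if foreign==2   // shows the error message but does not stop
display _rc                                       // the return code is still stored
if _rc {
    display "regression skipped, return code " _rc
}
```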

byuus2011
12-17-2011, 10:16 AM
Great! Thank you so much!