Thread: How to run regressions by industry and year in Stata

1. How to run regressions by industry and year in Stata

Hello,

I am new to Stata and want to run regressions and predict residuals by industry and year, e.g. 20 industries over 10 years, so 20x10 regressions. I wrote the following code using a loop:

levelsof sic, local(x)
foreach x of numlist 10/87 {
foreach z of numlist 2005/2010 {
reg cfo at s sch if sic==`x' & fyear==`z', noconstant
predict pcfo if e(sample) & sic==`x' & fyear==`z'
replace pcfo="`pcfo'"
predict rcfo if e(sample) & sic==`x' & fyear==`z', residuals
replace rcfo="`rcfo'"

}
}
end

But this code generates predicted values for the first year only. Does anyone know what the bug in my code is?
Any help is appreciated. Thanks.

Mike

2. Re: How to run regressions by industry and year in Stata

I think there are a few problems with the code. The fundamental problem is that -predict- needs to create a new variable, so after you've run -predict- once it won't allow you to generate new predictions with the same variable name. One way around this is to create a variable for your predictions with missing values, use -predict- to place predictions in a temporary variable, then -replace- your variable with the values from the temporary variable (see code below).

Originally Posted by byuus2011
levelsof sic, local(x)
foreach x of numlist 10/87 {
This -levelsof- command creates a macro called `x' containing all values of sic; but then the -foreach- command immediately overwrites `x' with the number 10. It doesn't look like you need the -levelsof- command at all.

Originally Posted by byuus2011
predict pcfo if e(sample) & sic==`x' & fyear==`z'
You should just be able to say -if e(sample)- because the other conditions were already specified when you ran the regression. e(sample) is true if the observations were included in the regression.
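
A minimal sketch of how e(sample) behaves, using Stata's bundled auto dataset instead of the poster's data (the dataset and variable names here are for illustration only):

```stata
sysuse auto, clear
quietly regress price mpg if foreign==1
count if e(sample)     // observations actually used by the regression
count if foreign==1    // same count, because price and mpg have no missing values here
```

If a regressor had missing values, the first count would be smaller than the second, which is the reason to filter on e(sample) rather than repeating the -if- conditions.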

Originally Posted by byuus2011
replace pcfo="`pcfo'"
I'm not sure what you're trying to achieve here. In your code pcfo is a variable name, not a macro name. Using the single quotes `pcfo' is the way to reference a local macro, and putting it in double quotes as "`pcfo'" means you're getting the value of the local macro `pcfo' and converting it to a string - surely this is not what you're trying to do.
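
To illustrate the difference between a variable name and a local macro, here is a small sketch on the auto dataset; the macro name v and the variable p2 are made up for this example:

```stata
sysuse auto, clear
local v price               // local macro v holds the text "price"
summarize `v'               // expands to: summarize price
display "`v'"               // displays the text held in the macro
generate double p2 = `v'    // expands to: generate double p2 = price
```

The macro is substituted as text before the command runs, so it only makes sense where that text forms a valid command.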

Originally Posted by byuus2011
end
You don't need to use -end- here; you use it to end a program (see -help program-) or after manually entering data using the -input- command.

Suggested fix:
Code:
``````gen pcfo=. // empty variable for predictions
gen rcfo=. // empty variable for residuals
tempvar pcfo rcfo // temporary variable names, reused for each regression
foreach x of numlist 10/87 {
    foreach z of numlist 2005/2010 {
        reg cfo at s sch if sic==`x' & fyear==`z', noconstant
        predict `pcfo' // predictions are now in temporary variable
        replace pcfo=`pcfo' if e(sample) // transfer predictions from temp variable
        predict `rcfo', residuals // residuals are now in temporary variable
        replace rcfo=`rcfo' if e(sample) // transfer residuals from temp variable
        drop `pcfo' `rcfo' // drop temporary variables in preparation for next regression
    }
}``````

3. Re: How to run regressions by industry and year in Stata

bukharin,

I ran the code you wrote, but it says "no observations", maybe because my sic codes are not continuous. I have sic 10 12 13 .... 78 82 85 87. Then I adjusted the code to:

gen pcfo=.
gen rcfo=.
tempvar pcfo rcfo
levelsof sic, local (x)
foreach x of local x {
foreach z of numlist 2005/2010 {
.....

Now the program runs but generates all missing values for pcfo and rcfo.

Thank you!

4. Re: How to run regressions by industry and year in Stata

Not sure why that's happened; sometimes a problem like that is due to a subtle syntax error. Can you please post a log of the code and output? Suggest changing the "reg cfo ..." to "quietly reg cfo..." so that the log isn't ridiculously long.

5. Re: How to run regressions by industry and year in Stata

I doubt the loop really works, because no output shows up in the log. So I removed -levelsof- and switched back to -foreach x of numlist 10/87-. Since my sic codes look like 12-17, 20-40, 44-59, and 67-87, I ran the program several times, splitting the -foreach- values into something like -foreach x of numlist 12/17-, run once, then -foreach x of numlist 20/40-, run again. It works but seems stupid. I assume there is a more efficient way for the loop to handle such discontinuous numbers in -foreach ... of numlist-.

Thank you!

6. Re: How to run regressions by industry and year in Stata

Originally Posted by byuus2011
levelsof sic, local (x)
foreach x of local x
Just looked at this again - you shouldn't have a space between local and (x), and you should use a new name (not x) - what you're effectively saying is "let's go through values of x and call the current value x" - there may be a problem with overwriting x.

In any case you can specify more complicated numlists than a/b, for example:
Code:
``````. foreach x of numlist 1/5 6/10 12 15/18 20 {
2. display `x'
3. }
1
2
3
4
5
6
7
8
9
10
12
15
16
17
18
20``````
See -help numlist-
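
Applied to the sic ranges mentioned earlier in the thread (12-17, 20-40, 44-59 and 67-87), a single loop could look like the sketch below. The counter n is only there to show how many values the loop visits; it is not part of the original code:

```stata
local n 0
foreach x of numlist 12/17 20/40 44/59 67/87 {
    // the regression and -predict- steps for this industry would go here
    local ++n
}
display "industries visited: `n'"
```

This avoids running the program once per range, although any code inside a range that has no observations would still trigger an error.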

7. Re: How to run regressions by industry and year in Stata

Now the program works perfectly. Thank you so much!

8. Re: How to run regressions by industry and year in Stata

Now I have another problem. The loop runs well till it encounters a year in which there are no obs or insufficient obs, then it stops. I don't know how to solve this problem. Any help is appreciated! Thanks.

9. Re: How to run regressions by industry and year in Stata

In that case you can use -capture- to prevent the loop from exiting with an error. -capture- "captures" the output, so for example if you have this error message:
no observations
r(2000);

... then -capture- prevents your loop from exiting but retains the error code (2000) in a system variable called _rc (rc=return code).

In Stata any command that completes successfully returns 0 in _rc, otherwise it returns a non-zero error code. This is useful to know because you can -capture- the code and then decide whether or not to run further code depending on whether _rc is 0 (command successfully ran) or non-zero (command didn't successfully run).
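
A rough sketch of that pattern in a loop, again on the auto dataset rather than the poster's data (foreign only takes the values 0 and 1, so the value 2 is included deliberately to force an error):

```stata
sysuse auto, clear
foreach f of numlist 0 1 2 {
    capture regress price mpg if foreign==`f'
    if _rc == 0 {
        display "foreign==`f': regression ran on " e(N) " observations"
    }
    else {
        display "foreign==`f': skipped, return code " _rc
    }
}
```

The loop continues past the failing value instead of stopping, and the return code tells you why that pass was skipped.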

Silly example:
Code:
``````. sysuse auto, clear
(1978 Automobile Data)

. tab foreign, nolab

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         52       70.27       70.27
          1 |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

. regress price mpg

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  1,    72) =   20.26
       Model |   139449474     1   139449474           Prob > F      =  0.0000
    Residual |   495615923    72  6883554.48           R-squared     =  0.2196
       Total |   635065396    73  8699525.97           Root MSE      =  2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. di _rc
0

. regress price mpg if foreign==2
no observations
r(2000);

. capture regress price mpg if foreign==2

. di _rc
2000``````
So running with your example, I would implement something like this:
Code:
``````gen pcfo=. // empty variable for predictions
gen rcfo=. // empty variable for residuals
tempvar pcfo rcfo // temporary variable names, reused for each regression
levelsof sic, local(levels)
foreach x of local levels {
    foreach z of numlist 2005/2010 {
        capture reg cfo at s sch if sic==`x' & fyear==`z', noconstant
        if !_rc {
            predict `pcfo' // predictions are now in temporary variable
            replace pcfo=`pcfo' if e(sample) // transfer predictions from temp variable
            predict `rcfo', residuals // residuals are now in temporary variable
            replace rcfo=`rcfo' if e(sample) // transfer residuals from temp variable
            drop `pcfo' `rcfo' // drop temporary variables in preparation for next regression
        }
    }
}``````
Note that you don't need to use -quietly- with -capture-, because -capture- suppresses all of the output from the command. If you still want to see the output you can use:
capture noisily command ...
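
For instance, rerunning the failing regression from the silly example with -capture noisily- should show the error message while still letting execution continue (a sketch on the auto dataset):

```stata
sysuse auto, clear
capture noisily regress price mpg if foreign==2   // message is shown, execution continues
display _rc                                       // return code is still available
```

This can be handy while debugging the loop, since you see which industry-year combinations fail and why.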

10. Re: How to run regressions by industry and year in Stata

Great! Thank you so much!
