Thread: How to run regressions by industry and year in Stata

  1. #1
    Points: 491, Level: 9
    Level completed: 82%, Points required for next Level: 9

    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    How to run regressions by industry and year in Stata



    Hello,

    I am new to Stata and want to run regressions and predict residuals by industry and year, e.g. 20 industries in 10 years so 20x10 regressions. I wrote the following code using loop:

    levelsof sic, local(x)
    foreach x of numlist 10/87 {
    foreach z of numlist 2005/2010 {
    reg cfo at s sch if sic==`x' & fyear==`z', noconstant
    predict pcfo if e(sample) & sic==`x' & fyear==`z'
    replace pcfo="`pcfo'"
    predict rcfo if e(sample) & sic==`x' & fyear==`z', residuals
    replace rcfo="`rcfo'"

    }
    }
    end

    But this code generates predicted values for the first year only. Does anyone know what bug exists in my code?
    Any help is appreciated. Thanks.

    Mike

  2. #2
    RoboStataRaptor
    Points: 7,301, Level: 56
    Level completed: 76%, Points required for next Level: 49
    bukharin's Avatar
    Location
    Sydney, Australia
    Posts
    1,015
    Thanks
    9
    Thanked 240 Times in 233 Posts

    Re: How to run regressions by industry and year in Stata

    I think there are a few problems with the code. The fundamental problem is that -predict- needs to create a new variable, so after you've run -predict- once it won't allow you to generate new predictions with the same variable name. One way around this is to create a variable for your predictions with missing values, use -predict- to place predictions in a temporary variable, then -replace- your variable with the values from the temporary variable (see code below).

    A couple of other comments about the code:

    Quote Originally Posted by byuus2011 View Post
    levelsof sic, local(x)
    foreach x of numlist 10/87 {
    This -levelsof- command creates a macro called `x' containing all values of sic; but then the -foreach- command immediately overwrites `x' with the number 10. It doesn't look like you need the -levelsof- command at all.

    Quote Originally Posted by byuus2011 View Post
    predict pcfo if e(sample) & sic==`x' & fyear==`z'
    You should just be able to say -if e(sample)- because the other conditions were already specified when you ran the regression. e(sample) is true if the observations were included in the regression.

    Quote Originally Posted by byuus2011 View Post
    replace pcfo="`pcfo'"
    I'm not sure what you're trying to achieve here. In your code pcfo is a variable name, not a macro name. Using the single quotes `pcfo' is the way to reference a local macro, and putting it in double quotes as "`pcfo'" means you're getting the value of the local macro `pcfo' and converting it to a string - surely this is not what you're trying to do.

    Quote Originally Posted by byuus2011 View Post
    end
    You don't need to use -end- here; you use it to end a program (see -help program-) or after manually entering data using the -input- command.

    Suggested fix:
    Code: 
    gen pcfo=. // empty variable for predictions
    gen rcfo=. // empty variable for residuals
    tempvar pcfo rcfo // temporary variables for each set of predictions
    foreach x of numlist 10/87 {
       foreach z of numlist 2005/2010 {
          reg cfo at s sch if sic==`x' & fyear==`z', noconstant
          predict `pcfo' // predictions are now in temporary variable
          replace pcfo=`pcfo' if e(sample) // transfer predictions from temp variable
          predict `rcfo', residuals // residuals are now in temporary variable
          replace rcfo=`rcfo' if e(sample) // transfer residuals from temp variable
          drop `pcfo' `rcfo' // drop temporary variables in preparation for next regression
       }
    }
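
    The same temporary-variable pattern can be tried on Stata's bundled auto dataset. This is only a sketch: price, mpg and foreign stand in for cfo, its regressors and the sic/fyear cells, and the names phat, ehat, tmp_p and tmp_e are made up for illustration.

```stata
sysuse auto, clear
gen phat = .            // will hold fitted values from every group
gen ehat = .            // will hold residuals from every group
tempvar tmp_p tmp_e     // scratch names, reused on each pass
foreach f of numlist 0 1 {
    quietly regress price mpg if foreign == `f'
    predict `tmp_p' if e(sample)                // fitted values for this group
    predict `tmp_e' if e(sample), residuals     // residuals for this group
    replace phat = `tmp_p' if e(sample)
    replace ehat = `tmp_e' if e(sample)
    drop `tmp_p' `tmp_e'                        // free the names for the next pass
}
```

    Run the whole block in one go (for example from a do-file) so that the local macros created by -tempvar- are still defined inside the loop.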

  3. #3

    Re: How to run regressions by industry and year in Stata

    bukharin,

    Thank you very much for your quick reply and help!
    I ran the code you wrote, but it says "no observations", maybe because my sic codes are not continuous. I have sic 10 12 13 .... 78 82 85 87. So I adjusted the code to:

    gen pcfo=.
    gen rcfo=.
    tempvar pcfo rcfo
    levelsof sic, local (x)
    foreach x of local x {
    foreach z of numlist 2005/2010 {
    .....

    Now the program runs, but it generates only missing values for pcfo and rcfo.
    Could you help, please?

    Thank you!

  4. #4
    bukharin

    Re: How to run regressions by industry and year in Stata

    Not sure why that's happened; sometimes a problem like that is due to a subtle syntax error. Can you please post a log of the code and output? Suggest changing the "reg cfo ..." to "quietly reg cfo..." so that the log isn't ridiculously long.

  5. #5

    Re: How to run regressions by industry and year in Stata

    I doubt the loop really works, because no output shows up in the log. So I removed -levelsof- and switched back to -foreach x of numlist 10/87-. Since my sic codes look like 12-17, 20-40, 44-59, and 67-87, I ran the program several times, splitting the -foreach- values into ranges: -foreach x of numlist 12/17-, run once, then -foreach x of numlist 20/40-, run again, and so on. It works but seems clumsy. There may be a more efficient way to handle such discrete ranges in a -foreach ... of numlist- loop.

    Thank you!

  6. #6
    bukharin

    Re: How to run regressions by industry and year in Stata

    Quote Originally Posted by byuus2011 View Post
    levelsof sic, local (x)
    foreach x of local x
    Just looked at this again - you shouldn't have a space between local and (x), and you should use a new name (not x) - what you're effectively saying is "let's go through values of x and call the current value x" - there may be a problem with overwriting x.

    In any case you can specify more complicated numlists than a/b, for example:
    Code: 
    . foreach x of numlist 1/5 6/10 12 15/18 20 {
      2. display `x'
      3. }
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    12
    15
    16
    17
    18
    20
    See -help numlist-
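
    Applied to the sic ranges listed in post #5 (12-17, 20-40, 44-59 and 67-87), a single numlist could cover all four blocks. A minimal sketch; the local macro n is a hypothetical counter used only to show that the loop visits each code once:

```stata
local n = 0
foreach x of numlist 12/17 20/40 44/59 67/87 {
    local ++n    // the regression and predict steps would go here
}
display "sic codes visited: `n'"
```

    Codes inside these ranges that have no observations in a given year would still need separate handling.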

  7. #7

    Re: How to run regressions by industry and year in Stata

    Now the program works perfectly. Thank you so much!

  8. #8

    Re: How to run regressions by industry and year in Stata

    Now I have another problem. The loop runs well till it encounters a year in which there are no obs or insufficient obs, then it stops. I don't know how to solve this problem. Any help is appreciated! Thanks.

  9. #9
    bukharin

    Re: How to run regressions by industry and year in Stata

    In that case you can use -capture- to prevent the loop from exiting with an error. -capture- "captures" the output, so for example if you have this error message:
    no observations
    r(2000);

    ... then -capture- prevents your loop from exiting but retains the error code (2000) in a system variable called _rc (rc=return code).

    After a command is run under -capture-, _rc is 0 if the command completed successfully; otherwise it holds a non-zero error code. This is useful to know because you can -capture- a command and then decide whether or not to run further code depending on whether _rc is 0 (command ran successfully) or non-zero (command failed).

    Silly example:
    Code: 
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . tab foreign, nolab
    
       Car type |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         52       70.27       70.27
              1 |         22       29.73      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    
    . regress price mpg
    
          Source |       SS       df       MS              Number of obs =      74
    -------------+------------------------------           F(  1,    72) =   20.26
           Model |   139449474     1   139449474           Prob > F      =  0.0000
        Residual |   495615923    72  6883554.48           R-squared     =  0.2196
    -------------+------------------------------           Adj R-squared =  0.2087
           Total |   635065396    73  8699525.97           Root MSE      =  2623.7
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------
    
    . di _rc
    0
    
    . regress price mpg if foreign==2
    no observations
    r(2000);
    
    . capture regress price mpg if foreign==2
    
    . di _rc
    2000
    So running with your example, I would implement something like this:
    Code: 
    gen pcfo=. // empty variable for predictions
    gen rcfo=. // empty variable for residuals
    tempvar pcfo rcfo // temporary variables for each set of predictions
    levelsof sic, local(levels)
    foreach x of local levels {
       foreach z of numlist 2005/2010 {
          capture reg cfo at s sch if sic==`x' & fyear==`z', noconstant
          if !_rc {
             predict `pcfo' // predictions are now in temporary variable
             replace pcfo=`pcfo' if e(sample) // transfer predictions from temp variable
             predict `rcfo', residuals // residuals are now in temporary variable
             replace rcfo=`rcfo' if e(sample) // transfer residuals from temp variable
             drop `pcfo' `rcfo' // drop temporary variables in preparation for next regression
          }
       }
    }
    Note that you don't need to use -quietly- with -capture-, because -capture- suppresses all of the output from the command. If you still want to see the output you can use:
    capture noisily command ...
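
    To keep track of which cells were skipped, the return code can be reported inside the loop. A small sketch on the auto dataset, where foreign==2 is a deliberately empty group standing in for an industry-year with no observations (the message wording is arbitrary):

```stata
sysuse auto, clear
foreach f of numlist 0 1 2 {
    capture regress price mpg if foreign == `f'
    if _rc {
        // regression failed: report the group and the error code
        display "foreign = `f': skipped, return code " _rc
    }
    else {
        // regression ran: e(N) is the number of observations used
        display "foreign = `f': estimated on " e(N) " observations"
    }
}
```

    Checking _rc immediately after the captured command matters, because a later -capture- would overwrite it.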

  10. #10

    Re: How to run regressions by industry and year in Stata


    Great! Thank you so much!
