
Thread: Stata strategy needed: which datasets have variable X?

  1. #1



    Hello everyone,

    I'm working with roughly 150 large datasets, and I need a strategy for having Stata check automatically whether a certain set of variables exists in each of them.

    I want to write a .do (or .ado) file that opens each dataset, looks for a list of variables, and writes a 0 or 1 to a spreadsheet showing which variables exist in which datasets. It SOUNDS like a simple enough idea, but I have no clue how to do this in practice, or whether it's even possible.

    Any help would be greatly appreciated!

  2. #2
    RoboStataRaptor
    bukharin's Avatar
    Location
    Sydney, Australia

    One command that may be useful is:

    Code: 
    describe using filename, varlist
    This describes the contents of "filename" without loading it into memory, and stores a space-separated list of the variables it contains in the returned result r(varlist).

    So you could then do, for an ugly example:

    Code: 
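    describe using filename, varlist
    * strpos() returns 0 when "myvar" does not occur anywhere in the list;
    * it matches substrings, so "myvar" would also match "myvar2"
    if strpos(r(varlist), "myvar")!=0 {
        display "filename contains the variable myvar"
    }
    else {
        display "filename does not contain the variable myvar"
    }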
    Here's an example from the built-in auto dataset (I changed paths to the system path just to make the example filename shorter):

    Code: 
    . describe using auto.dta, varlist
    
    Contains data                                 1978 Automobile Data
      obs:            74                          13 Apr 2009 17:45
     vars:            12                          
     size:         3,478                          
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    --------------------------------------------------------------------------------------------------------------
    make            str18  %-18s                  Make and Model
    price           int    %8.0gc                 Price
    mpg             int    %8.0g                  Mileage (mpg)
    rep78           int    %8.0g                  Repair Record 1978
    headroom        float  %6.1f                  Headroom (in.)
    trunk           int    %8.0g                  Trunk space (cu. ft.)
    weight          int    %8.0gc                 Weight (lbs.)
    length          int    %8.0g                  Length (in.)
    turn            int    %8.0g                  Turn Circle (ft.)
    displacement    int    %8.0g                  Displacement (cu. in.)
    gear_ratio      float  %6.2f                  Gear Ratio
    foreign         byte   %8.0g       origin     Car type
    -------------------------------------------------------------------------------
    Sorted by:  foreign  
    
    . if strpos(r(varlist), "price")!=0 {
    .         display "auto.dta contains the variable price"
    auto.dta contains the variable price
    . }
    
    . else {
    .         display "auto.dta does not contain the variable price"
    . }
    
    . if strpos(r(varlist), "fakevar")!=0 {
    .         display "auto.dta contains the variable fakevar"
    . }
    
    . else {
    .         display "auto.dta does not contain the variable fakevar"
    auto.dta does not contain the variable fakevar
    . }
    
    .
    Of course the examples are ugly but they're just to show how to use -describe- and r(varlist) to determine if a variable's present. You could probably combine the gist of the above code with a list of file names and some binary indicators...
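
    One caveat with the strpos() check is that it matches substrings, so looking for "price" would also succeed in a file that only contains "price2". A minimal sketch of a whole-name check using Stata's macro list functions follows. The file name auto_copy.dta is made up for the illustration: it is just a copy of the shipped auto dataset saved to the working directory so that the example can run on its own.

```stata
* Save a copy of the shipped auto dataset (auto_copy.dta is a hypothetical name)
sysuse auto, clear
quietly save "auto_copy.dta", replace

* Read the variable list without loading the data
quietly describe using "auto_copy.dta", varlist
local vars `r(varlist)'

foreach v in price fakevar {
    * posof returns the position of the whole word in the list, or 0 if absent
    local pos : list posof "`v'" in vars
    if `pos' > 0 {
        display "auto_copy.dta contains the variable `v'"
    }
    else {
        display "auto_copy.dta does not contain the variable `v'"
    }
}
```

    Because posof compares whole list elements, a partial name is never reported as present.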

    Good luck!

  3. #3
    RoboStataRaptor
    bukharin's Avatar
    Location
    Sydney, Australia


    Actually it looks like you just want a list of variables for each dataset. That's easier. Just create a Stata dataset with a string variable called "filename" containing, in each row, a file name.
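
    If typing 150 file names by hand is unappealing, the "filename" dataset can probably be built automatically. This is only a sketch, and it assumes that all the .dta files sit in the current working directory and that there is at least one of them (set obs will complain otherwise). The dir macro function returns bare file names without a path, so files stored elsewhere would need the path added.

```stata
clear
* Collect the names of all .dta files in the working directory
local files : dir "." files "*.dta"
local n : word count `files'

* One observation per file
set obs `n'
gen str filename = ""
local i = 0
foreach f of local files {
    local ++i
    quietly replace filename = `"`f'"' in `i'
}
list filename
```

    With that dataset in memory, the loop over file names can be run as is.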

    Then run the following code:

    Code: 
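    * vars will hold the space-separated variable list of each file
    gen str vars=""
    levelsof filename, local(datafiles)
    foreach fname of local datafiles {
        * quote the file name so that paths containing spaces still work
        quietly describe using "`fname'", varlist
        quietly replace vars=r(varlist) if filename=="`fname'"
    }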
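
    To get from there to the 0/1 spreadsheet asked for in the first post, one option is to turn "vars" into one indicator per variable of interest. The sketch below fakes a two-row "filename"/"vars" dataset so it can run on its own; in practice those two variables would come from the loop above. The list in the local "wanted", the has_ prefix, and the output name varcheck.csv are all made up for the illustration, and export delimited assumes Stata 13 or later (older versions would need outsheet instead).

```stata
* Hypothetical stand-in for the filename/vars dataset
clear
input str20 filename str40 vars
"a.dta" "make price mpg"
"b.dta" "price2 weight"
end

* Hypothetical list of variables to look for
local wanted price mpg weight

foreach v of local wanted {
    * Pad both sides with spaces so that only whole names match
    * (price must not match price2)
    gen byte has_`v' = strpos(" " + vars + " ", " `v' ") > 0
}

list filename has_*

* Write one row per file and one 0/1 column per variable
export delimited filename has_* using "varcheck.csv", replace
```

    The resulting CSV opens directly in a spreadsheet program.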
