Generating new variables

DniC

New Member
#1
Hi all,

I'm doing an empirical study for my senior paper and I'm running into a few roadblocks in Stata, the main one being generating a new variable that draws its values from several different variables.

My data are panel data, identified by individual ID. I've merged the income from several years into the same file, so right now each individual i has 1-10 years of income (stored in income1, income2, income3, etc.).

I've created a new variable that assigns each individual a year (1-10). My question is: is there a command in Stata that will automatically draw from a different income variable depending on the assigned year?

Example: Say individual x is given year 1 and individual y is given year 2, then would it be possible to create a new variable under which x is shown with his income from year 1 (output from income1) and y with income from year 2 (output from income2)?

Sorry for the lengthy (and convoluted) post. Any help is very much appreciated!
 

bukharin

RoboStataRaptor
#2
For that kind of data you're generally better off using a long rather than a wide data structure, e.g.:

Code:
id  incomeyear  income  year
1   1           123     3
1   2           456     3
1   3           789     3
...
2   1           111     1
2   2           222     1
...
You can convert between a wide and a long data structure using -reshape-, e.g.:

Code:
reshape long income, i(id) j(incomeyear)
After that you can match year (the year you want to sample) with the new variable, incomeyear:
Code:
gen selectedincome=income if incomeyear==year

id  incomeyear  income  year  selectedincome
1   1           123     3     .
1   2           456     3     .
1   3           789     3     789
...
2   1           111     1     111
2   2           222     1     .
...
You may just want to keep the relevant observation for each individual:
Code:
keep if !missing(selectedincome)

id  incomeyear  income  year  selectedincome
1   3           789     3     789
...
2   1           111     1     111
...
Or perhaps you want to make each observation per individual have the same selectedincome:
Code:
sort id selectedincome
by id: replace selectedincome=selectedincome[1]

id  incomeyear  income  year  selectedincome
1   3           789     3     789
1   1           123     3     789
1   2           456     3     789
...
2   1           111     1     111
2   2           222     1     111
...
After that last step you could always -reshape- back to your original wide format. But generally it's easier to work with a long format.
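For what it's worth, here is a minimal sketch of that round trip back to wide. The toy values are made up for illustration (they extend the example above with a third year for individual 2), and it assumes year and selectedincome are constant within id, which -reshape wide- requires of any variable that is not being reshaped:

```stata
* Toy long-format data; values are illustrative only
clear
input id incomeyear income year
1 1 123 3
1 2 456 3
1 3 789 3
2 1 111 1
2 2 222 1
2 3 333 1
end

* Pick the income for the sampled year and spread it within id
gen selectedincome = income if incomeyear == year
sort id selectedincome
by id: replace selectedincome = selectedincome[1]

* Only income varies within id, so only income is reshaped
reshape wide income, i(id) j(incomeyear)
list
```

This should leave one observation per id with income1-income3, year and selectedincome side by side.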
 

DniC

New Member
#3
That looks like it should do it. Thanks!

Would it be possible to do the same thing in a different combination of years for the individuals on the same set? (i.e. now individual 1 needs income from year 10 and individual 2 from year 3) Or should I make duplicates of the data and do the alternative combination separately?
 

bukharin

RoboStataRaptor
#4
Well, instead of year you could have year1, year2, etc., and/or make the dataset longer (like the example I posted, but with one copy for each sample you're drawing).
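A rough sketch of the year1, year2 idea, staying in the long format. The names year1, year2, selectedincome1, selectedincome2 and tmp are just placeholders I've made up, and the toy data has only three income years and two draws; adjust the loop range to however many draws you have:

```stata
* Toy long-format data with two sampled years per individual (illustrative values)
clear
input id incomeyear income year1 year2
1 1 123 3 1
1 2 456 3 1
1 3 789 3 1
2 1 111 1 2
2 2 222 1 2
2 3 333 1 2
end

forvalues s = 1/2 {
    * income in the year picked for this draw, missing elsewhere
    gen tmp = income if incomeyear == year`s'
    * missing values sort last, so the first observation per id holds the match
    bysort id (tmp): gen selectedincome`s' = tmp[1]
    drop tmp
}

sort id incomeyear
list, sepby(id)
```

With these toy values, individual 1 should end up with selectedincome1 = 789 and selectedincome2 = 123, and individual 2 with 111 and 222.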

There are several different ways to do it, and Stata is very fast at data manipulation so you don't necessarily have to find an optimal algorithm so much as something that you understand and are comfortable with.
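As one example of another way, here is a sketch that stays in the original wide layout and simply loops over the income variables. It assumes the variables are named income1, income2, ... as in the first post, and uses three years of made-up values instead of ten:

```stata
* Toy wide-format data (illustrative values)
clear
input id income1 income2 income3 year
1 123 456 789 3
2 111 222 333 1
end

* Start with missing, then fill in from whichever income variable matches year
gen selectedincome = .
forvalues y = 1/3 {
    replace selectedincome = income`y' if year == `y'
}
list
```

Here individual 1 gets 789 (from income3) and individual 2 gets 111 (from income1). It avoids the reshape, at the cost of being less flexible if you later want to do more with the yearly incomes.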

The User's Guide has a lot of good tips on data manipulation.
 
#6
Hi,

I am new to this forum and to SAS as well. Can anyone help me implement the logic of SAS's first dot (first.) in PL/1? Can this logic also be implemented in DB2 (with an SQL query)? Please help me out with this.

Thanks