Merging Data

Yida

New Member
#1
I have a data set that is in this format:
year data1 data2 ...
01 .56 ...
01 .34
01 .76
... ...
02
02
...
03
....
09


And I have some more data set that is much smaller:

_year _data1 ...
01 1222
02 2634
03 ....
04
05
06
07
08
09

What is a good way to merge the two data sets so that all the data for each "_year" is matched up with the corresponding "year" in the first data set? The catch is that I want a lag in the years such that "_year=05" matches to "year=06".

Thanks for the help!
 

bukharin

RoboStataRaptor
#2
Just create a new variable in one of the datasets with the lagged year, then use -merge-

eg in the second dataset:
gen year=_year+1
merge year ...

(see -help merge-)

I would advise against naming a variable with an underscore prefix in Stata. From the user's guide:
"The first character of a name must be a letter or an underscore. We recommend, however, that
you not begin your variable names with an underscore. All of Stata’s built-in variables begin with an
underscore, and we reserve the right to incorporate new variables freely."
 

Yida

New Member
#3
I did a (merge m:1 yr using 2nddata_set), and there seems to be a correct merge. However, the other variables were not matched up. The other variables besides yr in the 2nddata_set are all dots (.) for almost all of the entries.
 

bukharin

RoboStataRaptor
#4
It works here:
Code:
* input the "using" dataset
clear
input _year _data1
01 1222
02 2634
03 .
04 .
05 .
06 .
07 .
08 .
09 .
end

* we'll match on a variable called yr
* incorporating the lagged year by adding 1
gen yr=_year+1

* save the "using" dataset to a temporary file
tempfile using
save `using'

* now input the "master" dataset
clear
input year data1 data2
01 .56 .
01 .34 .
01 .76 .
02 . .
02 . .
03 . .
09 . .
end

* re-create the yr variable
gen yr=year

* now merge in the "using" dataset
merge m:1 yr using `using'

list



     +--------------------------------------------------------------+
     | year   data1   data2   yr   _year   _data1            _merge |
     |--------------------------------------------------------------|
  1. |    1     .56       .    1       .        .   master only (1) |
  2. |    1     .34       .    1       .        .   master only (1) |
  3. |    1     .76       .    1       .        .   master only (1) |
  4. |    2       .       .    2       1     1222       matched (3) |
  5. |    2       .       .    2       1     1222       matched (3) |
     |--------------------------------------------------------------|
  6. |    3       .       .    3       2     2634       matched (3) |
  7. |    9       .       .    9       8        .       matched (3) |
  8. |    .       .       .    4       3        .    using only (2) |
  9. |    .       .       .    5       4        .    using only (2) |
 10. |    .       .       .    6       5        .    using only (2) |
     |--------------------------------------------------------------|
 11. |    .       .       .    7       6        .    using only (2) |
 12. |    .       .       .    8       7        .    using only (2) |
 13. |    .       .       .   10       9        .    using only (2) |
     +--------------------------------------------------------------+
The _merge variable is quite useful for troubleshooting merging problems. You'll need to provide more info (show actual data, actual commands used and actual results) if you need us to help troubleshoot the merge...
 

Yida

New Member
#5
I've resolved the issue!
The problem was that the yr column in my 2nddataset was the last column. In Stata, it seems that only variables to the right of the merging variable will be included in the merged dataset. Thus, I just made duplicate copies of all of the data columns and placed them after the yr column in the 2nddataset.
 

bukharin

RoboStataRaptor
#6
Glad you've got it sorted, but I doubt that the problem is with the location of variables in the dataset. Note that in my example above yr is the last variable in each dataset yet the merge works fine. (Perhaps if you have an outdated version of Stata... but I suspect the problem is more mundane, eg a typo or something). In any case, glad it works :)