Two variables with different numbers of observations
zerostin
04-14-2011, 11:45 AM
Hello,
I want to create a new variable that takes the value "1" if var1 matches var2. However, var1 and var2 have different numbers of observations, which is causing some problems.
var1 has more than 2.3 million observations.
var2 has approx. 5000 observations.
First I created a new variable, tag_2001:
gen tag_2001 = 0
replace tag_2001 = 1 if var1[_n-1] == var2
If I do this, Stata changes only 1 observation, which cannot be right: there are more observations that match.
Does anybody have suggestions on how I could fix this?
duskstar
04-15-2011, 03:01 AM
I'm not sure it will help here, but have you looked at "duplicates tag"?
zerostin
04-15-2011, 07:00 AM
I looked at it (and also installed dups by N. Cox), but if I delete some duplicates, Stata deletes the whole observation for all the variables, not just for that one variable. And that's not what I want :)
This is what I actually want:
var1  var2  var3
1     1     1
1     4     1
1     6     1
4           1
4           1
5           0
5           0
6           1
6           1
6           1
6           1
6           1
Thus, whenever a value of var1 matches any value of var2, Stata should put a 1 in var3, for every such row. Later on I can use the tag command to fix another problem ;)
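A minimal sketch of the tagging rule described above, written in Python rather than Stata purely for illustration: collect the distinct values of var2 into a set, then give each row of var1 a 1 if its value is in that set, else a 0. The lists are the example values from the table; the function name `tag_matches` is hypothetical.

```python
def tag_matches(var1, var2):
    """Return 1 for each value of var1 that appears anywhere in var2, else 0."""
    lookup = set(var2)  # distinct values of the shorter variable
    return [1 if value in lookup else 0 for value in var1]


# Example values from the table above (var2 has only three non-missing entries).
var1 = [1, 1, 1, 4, 4, 5, 5, 6, 6, 6, 6, 6]
var2 = [1, 4, 6]

var3 = tag_matches(var1, var2)
print(var3)  # [1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
```

The set lookup checks each var1 value against all of var2 at once, which is the "all the possibilities" behaviour asked for, instead of comparing row by row.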
duskstar
04-15-2011, 10:07 AM
Sorry, I'm a bit confused about what you want. Do you want a number 1 if a value EVER appears in both var1 and var2?
I think your current code only compares each row with the previous row, rather than with all the rows. Did you want it to look at all the rows?
Sorry, I'm a bit confused about what you're trying to do!
zerostin
04-15-2011, 12:25 PM
I will try to explain it.
I have a variable that identifies the top quantile (based on performance) in a given year, e.g. 2000. I created var1 from this variable.
var1: shows ONLY accountIDs of the top quantile in year 2000 (5000 values)
var2: shows ALL accountIDs of all months for 6 years (2.3 million values).
So if accountID 3 trades in January 2001 and February 2004, it appears twice in var2. If it trades every month for 6 years, it appears 72 times (6 × 12) in var2.
What I would like to do: if a value of var1 occurs in var2, var3 should get a 1. However, this can happen many times.
I don't know which command makes Stata check all the values of var2 for a match with var1. If there is a match -> "1", if there is no match -> "0".
I hope this is clearer.
zerostin
04-26-2011, 09:02 AM
Solution:
Create a new dataset containing var1, rename var1 to var2, and merge this file with your master.dta.
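The merge-based solution can be sketched in Python, purely as an illustration rather than Stata syntax. The assumption is that the new dataset holds the distinct top-quantile accountIDs (var1 renamed to var2) and the master holds one row per account-month. Each master row is then either matched (Stata's `_merge == 3`) or master-only (`_merge == 1`), and the tag is 1 for matched rows. In Stata itself, a many-to-one merge on var2 followed by `gen var3 = (_merge == 3)` would be the analogous step; IDs that exist only in the new file would come in as `_merge == 2` rows and may need to be dropped. The records and the function name `merge_tag` below are hypothetical.

```python
def merge_tag(master_ids, top_ids):
    """Emulate a many-to-one merge of a lookup file onto the master data.

    master_ids: var2 for each master row (an accountID repeats once per month traded).
    top_ids:    var1 values (top-quantile accountIDs), renamed to var2 in the lookup file.
    Returns (merge_codes, var3), both aligned with the master rows.
    """
    lookup = set(top_ids)  # keeps each key once, like a lookup file unique on var2
    # 3 = matched in both files, 1 = found in master only
    merge_codes = [3 if acct in lookup else 1 for acct in master_ids]
    var3 = [1 if code == 3 else 0 for code in merge_codes]
    return merge_codes, var3


# Hypothetical data: account 3 trades in two months, account 7 in three.
master_ids = [3, 7, 3, 9, 7, 7]
top_ids = [3, 7, 11]  # 11 never trades, so in Stata it would be a _merge == 2 row

codes, var3 = merge_tag(master_ids, top_ids)
print(codes)  # [3, 3, 3, 1, 3, 3]
print(var3)   # [1, 1, 1, 0, 1, 1]
```

Because the lookup file lists each accountID only once, every monthly row of a top-quantile account gets tagged, which is what repeating matches across 72 months requires.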