Select if values on multiple variables match


I am currently working on two research projects, one on intermarriage and one on international migration, and I have run into the same problem in both: I need to construct a variable that distinguishes, in the first project, partners from the same origin country from partners from different origin countries, and in the second project, fathers and mothers from the same origin country from those from different origin countries.

Say we take the latter project as an example, and for each student (identified by id) we have the origin country of the mother (ISO codes, stored in birthmom) and of the father (birthfat). Is there a way to say 'by id: if the value of birthmom equals the value of birthfat, newvar=1, else newvar=0'? I've been thinking of a loop over the integer values 1 to 900, covering all ISO codes, that generates newvar=1 whenever birthmom and birthfat match for a given id. Or perhaps a two-way tab of birthmom and birthfat, followed by a generate command that selects the matching categories?

I'm pretty sure a much easier solution exists, though, and in any case I lack the knowledge and lucidity to work out the exact code for a loop. Any help would be greatly appreciated.




If I'm understanding you correctly, it's much simpler than you think:

gen newvar=birthmom==birthfat if !missing(birthmom, birthfat)

This makes newvar 1 if the expression (birthmom==birthfat) is true, 0 if it's false, and missing if birthmom, birthfat, or both are missing.

It's important not to forget the !missing(birthmom, birthfat) condition; otherwise newvar will be 0 if only the mother's or only the father's country is missing, and 1 if both are missing (Stata compares two identical missing values as equal), which is unlikely to be what you want.
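To see the difference, here is a small sketch on made-up data. The ids, the numeric country codes 528 and 840, and the variable names naive and samecountry are purely illustrative; it compares the expression with and without the missing-value condition:

```stata
* Toy data (hypothetical codes): row 3 lacks the father's country,
* row 4 lacks both parents' countries
clear
input id birthmom birthfat
1 528 528
2 528 840
3 528 .
4 . .
end

* Without the condition: missing values are compared like any other value
gen naive = birthmom == birthfat

* With the condition: result is missing whenever either country is unknown
gen samecountry = birthmom == birthfat if !missing(birthmom, birthfat)

list
```

Rows 3 and 4 show the problem: naive is 0 and 1 there, while samecountry is left missing.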

This kind of command works observation by observation (i.e., row by row in your dataset), so as long as birthmom and birthfat are recorded in the same observation, you don't need to worry about id.
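If instead the data were in a long layout, with one row per parent, you would need to work by id. A minimal sketch, assuming hypothetical variables parent and birthctry and exactly one mother row and one father row per id:

```stata
* Hypothetical long layout: one row per parent
clear
input id str6 parent birthctry
1 "mother" 528
1 "father" 528
2 "mother" 528
2 "father" 840
3 "mother" 528
3 "father" .
end

* Count missing origin codes within each id
bysort id: egen nmiss = total(missing(birthctry))

* After sorting within id, the first and last codes are equal only if
* both parents share the same origin; leave the result missing if
* either parent's code is missing
bysort id (birthctry): gen samecountry = birthctry[1] == birthctry[_N] if nmiss == 0

list, sepby(id)
```

Alternatively, reshape wide would put both codes on one row, after which the one-line command above applies unchanged.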

Incidentally, shouldn't it be birthdad? ;)