Hi guys, hope you are doing well!

I have quite the conundrum: I'm trying to figure out different ways to cross-examine two datasets (D1 and D2) to check whether they are properly anonymized. The goal is that nobody should be able to recognize records from D1 in D2, or the other way around, even when trying as hard as they can. I'm preparing the datasets so that scientists can use them for research, so I have to be sure that respondents cannot be identified if a researcher gets hold of both D1 and D2.

Both datasets are survey data with 7121 observations. D1 has 433 variables and D2 has 510, of which about 350 are identical across D1 and D2. D1 contains county-level data and D2 contains state-level data. Many of the variables are demographic, such as age, profession, level of education, number of children in the household, and diseases. In other words, many variables could identify a respondent in combination (teacher X in county A with seven kids under 16), even though formal identifiers such as ID number and town/city have been removed.
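To make the "teacher X in county A with seven kids" concern concrete, here is a minimal Python sketch (not SAS, and only an illustration) that counts which combinations of demographic variables occur exactly once in a dataset. The toy records, the column names, and the helper `unique_combinations` are all made up for the example and are not taken from the real survey.

```python
from collections import Counter

# Toy records; column names and values are hypothetical, not from the real survey.
records = [
    {"profession": "teacher", "county": "A", "kids_under_16": 7},
    {"profession": "teacher", "county": "A", "kids_under_16": 2},
    {"profession": "teacher", "county": "A", "kids_under_16": 2},
    {"profession": "nurse", "county": "B", "kids_under_16": 1},
]


def unique_combinations(rows, keys):
    """Return the key combinations that occur exactly once in rows."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Variables assumed (for this example only) to be identifying in combination.
quasi_identifiers = ["profession", "county", "kids_under_16"]
singletons = unique_combinations(records, quasi_identifiers)
print(singletons)
```

Any combination that shows up here belongs to a single respondent in the sample, so those are the records I would expect to be easiest to recognize.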

I've tried some merging, but it doesn't seem to work because neither dataset has a unique identifier. Can I scramble the order of the observations in such a way that sorting on a variable won't put them back in the same order in both datasets? I'm using SAS, but tips for R and Stata would work as well.
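As a rough sketch of the kind of check I have in mind, the Python snippet below (again not SAS, just an illustration) shuffles one dataset and then tries to link the two on the variables they share, without any ID. The toy rows, the variable names, and the helpers `link_on_shared` and `exact_links` are assumptions made up for the example.

```python
import random
from collections import defaultdict


def link_on_shared(d1, d2, shared_keys):
    """Map each D1 row index to the D2 row indices that match on all shared keys."""
    index = defaultdict(list)
    for j, row in enumerate(d2):
        index[tuple(row[k] for k in shared_keys)].append(j)
    return {
        i: index.get(tuple(row[k] for k in shared_keys), [])
        for i, row in enumerate(d1)
    }


def exact_links(links):
    """Keep only D1 rows that match exactly one D2 row."""
    return {i: js[0] for i, js in links.items() if len(js) == 1}


# Hypothetical toy data: D1 is county-level, D2 is state-level.
d1 = [
    {"age": 34, "education": "MSc", "county": "A"},
    {"age": 51, "education": "BSc", "county": "B"},
    {"age": 51, "education": "BSc", "county": "C"},
]
d2 = [
    {"age": 51, "education": "BSc", "state": "X"},
    {"age": 34, "education": "MSc", "state": "X"},
    {"age": 51, "education": "BSc", "state": "Y"},
]

# Scramble the row order of D2 so that row position carries no information.
rng = random.Random(12345)
rng.shuffle(d2)

links = link_on_shared(d1, d2, ["age", "education"])
risky = exact_links(links)
print(risky)
```

In this toy case the shuffle removes any link based on row position, but the first D1 row still matches exactly one D2 row on the shared values alone. So I suspect scrambling the order is only part of the answer, and what I really need is a way to run this kind of one-to-one match count on the real shared variables.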

Thanks!!
/J