Keep duplicates with the highest value

#1
Dear all,

I have a dataset with a lot of duplicates.

I'm using these commands:

sort name age sex
quietly by name age sex: gen dup = cond(_N==1,0,_n)
Then my dataset looks like this, for example:



Within each group of duplicates, I would like to keep only the observation with the highest value of the variable dup.

Hence I would like to have this result:

     make          price   mpg   dup
  1. Audi 5000      9690    17     0
  4. BMW 320i       9375    25     3
  5. Datsun 510     5079    24     0
  7. VW Diesel      5397    41     2
 

maartenbuis

TS Contributor
#2
Code:
sort name age sex
quietly by name age sex: gen touse = _n == _N
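Within each group, touse is 1 for the last observation and 0 for all others. After that you can restrict any command to those observations by adding if touse, or type keep if touse to drop the rest.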
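A minimal, self-contained sketch of the whole sequence, in case it helps. The typed-in data are hypothetical: they are reconstructed from the desired result in #1 (the original listing is not shown), and the by-variables are make price mpg instead of name age sex.

```stata
clear
* hypothetical data, inferred from the desired result in #1
input str12 make price mpg
"Audi 5000"  9690 17
"BMW 320i"   9375 25
"BMW 320i"   9375 25
"BMW 320i"   9375 25
"Datsun 510" 5079 24
"VW Diesel"  5397 41
"VW Diesel"  5397 41
end

sort make price mpg

* dup is 0 for unique observations, otherwise 1, 2, ... within the group
quietly by make price mpg: gen dup = cond(_N==1, 0, _n)

* touse marks the last observation of each group (where _n equals _N)
quietly by make price mpg: gen touse = _n == _N

keep if touse
list make price mpg dup
```

Since dup counts upward within each group, the last observation of a group is also the one with the highest dup, so keep if touse should leave one row per group: dup is 0 for the unique observations and equal to the group size for the duplicated ones.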