# Detecting duplicates for the first two col and selecting one according to a third col

#### Oritteropus

##### New Member
I have a database with only three columns but with many thousands of rows. The first and the second columns report numerical ID, and their combination indicate a link (e.g. A-B equal to B-A). Now, I need to delete all rows that are duplicates for the link, selecting the row with the highest value in the third column.

Below a short example:

My database:

V1 V2 V3
1 2 100
102 100 20000
100 102 23131
10 19 124444
10 15 1244
19 10 1242
10 19 5635
2 1 666
1 2 33
100 110 23

what I aim to:

V1 V2 V3
100 102 23131
10 19 124444
10 15 1244
2 1 666
100 110 23

Thanks a lot!

#### Oritteropus

##### New Member
Re: Detecting duplicates for the first two col and selecting one according to a third

SOLVED with data.table package

#### trinker

##### ggplot2orBust
Re: Detecting duplicates for the first two col and selecting one according to a third

Two things:

2. When you're posting code, dataframes or computer output it's helpful to wrap this information in code tags by:
1. either clicking the pound (#) sign icon or
2. wrap with [NOPARSE]
Code:
some code
[/NOPARSE]

which produces:
Code:
some code

#### Oritteropus

##### New Member
Re: Detecting duplicates for the first two col and selecting one according to a third

Ok.... I didn't know I couldn't post links to other websites... Here is the code:
Code:
# assuming your data.frame is DF
require(data.table)
DT <- data.table(DF)
# get min of V1,V2 on one column and max on other (for grouping)
DT[, :=(id1=pmin(V1, V2), id2=pmax(V1, V2))]
# get max of V3
DT.OUT <- DT[, .SD[which.max(V3), ], by=list(id1, id2)]
# remove the id1 and id2 columns
DT.OUT[, c("id1", "id2") := NULL]

#     V1  V2     V3
# 1:   2   1    666
# 2: 100 102  23131
# 3:  10  19 124444
# 4:  10  15   1244
# 5: 100 110     23