# Column of values of unique subgroup counts...

#### smithosaurus

##### New Member
Hi,
So I have a data frame, and I would like to add an additional column which shows the number of rows within each subgroup...

Here's a simple example:

Code:
grp=c("A","A","A","A","A","B","B","B","B","C","C")
hour=c(1,1,1,2,2,1,1,1,2,1,1)

test=data.frame(grp,hour,score)
grp hour
1 A 1
2 A 1
3 A 1
4 A 2
5 A 2
6 B 1
7 B 1
8 B 1
9 B 2
10 C 1
11 C 1

What I want is an additional column showing the number of rows with the respective unique values. Something like this...

grp hour #unique
1 A 1 3
2 A 1 3
3 A 1 3
4 A 2 2
5 A 2 2
6 B 1 3
7 B 1 3
8 B 1 3
9 B 2 1
10 C 1 2
11 C 1 2

Any help would be appreciated. Thanks!

#### Dason

I don't think you meant to put score in there?

#### Jake

Code:
key <- table(apply(test, 1, paste, collapse=""))
test$numUnique <- apply(test, 1, function(x) key[paste(x, collapse="")])
test
#    grp hour numUnique
# 1    A    1         3
# 2    A    1         3
# 3    A    1         3
# 4    A    2         2
# 5    A    2         2
# 6    B    1         3
# 7    B    1         3
# 8    B    1         3
# 9    B    2         1
# 10   C    1         2
# 11   C    1         2

#### bryangoodrich

##### Probably A Mammal
Code:
x = data.frame(scan(what = list(rowid = 0, grp = "", hour = 0)))
1 A 1
2 A 1
3 A 1
4 A 2
5 A 2
6 B 1
7 B 1
8 B 1
9 B 2
10 C 1
11 C 1

merge(x, do.call("rbind.data.frame", by(x, list(x$grp, x$hour), function(xx) cbind(xx$rowid, nrow(xx)))), by.y = "V1", by.x = "rowid")
#    rowid grp hour V2
# 1      1   A    1  3
# 2      2   A    1  3
# 3      3   A    1  3
# 4      4   A    2  2
# 5      5   A    2  2
# 6      6   B    1  3
# 7      7   B    1  3
# 8      8   B    1  3
# 9      9   B    2  1
# 10    10   C    1  2
# 11    11   C    1  2
Ugly? You're **** right.

#### Dason

Code:
grp=c("A","A","A","A","A","B","B","B","B","C","C")
hour=c(1,1,1,2,2,1,1,1,2,1,1)
test=data.frame(grp,hour)

key <- apply(test, 1, paste, collapse = "")
tmp <- rle(key)$lengths
test$unique <- rep(tmp, tmp)
which results in:
Code:
> test
   grp hour unique
1    A    1      3
2    A    1      3
3    A    1      3
4    A    2      2
5    A    2      2
6    B    1      3
7    B    1      3
8    B    1      3
9    B    2      1
10   C    1      2
11   C    1      2
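
One caveat worth spelling out: rle counts consecutive runs, so the approach above relies on identical grp/hour rows sitting next to each other, as they do in the example data. As a minimal sketch of an order-independent alternative (not something proposed in the thread), base R's `ave()` can compute the group size per row. The name `shuffled` and the particular row permutation are just illustrative assumptions.

```r
grp  <- c("A","A","A","A","A","B","B","B","B","C","C")
hour <- c(1,1,1,2,2,1,1,1,2,1,1)
test <- data.frame(grp, hour)

# Reorder the rows with a fixed permutation so that equal rows are
# no longer adjacent (hypothetical ordering, for illustration only).
shuffled <- test[c(1, 6, 4, 10, 2, 9, 7, 5, 3, 11, 8), ]

# ave() applies FUN within each grp/hour combination and returns a
# vector aligned with the rows; length() yields the size of each group.
shuffled$unique <- ave(seq_len(nrow(shuffled)),
                       shuffled$grp, shuffled$hour,
                       FUN = length)
shuffled
```

Each row gets the total count for its grp/hour pair regardless of where it appears in the data frame.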

#### Jake

You suckers are slow.

Edit: Level up!

#### bryangoodrich

##### Probably A Mammal
I knew Dason would whip out rle for this. I still don't understand wtf it does lol

#### Dason

rle finds the run length encoding of a vector. So basically it takes in a vector and returns 2 vectors.

For example, consider 1 1 2 2 2 3 1 8 8 8. We could encode that by saying "We have 2 values of 1, followed by 3 values of 2, followed by 1 value of 3, followed by 1 value of 1, followed by 3 values of 8." That's essentially what rle does, but it separates the value and the length into two separate vectors.

Code:
> dat <- c(1, 1, 2, 2, 2, 3, 1, 8, 8, 8)
> rle(dat)
Run Length Encoding
lengths: int [1:5] 2 3 1 1 3
values : num [1:5] 1 2 3 1 8
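
To connect this back to the `rep(tmp, tmp)` trick, here is a small sketch on the same vector. Repeating each run length by itself labels every element with the size of the run it belongs to. The names `runs` and `run_size` are only illustrative.

```r
# Same vector as in the example above.
dat <- c(1, 1, 2, 2, 2, 3, 1, 8, 8, 8)
runs <- rle(dat)

# Repeat each run length once per element of that run, so every
# position is labelled with the size of the run it sits in.
run_size <- rep(runs$lengths, runs$lengths)
run_size
# [1] 2 2 3 3 3 1 1 3 3 3

# inverse.rle() rebuilds the original vector from the encoding.
identical(inverse.rle(runs), dat)
# [1] TRUE
```

Note that the lone 1 in position 7 gets a run size of 1, not 3: rle only looks at consecutive values, so it does not count all occurrences of a value across the whole vector.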

#### bryangoodrich

##### Probably A Mammal
Ooooh yeah, you've explained this before. I just never use it, and when I saw this problem I was thinking from a database programming perspective: break the table into its subsets, count the rows, then merge the counts into the new table based on the keys. Obviously a vectorized transformation would be better. I knew rle could be used from the times I've seen you apply it before; I just wasn't going to bother reading help documents tonight lol
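
For what it's worth, the "count the subsets, then merge on the keys" idea can be written a bit more compactly with `aggregate()` and `merge()`. This is only a sketch of that database-style approach, not the code posted earlier; the column name `n` and the helper names `counts` and `result` are assumptions made for the example.

```r
grp  <- c("A","A","A","A","A","B","B","B","B","C","C")
hour <- c(1,1,1,2,2,1,1,1,2,1,1)
test <- data.frame(rowid = seq_along(grp), grp, hour)

# Count the rows in each grp/hour subset (like GROUP BY ... COUNT(*)).
counts <- aggregate(rowid ~ grp + hour, data = test, FUN = length)
names(counts)[names(counts) == "rowid"] <- "n"

# Join the counts back onto the original rows using the grouping keys,
# then restore the original row order, since merge() sorts by key.
result <- merge(test, counts, by = c("grp", "hour"))
result <- result[order(result$rowid), ]
result
```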

#### smithosaurus

##### New Member
Thanks so much! That rle function worked wonders. Appreciate all the help guys