Column of values of unique subgroup counts...

#1
Hi,
I have a data frame, and I would like to add a column that shows the number of rows in each subgroup, i.e. in each unique combination of grp and hour.

Here's a simple example:

Code:
grp=c("A","A","A","A","A","B","B","B","B","C","C")
hour=c(1,1,1,2,2,1,1,1,2,1,1)

test=data.frame(grp,hour)
test
#    grp hour
# 1    A    1
# 2    A    1
# 3    A    1
# 4    A    2
# 5    A    2
# 6    B    1
# 7    B    1
# 8    B    1
# 9    B    2
# 10   C    1
# 11   C    1

What I want is an additional column showing, for each row, the number of rows that share its grp/hour combination. Something like this:


   grp hour #unique
1    A    1       3
2    A    1       3
3    A    1       3
4    A    2       2
5    A    2       2
6    B    1       3
7    B    1       3
8    B    1       3
9    B    2       1
10   C    1       2
11   C    1       2

Any help would be appreciated. Thanks!
 

Jake

Cookie Scientist
#3
Code:
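# Build one key per row by pasting its values together, then count each key
key <- table(apply(test, 1, paste, collapse=""))
# Look up each row's key in that table of counts
test$numUnique <- apply(test, 1, function(x) key[paste(x, collapse="")])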
test
#    grp hour numUnique
# 1    A    1         3
# 2    A    1         3
# 3    A    1         3
# 4    A    2         2
# 5    A    2         2
# 6    B    1         3
# 7    B    1         3
# 8    B    1         3
# 9    B    2         1
# 10   C    1         2
# 11   C    1         2
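One caveat with pasting the columns using collapse="" is that different combinations can produce the same key. A minimal sketch, assuming a hypothetical data frame d whose grp values contain digits: ("A", "11") and ("A1", "1") both collapse to "A11". Pasting with a separator that never occurs in the data (a tab here, chosen as an assumption) keeps them apart.

```r
# Hypothetical data where pasting with no separator is ambiguous
d <- data.frame(grp = c("A", "A1"), hour = c("11", "1"))

# No separator: both rows become "A11"
bad_key  <- apply(d, 1, paste, collapse = "")
# Tab separator (assumed not to appear in the data): "A\t11" and "A1\t1"
good_key <- apply(d, 1, paste, collapse = "\t")

table(bad_key)   # a single key counted twice
table(good_key)  # two distinct keys, each counted once
```

The same change (collapse = "\t" in both apply() calls) would apply to the code above.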
 

bryangoodrich

Probably A Mammal
#4
Code:
# Read rowid, grp and hour from the console; a blank line ends the input
x = data.frame(scan(what = list(rowid = 0, grp = "", hour = 0)))
1 A 1
2 A 1
3 A 1
4 A 2
5 A 2
6 B 1
7 B 1
8 B 1
9 B 2
10 C 1
11 C 1


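# For each grp/hour subset, pair every rowid with the subset's row count
per_group <- by(x, list(x$grp, x$hour), function(xx) cbind(xx$rowid, nrow(xx)))
# Stack the pieces into one data frame: V1 = rowid, V2 = count
counts <- do.call("rbind.data.frame", per_group)
# Join the counts back onto x by rowid
merge(x, counts, by.y = "V1", by.x = "rowid")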
#    rowid grp hour V2
# 1      1   A    1  3
# 2      2   A    1  3
# 3      3   A    1  3
# 4      4   A    2  2
# 5      5   A    2  2
# 6      6   B    1  3
# 7      7   B    1  3
# 8      8   B    1  3
# 9      9   B    2  1
# 10    10   C    1  2
# 11    11   C    1  2
Ugly? You're **** right.
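The same split, count, merge idea can be written with base R's aggregate() and merge() on the grouping columns themselves, so no rowid column is needed. This is a swapped-in sketch rather than the code above, and the names counts and numUnique are illustrative. Note that merge() sorts its result by the key columns, so the rows only come back in their original order here because test is already sorted by grp and hour.

```r
grp  <- c("A","A","A","A","A","B","B","B","B","C","C")
hour <- c(1,1,1,2,2,1,1,1,2,1,1)
test <- data.frame(grp, hour)

# Count rows per grp/hour combination (one row per combination that occurs)
counts <- aggregate(list(numUnique = rep(1, nrow(test))),
                    by = list(grp = test$grp, hour = test$hour),
                    FUN = sum)

# Join the counts back onto every original row by the grouping keys
merged <- merge(test, counts, by = c("grp", "hour"))
merged
```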
 

Dason

Ambassador to the humans
#5
Code:
grp=c("A","A","A","A","A","B","B","B","B","C","C")
hour=c(1,1,1,2,2,1,1,1,2,1,1)
test=data.frame(grp,hour)

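# One key per row, e.g. "A1"
key <- apply(test, 1, paste, collapse = "")
# rle() counts runs of consecutive identical keys, so this assumes rows
# sharing a grp/hour combination are adjacent (as they are here)
tmp <- rle(key)$lengths
# Repeat each run's length once for every row in that run
test$unique <- rep(tmp, tmp)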
which results in:
Code:
> test
   grp hour unique
1    A    1      3
2    A    1      3
3    A    1      3
4    A    2      2
5    A    2      2
6    B    1      3
7    B    1      3
8    B    1      3
9    B    2      1
10   C    1      2
11   C    1      2
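Because rle() counts consecutive runs rather than groups, the approach above relies on rows with the same grp/hour being next to each other. A minimal sketch, assuming the same values with one "A, 1" row moved to the end: the plain rle() version then gives run sizes instead of group sizes, while sorting by the key first (and writing the counts back in the original order) gives the group counts. The names shuffled, o and counts are illustrative.

```r
# Same values as test, but one "A, 1" row moved to the end
grp  <- c("A","A","A","A","B","B","B","B","C","C","A")
hour <- c(1,1,2,2,1,1,1,2,1,1,1)
shuffled <- data.frame(grp, hour)

key  <- apply(shuffled, 1, paste, collapse = "")
runs <- rle(key)$lengths
run_counts <- rep(runs, runs)  # the three "A1" rows get 2, 2 and 1: runs, not groups

# Sorting by key makes every group a single run
o <- order(key)
runs_sorted <- rle(key[o])$lengths
counts <- integer(length(key))
counts[o] <- rep(runs_sorted, runs_sorted)  # put counts back in original row order
counts
```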
 

Dason

Ambassador to the humans
#9
rle computes the run length encoding of a vector. Basically it takes in a vector and returns two vectors: the length of each run and the value repeated in that run.

For example, consider 1 1 2 2 2 3 1 8 8 8. We could encode that by saying "We have 2 values of 1, followed by 3 values of 2, followed by 1 value of 3, followed by 1 value of 1, followed by 3 values of 8." That's essentially what rle does, but it stores the values and the lengths in two separate vectors.

Code:
> dat <- c(1, 1, 2, 2, 2, 3, 1, 8, 8, 8)
> rle(dat)
Run Length Encoding
  lengths: int [1:5] 2 3 1 1 3
  values : num [1:5] 1 2 3 1 8
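To connect this back to the answer above: rep(lengths, lengths) labels every element with the size of the run it belongs to, and inverse.rle() rebuilds the original vector from the two parts. A minimal sketch using the same dat; note that the lone 1 at position 7 gets a run size of 1 even though the value 1 occurs three times overall, since the two runs of 1 are separate.

```r
dat <- c(1, 1, 2, 2, 2, 3, 1, 8, 8, 8)
r <- rle(dat)

# Repeat each run's length once per element of that run
run_size <- rep(r$lengths, r$lengths)
run_size  # 2 2 3 3 3 1 1 3 3 3

# inverse.rle() goes the other way and rebuilds the original vector
identical(inverse.rle(r), dat)  # TRUE
```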
 

bryangoodrich

Probably A Mammal
#10
ooooh yeah. You've explained this before; I just never use it. When I saw this problem I was thinking from a database programming perspective: break the table into its subsets and count the rows, then merge the counts into our new table based on the keys. Obviously a vectorized transformation would be better. I knew rle could be used from the times I've seen you apply it before, but I just wasn't going to bother reading help documents tonight lol
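For comparison, one vectorized base R option that does not depend on row order is ave(), a swapped-in technique not used elsewhere in the thread. A minimal sketch, assuming the same test data: ave() splits its first argument by the grouping variables, applies FUN within each group, and returns the results in the original row order. With FUN = length, every row receives the size of its group; hour is passed as the first argument only as a carrier of the right length.

```r
grp  <- c("A","A","A","A","A","B","B","B","B","C","C")
hour <- c(1,1,1,2,2,1,1,1,2,1,1)
test <- data.frame(grp, hour)

# Group size for every row, split by the grp/hour combination
test$numUnique <- ave(test$hour, test$grp, test$hour, FUN = length)
test
```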