
Thread: Tapply Approach

  1. #1
    ledzep

    Tapply Approach



    Hi,

    I use ddply a lot, and it works like a charm on small datasets. On large datasets, however, it becomes very slow [I waited forever for the ddply progress bar to reach 100%].

    Dason suggested using tapply, and I was very impressed with how quick it was. However, I couldn't quite figure out how to merge the resulting list back into the parent data frame.

    Code: 
    # Example data
    test<-structure(list(id = c("A", "A", "A", "A", "A", "A", "A", 
    "A", "B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", 
    "C", "C", "C", "C"), date = c("2011-01-11", "2011-01-12", "2011-01-13", 
    "2011-01-14", "2011-01-18", "2011-01-25", "2011-02-01", "2011-02-08", 
    "2011-01-11", "2011-01-12", "2011-01-13", "2011-01-14", "2011-01-18", 
    "2011-01-25", "2011-02-01", "2011-02-08", "2011-01-11", "2011-01-12", 
    "2011-01-13", "2011-01-14", "2011-01-18", "2011-01-25", "2011-02-01", 
    "2011-02-08")), .Names = c("id", "date"), row.names = c("6", 
    "5", "3", "1", "2", "8", "7", "4", "12", "15", "11", "16", "14", 
    "9", "13", "10", "17", "19", "20", "22", "21", "18", "23", "24"
    ), class = "data.frame")
    
    test$date<-as.Date(test$date) # declare date
    Now use tapply.

    Code: 
    # Successive differences between rows, with NA for the first row of each group
    Diff_calc <- function(x) {
      c(NA, diff(x)) # or use 0 in place of NA
    }

    # test$diff <- tapply(test$date, test$id, Diff_calc)
    # doesn't work very well: tapply returns one list element per id, not one value per row

    # Instead use
    diff <- tapply(test$date, test$id, Diff_calc)
    > diff
    $A
    [1] NA  1  1  1  4  7  7  7
    
    $B
    [1] NA  1  1  1  4  7  7  7
    
    $C
    [1] NA  1  1  1  4  7  7  7
    How would I bind these lists back to my parent data frame?

    Many Thanks

  2. #2
    Jake

    Re: Tapply Approach


    Code: 
    test$diff <- unlist(diff)
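
Note that unlist(diff) lines up with the rows only because test is already grouped by id, in the same order as the tapply output (A, B, C). If the rows were interleaved, the values would land on the wrong rows. A minimal sketch of one alternative, assuming base R's ave() is acceptable here: it applies the function within each group and returns the result in the original row order. The toy data frame df and the column name gap are made up for illustration and are not from the thread.

```r
# Hypothetical toy data with interleaved ids (not the thread's data)
df <- data.frame(
  id   = c("B", "A", "B", "A", "A"),
  date = as.Date(c("2011-01-11", "2011-01-11", "2011-01-13",
                   "2011-01-12", "2011-01-18"))
)

# ave() applies FUN within each id and keeps the original row order.
# Dates are converted to numeric (days) first so the result is a plain number.
df$gap <- ave(as.numeric(df$date), df$id,
              FUN = function(x) c(NA, diff(x)))

print(df)
```

Each row gets the number of days since the previous row with the same id, and NA for the first row of each id. This still assumes the dates are in ascending order within each id.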
