
Thread: Merging Multiple Data Frames

  1. #1
    Probably A Mammal
    Points: 18,957, Level: 87
    Level completed: 22%, Points required for next Level: 393
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,201
    Thanks
    292
    Thanked 494 Times in 449 Posts

    Merging Multiple Data Frames




    I've had an earlier thread that talked about this problem, and I used a series of merge statements to get the job done. While it works, I really can't stand the process flow, and it doesn't generalize to other situations very well. I also want to keep this process in base R. To keep it general, let me make some very simplified data sets to make this work.

    Code: 
    df1 <- data.frame(id = letters[1:5], val = rnorm(5))
    df2 <- data.frame(id = letters[c(1, 3, 4, 5, 8, 9, 10)], val = rnorm(7))
    df3 <- data.frame(id = letters[c(1, 5, 10)], val = rnorm(3))
    Clearly, the values are unimportant. What matters is getting something like an 8x4 data frame: an id column plus one value column per input frame, with NA wherever an id has no match. Something like

    Code: 
      id       val1        val2       val3
    1  a -0.5913211 -0.73382445 -0.4547612
    2  b -0.7412030          NA         NA
    3  c  2.0291696  0.58810832         NA
    4  d -0.2334067  0.99821419         NA
    5  e  0.1564856  0.01476102 -1.4973516
    6  h         NA -1.16277465         NA
    7  i         NA  0.54842612         NA
    8  j         NA  1.20241347 -1.6101057
    The problem is that a sequence of merge statements needs to be constantly expanded and altered to fit how many frames you're tying together. I'm looking for a more direct way to match the 'id' of each frame and put together the data set. I'm thinking moving to lists, making the vectors of equal size and then putting them into a data frame might be efficient (since, after all, that is all a data frame is).
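
    The "equal-length vectors" idea can be sketched in base R with match(). This is only a sketch under assumptions: the helper name align_by_id and its by/value arguments are made up for illustration, and each frame is assumed to have unique ids and a single value column.

```r
# Hypothetical helper (not from the thread): align frames on a key with match().
align_by_id <- function(frames, by = "id", value = "val") {
  # Union of all keys, sorted so the row order is deterministic
  ids <- sort(unique(unlist(lapply(frames, function(f) as.character(f[[by]])))))
  # For each frame, place its values at the positions of the shared key vector;
  # match() returns NA for keys the frame lacks, and indexing by NA yields NA
  cols <- lapply(frames, function(f) f[[value]][match(ids, as.character(f[[by]]))])
  names(cols) <- paste0(value, seq_along(frames))
  # A data frame is a list of equal-length columns, so build it from one
  as.data.frame(c(setNames(list(ids), by), cols), stringsAsFactors = FALSE)
}
```

    With the three example frames, align_by_id(list(df1, df2, df3)) gives one row per distinct id and the columns id, val1, val2, val3.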

    I'll post my tentative results, but your support is appreciated.

    Currently my outline is something like this:

    Code: 
    merge.all <- function(by, ...) {
      frames <- list(...)
      # ... do stuff here ...
    }  # end merge.all
    
    # Make call
    merge.all("id", df1, df2, df3)

  2. #2
    ggplot2orBust
    Points: 35,306, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,949
    Thanks
    1,356
    Thanked 753 Times in 674 Posts

    Re: Merging Multiple Data Frames

    I remember seeing a multimerge function on R-bloggers in my journeys a while back. I haven't needed it myself yet, but I put it in the favorites folder for later anyway.

    http://www.r-bloggers.com/merging-mu...ne-data-frame/

    I haven't actually looked at your problem. Let me know if that's helpful (this one seems to be for when you're reading in the files, but it may still be helpful).
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  3. #3
    bryangoodrich

    OMFG

    That Reduce function is awesome!

    Code: 
    df1 <- data.frame(id = letters[1:5], val = rnorm(5))
    df2 <- data.frame(id = letters[c(1, 3, 4, 5, 8, 9, 10)], val = rnorm(7))
    df3 <- data.frame(id = letters[c(1, 5, 10)], val = rnorm(3))
    
    merge.all <- function(by, ...) {
      frames <- list(...)
      return (Reduce(function(x, y) {merge(x, y, by = by, all = TRUE)}, frames))
    }  # end merge.all
    
    merge.all(by = "id", df1, df2, df3)
    #   id      val.x       val.y        val
    # 1  a -0.5913211 -0.73382445 -0.4547612
    # 2  b -0.7412030          NA         NA
    # 3  c  2.0291696  0.58810832         NA
    # 4  d -0.2334067  0.99821419         NA
    # 5  e  0.1564856  0.01476102 -1.4973516
    # 6  h         NA -1.16277465         NA
    # 7  i         NA  0.54842612         NA
    # 8  j         NA  1.20241347 -1.6101057
    I just need to edit the column names (I could do some V1, V2, ... thing within the function), and that's exactly it! LOVE IT

  4. #4
    bryangoodrich

    Code: 
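    # Updated Version
    # Assumes `by` is a single column and each frame adds exactly one value column.
    merge.all <- function(by, ...) {
      frames <- list(...)
      # Full outer join of all frames, folded pairwise from left to right
      df <- Reduce(function(x, y) { merge(x, y, by = by, all = TRUE) }, frames)
      # Rename the value columns V1, V2, ... in the order the frames were given
      names(df) <- c(by, paste("V", seq_along(frames), sep = ""))
    
      return(df)
    }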
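
    If the frames carry more than one value column, or you would rather keep the original column names visible, one option is to tag the columns before merging instead of renaming afterwards. A minimal sketch, assuming the hypothetical name merge_all_tagged (it is not part of the thread's code):

```r
# Hypothetical variant: suffix the value columns first, then merge.
merge_all_tagged <- function(by, ...) {
  frames <- list(...)
  # Suffix every non-key column with the frame's position, e.g. val -> val1
  tagged <- Map(function(f, i) {
    value_cols <- setdiff(names(f), by)
    names(f)[match(value_cols, names(f))] <- paste0(value_cols, i)
    f
  }, frames, seq_along(frames))
  # Column names are now unique, so merge() has no need for .x/.y suffixes
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), tagged)
}
```

    Called as merge_all_tagged("id", df1, df2, df3), this returns the columns id, val1, val2, val3.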

  5. #5
    bryangoodrich

    Okay, my data is actually already in a list. I'm trying to put a conditional at the beginning to check if "..." is a list. If it is, set "frames" to the input list (assuming only one such list is entered). But I tried is.list and it says my data frame was a list. WTF?!

  6. #6
    Beep
    Points: 61,742, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Discussion EnderPosting AwardCommunity AwardMaster TaggerFrequent Poster
    Dason's Avatar
    Location
    Ames, IA
    Posts
    11,112
    Thanks
    261
    Thanked 2,156 Times in 1,837 Posts

    Re: Merging Multiple Data Frames

    Reduce is pretty nice - I think it's a somewhat standard function. I know it's in Python - I'm assuming it shows up in other languages as well.

  7. #7
    Dason

    Originally posted by bryangoodrich:
    But I tried is.list and it says my data frame was a list. WTF?!
    Data frames are lists. They're just lists structured in a specific way with certain indexing functions and nice print functions.
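
    This is easy to check at the console, and it also suggests one way to tell a plain list from a data frame. The helper is_plain_list is a made-up name for illustration:

```r
df <- data.frame(id = letters[1:3], val = 1:3)
plain <- list(df, df)

is.list(df)        # TRUE: a data frame is stored as a list of columns
typeof(df)         # "list"
class(df)          # "data.frame"
is.data.frame(df)  # TRUE

# Hypothetical helper: TRUE for a list that is not a data frame
is_plain_list <- function(x) is.list(x) && !is.data.frame(x)

is_plain_list(df)     # FALSE
is_plain_list(plain)  # TRUE
```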

  8. #8
    bryangoodrich

    Yeah, but I expect is.list to do a class check, not a mode check. Otherwise, I'll have to do something like

    Code: 
    if (class(...) != "list") ...
    I've never heard of Reduce before, and I still don't quite understand what it's doing, lol. The scary part is that the function isn't all that big (if you look at its body).

  9. #9
    Dason

    It just iteratively applies your binary function to the elements of the list.
    Essentially I believe it is doing this...
    Code: 
    merge(merge(merge(frames[[1]], frames[[2]], by = by, all = TRUE), frames[[3]], by = by, all = TRUE), frames[[4]], by = by, all = TRUE)
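
    The same left-to-right folding can be written as a plain loop, which may make it clearer what Reduce does with a binary function. fold_left below is a hypothetical stand-in written for illustration, not how Reduce is actually implemented:

```r
# Hypothetical stand-in for Reduce(f, xs), for illustration only
fold_left <- function(f, xs) {
  acc <- xs[[1]]                        # start with the first element
  for (x in xs[-1]) acc <- f(acc, x)    # combine each later element into it
  acc
}

fold_left(`+`, list(1, 2, 3, 4))        # ((1 + 2) + 3) + 4 = 10

# accumulate = TRUE keeps the intermediate results of the fold
Reduce(`+`, 1:4, accumulate = TRUE)     # 1 3 6 10
```

    With merge as the binary function, each step joins the running result to the next frame.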

  10. The Following User Says Thank You to Dason For This Useful Post:

    bryangoodrich (12-15-2011)

  11. #10
    bryangoodrich

    I'm going to add this function (now named mergeMulti) to my website, but I wanted to make it a bit more robust given what I mentioned above. Currently it takes in data frames

    Code: 
    mergeMulti(by = something, df1, df2, df3)
    Over the last week or so I've found that sometimes I already have my stuff in a list. Then I'm simply doing the Reduce part manually on that list

    Code: 
    Reduce(function(x,y){merge(x,y, by = by, all = TRUE)}, myList)
    It would be nice to have the wrapper recognize that I'm feeding it a list, so it can skip making a list out of the data frames

    Code: 
    mergeMulti(by = by, myList)
    However, as I pointed out above, I have no idea how to get R to recognize that I'm feeding it a list to work from. Does anyone have an idea how to get around this problem? I might also face the question "what if I have a few lists?" I'm not sure whether the wrapper should handle that case, or whether the user should just combine them into one list object. If I go that route, why not just say the user should make the list out of the data frames to begin with? Then I can assume I'm only handling a list object, and the function becomes so simple that it just saves a little typing of the Reduce(...merge...) stuff.

    What do you guys think? Any ideas?
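
    One possible way to support both call styles, sketched under assumptions: the body below is a guess at how mergeMulti could be written, not the version that ended up on the website. It relies on the fact that a data frame is a list, so the check has to exclude data frames explicitly.

```r
# Sketch only: accepts either several data frames or one plain list of them
mergeMulti <- function(by, ...) {
  frames <- list(...)
  # A single argument that is a list but not a data frame is taken to be
  # a ready-made list of frames
  if (length(frames) == 1L && is.list(frames[[1]]) && !is.data.frame(frames[[1]])) {
    frames <- frames[[1]]
  }
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), frames)
}
```

    With this, mergeMulti(by = "id", df1, df2, df3) and mergeMulti(by = "id", list(df1, df2, df3)) go through the same Reduce call. Several separate lists are not handled; the caller would need to combine them first.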

  12. #11
    Dason

    Maybe reorder the parameters and make it an S3 generic?

    Code: 
    j.list <- function(list, by = NULL){
    	print("list")
    	print(list)
    }
    
    j.data.frame <- function(df, ..., by = NULL){
    	print("data frame")
    	print(df)
    }
    
    j <- function(x, ...){
    	UseMethod("j")
    }
    
    j(list(test = "this is a list"))
    j(data.frame(test = 1:10))
    And your data frame implementation can just create the list and then call the list implementation. Saves some coding in that regard.
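
    Filled in for the merging case, that dispatch pattern might look roughly like the following. The names merge_multi and its methods are hypothetical. An underscore is used because a dotted name such as merge.all would itself look like an S3 method of merge for a class called "all". The by argument has no default here on purpose, since passing by = NULL to merge produces a Cartesian product.

```r
# Hypothetical S3 generic: dispatches on the class of the first argument
merge_multi <- function(x, ...) UseMethod("merge_multi")

# List method: does the actual work on a list of data frames
merge_multi.list <- function(x, by, ...) {
  Reduce(function(a, b) merge(a, b, by = by, all = TRUE), x)
}

# Data frame method: bundle the frames into a list and reuse the list method;
# `by` comes after `...`, so it must be passed by name
merge_multi.data.frame <- function(x, ..., by) {
  merge_multi.list(list(x, ...), by = by)
}
```

    Both merge_multi(df1, df2, df3, by = "id") and merge_multi(list(df1, df2, df3), by = "id") then end up in the list method.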

  13. The Following 2 Users Say Thank You to Dason For This Useful Post:

    bryangoodrich (12-27-2011), trinker (12-27-2011)

  14. #12
    trinker

    bryangoodrich, if/when you update again, could you post here or provide the link to your website? (I have the link, but I think this is a pretty useful and interesting function to tear apart, and others may be interested as well.)

  15. #13
    bryangoodrich

    Dason, good call. I was thinking of going that route. I was just hoping there might be an easy way to check if the entered objects were a list. My approach failed, as I mentioned. Making separate functions for each class would be more appropriate. I've never done the UseMethod approach, either. Good time to learn!
