Thread: Melt in Base

  1. #1
    Probably A Mammal bryangoodrich's Avatar
    Join Date
    Jun 2011
    Location
    Sacramento, California, United States
    Posts
    1,456
    Thanks
    137
    Thanked 305 Times in 287 Posts

    Melt in Base



    I like to try and stay in base R when I can. There are just a lot of useful tools in the stats package. That being said, no one can deny that Reshape has a lot to offer. The melt function has become particularly useful for me because it is so much more intuitive than using stack or reshape.

    That being said, does anyone know how to recreate that intuitive behavior in base R? For instance, I have a data set with 3 factors and 7 numeric fields. With melt, it would know to stack the numeric fields using their column names as factor levels. Doing the same thing with stack is easy if you only have one non-numeric field. The help files for stack appear useless to me (horrible examples and not much detail). I've always had trouble getting reshape to do what I want (because it expects the data to fit a certain form).

    Here's a simple data set we can work with:

    Code: 
    df <- data.frame(
      A = gl(2, 2),     # 1 1 2 2
      B = gl(2, 1, 4),  # 1 2 1 2 
      C = rnorm(4) * 10,
      D = runif(4, -10, 10)
    );
    The long form should be (with '#' representing some random number from C or D)

    Code: 
    1 1 # C
    1 1 # D
    1 2 # C
    1 2 # D
    2 1 # C
    2 1 # D
    2 2 # C
    2 2 # D
    The only thing I can really think of is to

    (1) Capture the number of non-factor columns, call it n
    (2) Stack the data set (it removes the factors)
    (3) Repeat the factors (rbind?) n-times
    (4) Put together (2) and (3)
    (5) Sort according to each factor
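The five steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the name `melt_steps` and its local variables are made up here, the data frame is assumed to have at least one factor column, and every non-factor column is assumed to be numeric.

```r
# Sketch only: `melt_steps` and its local names are illustrative, not from any package.
melt_steps <- function(df) {
  is_id   <- vapply(df, is.factor, logical(1))   # factor columns identify rows
  id_cols <- names(df)[is_id]
  measure <- names(df)[!is_id]
  n       <- length(measure)                     # (1) number of non-factor columns
  long    <- stack(df[measure])                  # (2) stack only the numeric columns
  ids     <- df[rep(seq_len(nrow(df)), times = n), id_cols, drop = FALSE]  # (3) repeat factors
  out     <- data.frame(ids, long, row.names = NULL)                       # (4) combine
  out     <- out[do.call(order, unname(as.list(out[id_cols]))), , drop = FALSE]  # (5) sort
  rownames(out) <- NULL
  out
}

# Fixed numbers instead of rnorm()/runif() so the result is reproducible.
df_fixed <- data.frame(A = gl(2, 2), B = gl(2, 1, 4),
                       C = c(1, 2, 3, 4), D = c(10, 20, 30, 40))
long_fixed <- melt_steps(df_fixed)
```

Stacking only `df[measure]` rather than the whole data frame means stack never sees the factor columns, so it has nothing to discard.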

  2. #2
    Probably A Mammal bryangoodrich's Avatar

    Re: Melt in Base

    My tentative solution is below. I think it is straightforward. I take advantage of some features in R. For instance, when I concatenate the data frame with c(), I get each column as a separate list element. In a way, it is like "exploding" it into a list of manageable pieces that lends itself to better control than the matrix layout of a data frame. I also take advantage of the natural recycling in cbind to reproduce step (3) above. Thus, everything comes together nicely.
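The two behaviors mentioned here can be seen in isolation with a tiny example. The object names (`toy`, `cols`, `short`, `tall`, `both`) are made up for illustration.

```r
toy <- data.frame(A = gl(2, 2), x = c(1.5, 2.5, 3.5, 4.5))

# c() on a data frame returns a plain list with one element per column.
cols <- c(toy)
col_is_factor <- sapply(cols, is.factor)

# cbind() on data frames recycles the shorter one when the longer row count
# is a whole multiple of the shorter row count (here 4 is a multiple of 2).
short <- data.frame(id = c("a", "b"))
tall  <- data.frame(value = 1:4)
both  <- cbind(short, tall)
```

Recycling only works when the row counts divide evenly; otherwise cbind stops with an error about differing numbers of rows.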

    Code: 
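    myMelt <- function(df) {
      # c(df) turns the data frame into a plain list of its columns
      isFactor <- sapply(c(df), is.factor)
      # stack() drops the factor columns and returns 'values' and 'ind'
      y <- stack(df)
      # the shorter factor columns are recycled to match the stacked rows
      cbind(df[which(isFactor)], y)
    }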
    For instance,

    Code: 
    df <- data.frame(
      A = gl(2, 2),     # 1 1 2 2
      B = gl(2, 1, 4),  # 1 2 1 2 
      C = rnorm(4) * 10,
      D = runif(4, -10, 10)
    );
    
    df
    #   A B         C         D
    # 1 1 1  3.509282 -1.508020
    # 2 1 2 -1.798883 -7.413565
    # 3 2 1 -6.339989  4.443587
    # 4 2 2  2.845679  7.858665
    
    df <- myMelt(df)
    df[order(df$A, df$B), ]
    #   A B    values ind
    # 1 1 1  3.509282   C
    # 5 1 1 -1.508020   D
    # 2 1 2 -1.798883   C
    # 6 1 2 -7.413565   D
    # 3 2 1 -6.339989   C
    # 7 2 1  4.443587   D
    # 4 2 2  2.845679   C
    # 8 2 2  7.858665   D
    The one thing I'm missing is how to take the isFactor and use that to specify the order/sort within myMelt. I could then rename the row.names as a proper sequence and return the final result I want in the first place. Nevertheless, this did work!

  3. #3
    Probably A Mammal bryangoodrich's Avatar

    Re: Melt in Base

    Using an external sort defeats the purpose! I know what it should be sorted on. I'm wondering if I could make use of that sortframe we discussed the other day, doing something like

    Code: 
    sortframe(df, df[, 1], df[, 2], ...)
    and using the individual values of isFactor to indicate the '1' and '2' or whatever they may be. It wouldn't violate the base solutions at work here. I could also include the id.vars parameter, and basically do the manual search I used whenever it isn't specified. A check to make sure those fields actually are factors should probably still be utilized (not concerned with error checking atm).

    The only other thing I can think of is to somehow produce an unevaluated statement or something along those lines (I'm still not familiar with the eval and call, etc. type functions), and use that to set up the ordering.
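One base R way to sort on a set of columns that is only known at run time is `do.call(order, ...)`, which builds the `order(col1, col2, ...)` call from a list and so avoids writing any eval or call code by hand. The sketch below is an assumption about what would fit here, not the sortframe function mentioned above; `sort_by_cols` is a made-up name.

```r
# Hypothetical helper: sort a data frame by the columns named in `by`.
sort_by_cols <- function(df, by) {
  # unname() so that column names are never mistaken for arguments of order()
  keys <- unname(as.list(df[by]))
  out  <- df[do.call(order, keys), , drop = FALSE]
  rownames(out) <- NULL   # renumber the rows as a proper sequence
  out
}

# A stacked data set shaped like the output of myMelt(), with fixed values.
stacked <- data.frame(
  A      = gl(2, 2, 8),
  B      = gl(2, 1, 8),
  values = c(1, 2, 3, 4, 10, 20, 30, 40),
  ind    = gl(2, 4, labels = c("C", "D"))
)

# Use the factor columns, minus the 'ind' column that stack() adds, as sort keys.
is_factor <- vapply(stacked, is.factor, logical(1))
id_cols   <- setdiff(names(stacked)[is_factor], "ind")
sorted    <- sort_by_cols(stacked, id_cols)
```

Because the sort keys are passed as a list, the same helper works for one id column or for ten.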

  4. #4
    RotPariColev trinker's Avatar
    Join Date
    Mar 2011
    Location
    Buffalo, NY
    Posts
    2,076
    Thanks
    477
    Thanked 307 Times in 277 Posts

    Re: Melt in Base

    reshape() from base R will accomplish this:

    Code: 
    df2 <- reshape(df, direction = "long", varying = list(names(df)[3:4]),
                   v.names = "values", timevar = "ind", idvar = c("A", "B"),
                   times = names(df)[3:4])
    rownames(df2) <- 1:nrow(df2)
    df2
    
    df2[order(df2$A, df2$B), ]
    Not sure if it's any faster than your solution but it's about the same amount of code.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  5. #5
    Definitely a human Dason's Avatar
    Join Date
    May 2010
    Location
    Ames, IA
    Posts
    6,274
    Thanks
    79
    Thanked 859 Times in 733 Posts

    Re: Melt in Base

    Ha. I deleted my post right after I noticed you posted because I wanted to read what you had first.

  6. #6
    Probably A Mammal bryangoodrich's Avatar

    Re: Melt in Base

    Just had a thought on the way home that I could just make my approach a wrapper for stack, using it as the workhorse for the function. I haven't checked if this works yet, but my idea is this:

    Code: 
    myMelt <- function(df) {
      long <- stack(df)
      vars <- unique(long$ind)  
      cbind(subset(df, select = -vars), long)
    }
    I'll try your approach out tomorrow, trinker. I never could quite understand what each of the parameters needed to be to make it work right. Did you get the same result that I did? I probably should have just used some defined integers so the values didn't change in my example lol

  7. #7
    RotPariColev trinker's Avatar

    Re: Melt in Base

    Yes, I get the same results, but it took some rechecking of the notes I keep on R (I've accumulated 175 pages) to figure out the parameters. They aren't intuitive. That's why we all love reshape/reshape2 and plyr so much. I actually think your approach is much more transparent, though I'm betting the reshape (base) approach may be faster.

    I also have to reorder the rows in the same way you did as well as give the rows new row names.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  8. #8
    Probably A Mammal bryangoodrich's Avatar

    Re: Melt in Base

    It looks to me like reshape gives you more control, and it would be nice if it made some assumptions like melt (and my function) does regarding factors. Honestly, I don't even remember all the parameters to melt. I always just use it on a data frame with factors and it works like a charm, and pretty quickly, too. If I understand reshape correctly (will read up some more later), I can probably create a wrapper to make it easy for cases like this (e.g., the isFactor can be used to fill some of those parameters you set). The one thing I haven't tested yet is how it might handle classes like Date or if it is appropriate to submit a non-factor ID variable, assuming it'll be coerced into a factor. I never use the benchmarking facilities R has. Maybe I'll do it and see which process is faster in CPU time.
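A wrapper along those lines might look like the sketch below. It is a guess at the idea rather than anyone's tested solution: `melt_reshape` and `id.vars` are names chosen here, factor columns are assumed to be the ids unless the caller says otherwise, and each combination of id values is assumed to occur only once (reshape builds row names from the ids).

```r
# Hypothetical wrapper: treat factor columns as ids and everything else as measures.
melt_reshape <- function(df,
                         id.vars = names(df)[vapply(df, is.factor, logical(1))]) {
  measure <- setdiff(names(df), id.vars)
  long <- reshape(
    df,
    direction = "long",
    varying   = list(measure),  # all measure columns go into a single column
    v.names   = "values",       # name of that stacked column
    timevar   = "ind",          # column recording which measure a row came from
    times     = measure,
    idvar     = id.vars
  )
  rownames(long) <- NULL        # replace the "1.1.C"-style row names
  long
}

wide_fixed <- data.frame(A = gl(2, 2), B = gl(2, 1, 4),
                         C = c(1, 2, 3, 4), D = c(10, 20, 30, 40))
long_reshape <- melt_reshape(wide_fixed)
```

Passing `id.vars` explicitly is one way to try a non-factor ID column without changing the wrapper itself.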

  9. #9
    Probably A Mammal bryangoodrich's Avatar

    Re: Melt in Base

    My method above did not work. First off, I needed to do levels(long$ind), and secondly, the -vars in subset doesn't work because vars is a vector of quoted names. Apparently that doesn't work within subset?? Seems stupid. Effectively it is the difference between

    Code: 
    subset(df, select = -c(C, D))
    subset(df, select = -c("C", "D"))
    The first works. The second does not. I'm at a loss at the moment as to how to correct it.

    EDIT: Oddly enough, this only applies to the negated selection. You get the same results for each expression below.

    Code: 
    subset(df, select = c(C, D))
    subset(df, select = c("C", "D"))
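The difference comes from how subset evaluates `select`: unquoted column names are looked up as column positions, so `-c(C, D)` becomes a vector of negative indices, while `-c("C", "D")` tries to negate character strings and fails. The sketch below shows that, plus two ways to drop columns whose names are held in a character vector. The object names are made up and the data values are fixed for reproducibility.

```r
wide <- data.frame(A = gl(2, 2), B = gl(2, 1, 4),
                   C = c(1, 2, 3, 4), D = c(10, 20, 30, 40))
vars <- c("C", "D")

# Unquoted names are looked up as column positions, so the minus sign works.
by_name <- subset(wide, select = -c(C, D))

# Quoted names stay character strings; negating them is an error.
attempt <- try(subset(wide, select = -c("C", "D")), silent = TRUE)
failed  <- inherits(attempt, "try-error")

# Alternatives when the names to drop are stored in a character vector.
by_match   <- subset(wide, select = -match(vars, names(wide)))
by_setdiff <- wide[setdiff(names(wide), vars)]
```

The `setdiff` form sidesteps negation entirely, which makes it the easier one to use inside a function.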
    My only fix is to forget subsetting in this manner and to focus on column selection

    Code: 
    myMelt <- function(df) {
      long <- stack(df)
      vars <- unique(levels(long$ind))
      # df[...] (no comma) keeps a data frame even when only one column is left
      cbind(df[!names(df) %in% vars], long)
    }

  10. The Following 2 Users Say Thank You to bryangoodrich For This Useful Post:

    bugman (01-06-2012), TheEcologist (01-06-2012)

  11. #10
    RotPariColev trinker's Avatar

    Re: Melt in Base

    Negative selection can be a tricky beast. Here are two methods I've encountered that work (the minus sign on quoted names, as you found out, not so much).

    df[, !names(df) %in% c("C", "D")]
    df[, -match(c("C", "D"), names(df))]

    EDIT: Though you may want to stick with subset, in which case my lines won't work for your needs.
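One caveat worth knowing about the two forms: they behave differently when the vector of names to drop turns out to be empty. The sketch below uses made-up object names and a small fixed data frame.

```r
small <- data.frame(A = gl(2, 2), C = c(1, 2, 3, 4), D = c(10, 20, 30, 40))

nothing_to_drop <- character(0)

# Logical form: no names match, so every column is kept.
kept_logical <- small[, !names(small) %in% nothing_to_drop, drop = FALSE]

# match() form: -integer(0) is still integer(0), which selects zero columns.
kept_match <- small[, -match(nothing_to_drop, names(small)), drop = FALSE]
```

The `drop = FALSE` is there so that the result stays a data frame even if only one column survives.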
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  12. #11
    N=1 TheEcologist's Avatar
    Join Date
    Mar 2008
    Location
    The Netherlands.
    Posts
    1,092
    Thanks
    67
    Thanked 164 Times in 84 Posts

    Re: Melt in Base

    Quote Originally Posted by bryangoodrich View Post
    My only fix is to forget subsetting in this manner and to focus on column selection

    Code: 
    myMelt <- function(df) {
      long <- stack(df)
      vars <- unique(levels(long$ind))
      # df[...] (no comma) keeps a data frame even when only one column is left
      cbind(df[!names(df) %in% vars], long)
    }

    That is pretty useful, thanks for sharing.
    The true ideals of great philosophies always seem to get lost somewhere along the road..

  13. #12
    Probably A Mammal bryangoodrich's Avatar

    Re: Melt in Base


    You're late to the party :P

    I've been thinking of altering it a bit.

    Code: 
    melt <- function(df, names = NULL) {
      long <- stack(df)
      vars <- levels(long$ind)  # I don't know why I thought I needed unique
      # df[...] (no comma) keeps a data frame even when only one column is left
      long <- data.frame(df[!names(df) %in% vars], long)
      if (!is.null(names)) {
        names(long) <- names
      } else {
        names(long) <- paste("V", seq_len(ncol(long)), sep = "")
      }  # end if-else
    
      return(long)
    }  # end melt
    This way the user can specify the names as a character vector or else it gets the sort of behavior as reading in a table of data without a header.

    One thing I need to control (or suppress) is the warning that comes from putting together the factors with the stacked variables, because any time there's more than one factor it spits out a warning about the row names or something. Since it's irrelevant and handled later, it just needs to go.
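One way to silence a single known warning while letting any others through is a calling handler that muffles only messages matching a pattern. In the sketch below, `quiet_row_names` is a made-up helper, and the assumption that the unwanted message contains the text "row names" is taken from the description above rather than checked against a particular R version.

```r
# Hypothetical helper: evaluate `expr`, muffling only warnings that mention "row names".
quiet_row_names <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("row names", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")   # drop this warning and carry on
      }
    }
  )
}

# The matching warning is swallowed and the value is still returned.
muffled <- tryCatch(
  quiet_row_names({ warning("row names were discarded"); "ok" }),
  warning = function(w) "leaked"
)

# Any other warning still reaches the caller.
unrelated <- tryCatch(
  quiet_row_names({ warning("something else"); "ok" }),
  warning = function(w) "leaked"
)
```

Inside the function this would wrap only the `data.frame(...)` call, so warnings from stack or anything else stay visible.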
