+ Reply to Thread
Page 11 of 22 FirstFirst 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 LastLast
Results 151 to 165 of 322

Thread: Today I Learned: ____

  1. #151
    ggplot2orBust
    Points: 35,599, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,966
    Thanks
    1,382
    Thanked 757 Times in 677 Posts

    Re: Today I Learned: ____




    bryan it gets rid of the legend and plots the groups/labels what ever directly on the graph. It may (I don't think this is appropriate in every situation) be easier to interpret your graph as you're eyes aren't going back and forth from legend to plot. It also does the formatting for you so you don't have labels on top of each other. And it seems to play well with all three graphics systems. I think it could save some time and has potential for some circumstances (check out the topo map (contour map) with the numbers right on the graphic. Much more interpretable and professional.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  2. The Following User Says Thank You to trinker For This Useful Post:

    bryangoodrich (02-23-2012)

  3. #152
    Point Mass at Zero
    Points: 7,997, Level: 60
    Level completed: 24%, Points required for next Level: 153
    ledzep's Avatar
    Location
    Berks,UK
    Posts
    650
    Thanks
    174
    Thanked 133 Times in 131 Posts

    Re: Today I Learned: ____

    I wonder why I didn't use the "cast" function ever before. I am familiar with the melt function but never got around to use "cast" at all.
    I used to subset the data to 4 datasets (one for each day) and then bind 'em together to get to my desired output.
    Now, what was "Donkey's work", Cast does it very nicely. Very thankful to Hadly Wickham for his time-savers (already a big user of plyr, now add reshape to it).

    Code: 
    require(reshape);
    # d.f
    > test
       day person freq
    1    0      A    1
    2    1      A    2
    3    2      A    1
    4    3      A    2
    5    0      B    1
    6    1      B    3
    7    2      B    1
    8    3      B    6
    9    0      C    3
    10   1      C    6
    11   2      C    3
    12   3      C    2
    
    
    > summary<-cast(test,person~day,)
    Using freq as value column.  Use the value argument to cast to override this choice
    > summary
      person 0 1 2 3
    1      A 1 2 1 2
    2      B 1 3 1 6
    3      C 3 6 3 2
    Oh Thou Perelman! Poincare's was for you and Riemann's is for me.

  4. #153
    Phineas Packard
    Points: 9,583, Level: 65
    Level completed: 78%, Points required for next Level: 67
    Lazar's Avatar
    Location
    Sydney
    Posts
    927
    Thanks
    157
    Thanked 250 Times in 227 Posts

    Re: Today I Learned: ____

    This could be fun to have in your RProfile to play on startup:
    Code: 
    Dave <- function(){
     #Download from  'http://www.palantir.net/2001/tma1/wav/hihal.wav'
      require(tuneR)
      play('hihal.wav')
    }
    Last edited by Lazar; 02-26-2012 at 04:02 AM.

  5. #154
    ggplot2orBust
    Points: 35,599, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,966
    Thanks
    1,382
    Thanked 757 Times in 677 Posts

    Re: Today I Learned: ____

    I'm tired of waiting around for Dason to stop fooling around and make us a decent alarm clock:

    Code: 
    alarm.clock <- function(hours=0, minutes=0, seconds=0){
        x <- c(hours*360 + minutes*60 + seconds)
        Sys.sleep(x)
        browseURL("http://www.youtube.com/watch?v=DkJ6OScQshI")
    } #end of function
    
    alarm.clock(seconds=5)
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  6. The Following 2 Users Say Thank You to trinker For This Useful Post:

    Lazar (02-27-2012), ledzep (03-02-2012)

  7. #155
    Beep
    Points: 62,205, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Discussion EnderPosting AwardCommunity AwardMaster TaggerFrequent Poster
    Dason's Avatar
    Location
    Ames, IA
    Posts
    11,170
    Thanks
    263
    Thanked 2,174 Times in 1,853 Posts

    Re: Today I Learned: ____

    Don't have R on your machine but just want to run a quick simulation or something?

    TIL: http://ideone.com/

    It's a neat site that actually lets you compile code for a bunch of different languages. R seems to have quite a few packages available to it but the one time I tried plot (in base) it gave an error. Still it's interesting if you don't have access to R.
    Morte a tutti i raptors
    001100010010011110100001101101110011

  8. The Following User Says Thank You to Dason For This Useful Post:

    bryangoodrich (03-01-2012)

  9. #156
    Probably A Mammal
    Points: 19,029, Level: 87
    Level completed: 36%, Points required for next Level: 321
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,201
    Thanks
    292
    Thanked 494 Times in 449 Posts

    Re: Today I Learned: ____

    oh that could be interesting! The plot may not have worked because it doesn't control graphic devices? Just a guess. I'll definitely have to "hello world" some random languages now lol

  10. #157
    Cookie Scientist
    Points: 9,194, Level: 64
    Level completed: 48%, Points required for next Level: 156
    Jake's Avatar
    Location
    Boulder, CO
    Posts
    1,120
    Thanks
    43
    Thanked 475 Times in 363 Posts

    Re: Today I Learned: ____

    TIL: the (kind of subtle) difference between bootstrapping a mean group difference vs. a permutation test, and how to implement both in R using the boot package.

    Here is a well-annotated example:
    Code: 
    library(boot)
    
    # make up data
    dat <- stack(list(a = rnorm(15, mean=10),
                      b = rnorm(15, mean=11)))
    str(dat)
    t.test(values ~ ind, data=dat, var.equal=T)
    
    ##############################################################
    ##### Bootstrap of group difference                      #####
    ##############################################################
    
    # define bootstrap statistic
    bootDiff <- function(data, indices, print=F){
      newdata <- data[indices,]
      if(print) print(indices)
      return(diff = diff(tapply(newdata$values, newdata$ind, mean)))
    }
    
    # the indices tell us all about how the bootstrap works
    # first boot passes in the original indices: seq(nrow(dat))
    # this is to get the observed estimate of the statistic
    # then, for each resample, boot samples the indices
    # from each stratum (group), WITH REPLACEMENT, separately
    boot(data=dat, statistic=bootDiff, R=2,
         print=TRUE, strata=rep(1:2, each=15))
    
    # bootstrap!
    resamples <- 10000
    results1 <- boot(data=dat, statistic=bootDiff,
                     R=resamples, strata=rep(1:2, each=15))
    results1
    boot.ci(results1)
    plot(results1)
    # notice that our distribution of resampled group diffs
    # is centered around the original estimate.
    # the standard error is the SD of this distribution
    # and the 95% CI bounds can be taken as the .025 and .975 quantiles
    # or in a more complex way such as those given by boot.ci()
    
    ##############################################################
    ##### Permutation test                                   #####
    ##############################################################
    
    # define permutation test statistic
    permDiff <- function(data, indices, print=F){
      if(print) print(indices)
      if(print) print(length(unique(indices)))
      indices <- factor(indices <= 15, labels=c("b","a"))
      if(print) print(indices)
      newdata <- cbind.data.frame(indices, values=data$values)
      return(diff = diff(tapply(newdata$values, newdata$ind, mean)))
    }
    
    # as usual, boot first passes in the original indices.
    # next, boot randomly "shuffles" the indices (i.e., no replacement)
    # such that each value has an equal chance
    # of coming from either group a or group b.
    # the logic here is that, if the null hypothesis is true,
    # then the particular pattern of indices in the observed data
    # is arbitrary, and we want to know the distribution of group
    # differences that we might have seen with other arbitrary indices
    boot(data=dat, statistic=permDiff, R=2,
         print=TRUE, sim="permutation")
    
    # permutate!
    # in theory, there are a finite number of permutations of the indices.
    # but in practice, it is usually not feasible to exhaust every 
    # possibility. even in this smallish example, there are choose(30,15)
    # = 155,117,520 possible permutations. so instead we will randomly
    # sample 10k of the possible permutations
    resamples <- 10000
    results2 <- boot(data=dat, statistic=permDiff,
                     R=resamples,sim="permutation")
    plot(results2)
    # notice that this distribution is centered around 0.
    # since the logic behind the permutation test is analogous
    # to null hypothesis significance testing, we can calculate
    # a p-value based on where our observed statistic lies
    # in the distribution of statistics under the null hypothesis.
    # e.g., we can do a two-tailed test like this:
    p.value <- mean(results2$t > results1$t0) + mean(results2$t < -1*results1$t0)
    p.value
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  11. The Following User Says Thank You to Jake For This Useful Post:

    trinker (03-05-2012)

  12. #158
    Probably A Mammal
    Points: 19,029, Level: 87
    Level completed: 36%, Points required for next Level: 321
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,201
    Thanks
    292
    Thanked 494 Times in 449 Posts

    Re: Today I Learned: ____

    I've set up a randomization (permutation) distribution before. It wasn't easy.

    Spoiler:


    And even in this case it wasn't that big. When boot randomly takes out permutations, I assume this is with replacement, right?

    Also, in the bootDiff call from boot, you're setting up the strata. If I understand the code correctly, the line in permDiff that sets up the statistic correspondingly is

    Code: 
    indices <- factor(indices <= 15, labels=c("b","a"))
    Is that correct?

  13. #159
    Cookie Scientist
    Points: 9,194, Level: 64
    Level completed: 48%, Points required for next Level: 156
    Jake's Avatar
    Location
    Boulder, CO
    Posts
    1,120
    Thanks
    43
    Thanked 475 Times in 363 Posts

    Re: Today I Learned: ____

    Quote Originally Posted by bryangoodrich View Post
    When boot randomly takes out permutations, I assume this is with replacement, right?
    I don't know the answer for sure, but I assume (or rather, hope) that it is without replacement. That way one could in principle (although not usually in practice) exhaust every possible permutation without repeats and thus have an exact p-value.

    Quote Originally Posted by bryangoodrich View Post
    Also, in the bootDiff call from boot, you're setting up the strata. If I understand the code correctly, the line in permDiff that sets up the statistic correspondingly is

    Code: 
    indices <- factor(indices <= 15, labels=c("b","a"))
    Is that correct?
    If I understand your question correctly, then yes; this is the line where I take those indices that boot shuffled for me and use them for the group reassignment.
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  14. #160
    Beep
    Points: 62,205, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Discussion EnderPosting AwardCommunity AwardMaster TaggerFrequent Poster
    Dason's Avatar
    Location
    Ames, IA
    Posts
    11,170
    Thanks
    263
    Thanked 2,174 Times in 1,853 Posts

    Re: Today I Learned: ____

    My guess is that boot samples with replacement. It would be a lot easier and depending on how you specify strata it could be very difficult to guarantee that you're sampling without replacement.

    Code: 
    
    library(boot)
    set.seed(643)
    mn <- function(d, i){mean(d[i])}
    dat <- rnorm(2)
    j <- boot(dat, mn, R = 10, stype = "i")
    j$t
    The first two samples of the statistic are the same. This wouldn't happen if the samples were without replacement.
    Morte a tutti i raptors
    001100010010011110100001101101110011

  15. #161
    Cookie Scientist
    Points: 9,194, Level: 64
    Level completed: 48%, Points required for next Level: 156
    Jake's Avatar
    Location
    Boulder, CO
    Posts
    1,120
    Thanks
    43
    Thanked 475 Times in 363 Posts

    Re: Today I Learned: ____

    Well, you didn't specify sim="permutation", so it's not even pretending to do a permutation test there--but, you did get me thinking, boot doesn't even know how to sample without replacement if we haven't specified the strata (since the number of observations in each strata would determine the number of possible permutations), but we never give it this information! So it must not be sampling without replacement after all. Looking through ?boot, I don't see that there is a way to give it this information either. (There is a strata argument, but this just makes it permutate within each stratum, which is not what we want.)

    This is a little disappointing because it would not be very hard to code in the ability to sample permutations without replacement, assuming the user provided the necessary information.
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  16. #162
    Beep
    Points: 62,205, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    Discussion EnderPosting AwardCommunity AwardMaster TaggerFrequent Poster
    Dason's Avatar
    Location
    Ames, IA
    Posts
    11,170
    Thanks
    263
    Thanked 2,174 Times in 1,853 Posts

    Re: Today I Learned: ____

    I don't see the problem though. Unless you're doing something with very small sample sizes then there are going to be more permutations than you could possibly fit easily. If you happen to get a few permutations more than once it doesn't really matter that much. You're still approximating the null distribution asymptotically.
    Morte a tutti i raptors
    001100010010011110100001101101110011

  17. #163
    Cookie Scientist
    Points: 9,194, Level: 64
    Level completed: 48%, Points required for next Level: 156
    Jake's Avatar
    Location
    Boulder, CO
    Posts
    1,120
    Thanks
    43
    Thanked 475 Times in 363 Posts

    Re: Today I Learned: ____

    You're right, it basically doesn't matter at all in practice. But still, it would feel nice knowing that it could provide the exact p-values if only in principle
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  18. #164
    ggplot2orBust
    Points: 35,599, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,966
    Thanks
    1,382
    Thanked 757 Times in 677 Posts

    Re: Today I Learned: ____

    I have really been interested in learning how to appropriately use the reshape function from base rather than relying on the reshape packages (Wickham). The base function is very flexible and may require less code than the reshape package. I also replicated the same thing using rbind, cbind, do.call, and some fancy indexing.

    I made myself a tutorial and thought I'd share. I was going to do this as a statspedia article but you can't just upload a pdf. I've got a lot going on right now so maybe another day.

    Hopefully others find this useful as well. Wide to Long Closely examined.pdf
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  19. The Following User Says Thank You to trinker For This Useful Post:

    GretaGarbo (04-08-2012)

  20. #165
    ggplot2orBust
    Points: 35,599, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,966
    Thanks
    1,382
    Thanked 757 Times in 677 Posts

    Re: Today I Learned: ____


    TIL: BEWARE OF THE facet_grid FUNCTION IN GGPLOT2
    There is a bug in ggplot two involving facet grid and duplicate rows (SEE LINK)

    That's why I created the IDer function (LINK) to add an ID row that eliminates the duplicate row problem (this problem is especially apparent in stacked data). I was having to add an id row repeatedly (I had IDs but when they're stacked they're no longer unique) so I made that function to save time:

    Code: 
    library(ggplot2)
    set.seed(10)
    CO3 <- data.frame(CO2[, 2:3], outcome=factor(sample(c('none', 'some', 'lots', 'tons'), 
               nrow(CO2), rep=T), levels=c('none', 'some', 'lots', 'tons')))
    
        ggplot(CO3, aes(x=outcome)) + geom_bar(aes(x=outcome)) +
        facet_grid(Treatment~., margins=TRUE)
    
    CO3 <- IDer(CO3)
    
        ggplot(CO3, aes(x=outcome)) + geom_bar(aes(x=outcome)) +
        facet_grid(Treatment~., margins=TRUE)
    THE IDer function

    Code: 
    IDer <- function(dataframe, id.name="id"){
        DF <- data.frame(c=1:nrow(dataframe), dataframe)
        colnames(DF)[1] <- id.name
        return(DF)
    }
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

+ Reply to Thread
Page 11 of 22 FirstFirst 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 LastLast

           




Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats