Today I Learned: ____

Dason

Ambassador to the humans
TIL: You can recursively access lists with "[[" indexing. I thought you could only access a single element of a list using "[[" but I didn't realize that if you pass in a vector it will recursively work its way down the list...

For example
Code:
o[[c(1,2,3)]]
# is the same as
o[[1]][[2]][[3]]
Some concrete examples...
Code:
> x <- 1:10
> y <- rnorm(10)
> o <- lm(y ~ x)
> str(o)
List of 12
 $ coefficients : Named num [1:2] -0.0931 -0.013
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
 $ residuals    : Named num [1:10] -0.328 0.236 0.574 -1.261 0.299 ...
  ..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
 $ effects      : Named num [1:10] 0.52 -0.118 0.614 -1.194 0.393 ...
  ..- attr(*, "names")= chr [1:10] "(Intercept)" "x" "" "" ...
 $ rank         : int 2
 $ fitted.values: Named num [1:10] -0.106 -0.119 -0.132 -0.145 -0.158 ...
  ..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
 $ assign       : int [1:2] 0 1
 $ qr           :List of 5
  ..$ qr   : num [1:10, 1:2] -3.162 0.316 0.316 0.316 0.316 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:10] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "x"
  .. ..- attr(*, "assign")= int [1:2] 0 1
  ..$ qraux: num [1:2] 1.32 1.27
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-07
  ..$ rank : int 2
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 8
 $ xlevels      : Named list()
 $ call         : language lm(formula = y ~ x)
 $ terms        :Classes 'terms', 'formula' length 3 y ~ x
  .. ..- attr(*, "variables")= language list(y, x)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "y" "x"
  .. .. .. ..$ : chr "x"
  .. ..- attr(*, "term.labels")= chr "x"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(y, x)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "y" "x"
 $ model        :'data.frame':	10 obs. of  2 variables:
  ..$ y: num [1:10] -0.434 0.117 0.442 -1.406 0.141 ...
  ..$ x: int [1:10] 1 2 3 4 5 6 7 8 9 10
  ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 y ~ x
  .. .. ..- attr(*, "variables")= language list(y, x)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "y" "x"
  .. .. .. .. ..$ : chr "x"
  .. .. ..- attr(*, "term.labels")= chr "x"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(y, x)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "y" "x"
 - attr(*, "class")= chr "lm"
> o[["qr"]]$rank
[1] 2
> # same as
> o[[c("qr", "rank")]]
[1] 2
> 
> coef(o)[["x"]]
[1] -0.01298689
> o[[c("coefficients", "x")]]
[1] -0.01298689
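
The same thing works on any nested list, not just model objects. Here is a minimal sketch with a made-up list called nested (a hypothetical name, not from the session above). It also shows how this differs from single-bracket indexing, where a vector selects several top-level elements instead of drilling down.

```r
# `nested` is a made-up example list, used only for illustration.
nested <- list(a = list(b = list(c = "deep value")), d = 1:3)

# A vector inside [[ ]] walks down one level per element.
by_name <- nested[[c("a", "b", "c")]]   # same as nested[["a"]][["b"]][["c"]]
by_pos  <- nested[[c(1, 1, 1)]]         # same as nested[[1]][[1]][[1]]

# The last step may land inside an atomic vector.
third_of_d <- nested[[c(2, 3)]]         # same as nested[[2]][[3]]

# Single brackets do NOT recurse: this keeps top-level elements 1 and 2.
top_two <- nested[c(1, 2)]
```

So "[[" with a vector gives you one deeply nested element, while "[" with the same vector gives you a shorter list.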
 

Dason

Ambassador to the humans
TIL: Two functions

Code:
> timestamp()
##------ Tue Jul 17 20:23:12 2012 ------##
timestamp is useful inside simulations. I usually do something like print(date()) or print(paste(i, format(date()))) but I think timestamp() is a little nicer.

And the other thing I learned which I don't see myself using anytime soon since I can't figure out a good way to catch the error it throws without wrapping everything in a big block... is the function setTimeLimit.

Code:
f <- function(){
  setTimeLimit(elapsed = 5)
  for(i in 1:7){
    timestamp()
    Sys.sleep(1)
  }
  return("Function exited")
}
Code:
> f()
##------ Tue Jul 17 20:25:59 2012 ------##
##------ Tue Jul 17 20:26:00 2012 ------##
##------ Tue Jul 17 20:26:01 2012 ------##
##------ Tue Jul 17 20:26:02 2012 ------##
##------ Tue Jul 17 20:26:03 2012 ------##
Error in Sys.sleep(1) : reached elapsed time limit
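
One possible way to catch that error without stopping the whole script is to wrap just the slow call in tryCatch. This is only a sketch: run_with_limit, work, seconds and fallback are names I made up, and matching on the text "time limit" assumes the error message is in English.

```r
# Hypothetical helper: run `work()` under an elapsed-time limit and
# return `fallback` instead of stopping if the limit is reached.
run_with_limit <- function(work, seconds, fallback = "timed out") {
  # Lift the limit on the way out so it cannot leak into later commands.
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  tryCatch({
    setTimeLimit(elapsed = seconds, transient = TRUE)
    work()
  }, error = function(e) {
    # Assumption: the message contains "time limit" (English locale).
    if (grepl("time limit", conditionMessage(e), fixed = TRUE)) {
      fallback
    } else {
      stop(e)  # re-raise anything that is not a time-limit error
    }
  })
}

# A deliberately slow function (about 2 seconds if left alone).
slow <- function() {
  for (i in 1:20) Sys.sleep(0.1)
  "Function exited"
}

res_slow <- run_with_limit(slow, seconds = 0.5)
res_fast <- run_with_limit(function() "done", seconds = 5)
```

The wrapper still needs a block around the call, but only around the one piece you want to time out, and the on.exit call keeps the limit from applying to whatever you run next.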
 

trinker

ggplot2orBust
Good ol' Bill Dunlap put some code up for accessing elements from dots (...) that traditionally used match.call. His method requires less processing:
Code:
f1 <- function(x, ...) substitute(...())                   #Dunlap's method
f2 <- function(x, ...) match.call(expand.dots=FALSE)$...   #traditional match.call

f1(1, warning("Hmm"), stop("Oops"), cat("some output\n"))
f2(1, warning("Hmm"), stop("Oops"), cat("some output\n"))
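
To see that nothing passed through the dots gets evaluated, here is a small sketch built on the substitute(...()) idiom. dots_as_text is a hypothetical helper name of mine, and the as.list call is only there as a precaution so vapply always gets a plain list.

```r
# Hypothetical helper: return each argument passed through ... as text,
# without evaluating any of them.
dots_as_text <- function(...) {
  exprs <- as.list(substitute(...()))
  vapply(exprs, function(e) paste(deparse(e), collapse = " "), character(1))
}

# warning() and stop() are never run, and a and b do not need to exist.
captured <- dots_as_text(warning("Hmm"), stop("Oops"), a + b)
print(captured)
```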
 

fed1

TS Contributor
What a great post. I was thinking just this today.

I like the ability to read HTML tables into data frames with readHTMLTable {XML}.

Also, I like that it does not require an install on a Windows machine. Great.
 

trinker

ggplot2orBust
TIL: I got a lot to learn...

Yihui Xie said:
I cannot say I'm already an efficient R programmer, but GitHub did make me much more efficient.
This quote from the creator of knitr really puts you back in your place if you begin to think too highly of your R skills.
 

trinker

ggplot2orBust
TIL: return() is not necessary in R, but I use it anyway. Today Bill Dunlap advised against it:

Bill Dunlap said:
Another nitpick: don't use return() in the last statement.
It isn't needed, it looks like some other language, and
dropping it saves 8% of the time for the uncompiled code
(the compiler seems to get rid of it).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
 

Dason

Ambassador to the humans
I read that a while ago. For the most part it doesn't make a big difference but I have been dropping the return if it can be dropped. Sometimes it can't be dropped though.
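
A case where it can't be dropped is an early exit from the middle of a function. A minimal sketch (first_negative is a made-up function, just for illustration):

```r
# Hypothetical example: position of the first negative value, or NA.
first_negative <- function(x) {
  for (i in seq_along(x)) {
    # return() is needed here: we leave the function from inside the loop.
    if (x[i] < 0) return(i)
  }
  # Last expression: its value is returned without calling return().
  NA_integer_
}

hit  <- first_negative(c(3, -1, -2))
miss <- first_negative(c(1, 2, 3))
```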
 

Dason

Ambassador to the humans
I've known about ** but I did learn something about it today...

Code:
> get("+")
function (e1, e2)  .Primitive("+")
> get("*")
function (e1, e2)  .Primitive("*")
> get("^")
function (e1, e2)  .Primitive("^")
> get("**")
Error in get("**") : object '**' not found
It seems that ** isn't actually a function.

Code:
help("**")
brings up the arithmetic functions help page but you'll notice that ** isn't a part of the usage list. But we do get this in the Notes section:
R_help_page said:
Note:

‘**’ is translated in the parser to ‘^’, but this was undocumented
for many years. It appears as an index entry in Becker _et al_
(1988), pointing to the help for ‘Deprecated’ but is not actually
mentioned on that page. Even though it had been deprecated in S
for 20 years, it was still accepted in R in 2008.
So TIL: That the use of ** is just a parser trick and technically if somebody made a different R interpreter/compiler I don't think they would have to support **.
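
A quick way to check this for yourself is to look at what the parser produces. This is a small sketch using only base R:

```r
# Both spellings give the same number.
same_value <- (2 ** 3) == (2 ^ 3)

# The parsed expressions are identical: ** is already ^ after parsing.
same_parse <- identical(quote(a ** b), quote(a ^ b))

# There is no object called "**" to look up, unlike "^".
has_starstar <- exists("**")
has_caret    <- exists("^")
```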
 

trinker

ggplot2orBust
Sometimes I use strsplit on a vector and wind up with a list. In the past I've used do.call and rbind and selected which column I want...

Unless the split results in only two elements (columns); in that case I use unlist and the trick Dason taught with vector[c(T, F)], but I never thought about how to extend this beyond two elements. In an attempt to avoid work I devised two solutions that extend the notion of selecting every nth element of a vector. Any other approaches? (These are set up as functions but wouldn't need to be.)

Code:
every <- function(n) c(rep(FALSE, n-1), TRUE)
every2 <- function(v, n) v[c(1:length(v) %% n) == 0]

c(1:20)[every(4)]
every2(1:20, 4)
 

Dason

Ambassador to the humans
Wouldn't it make more sense to define your every2 using your every function?

Code:
every2 <- function(v, n) v[every(n)]
And we can generalize this further by allowing the user to not just grab the last element in each group of n elements but instead let them say something like "in every 12 - grab the 3rd".

Code:
every3 <- function(n, k = n){
  id <- rep(FALSE, n)
  id[k] <- TRUE
  id
}

every4 <- function(v, n, k = n){ 
  # n is the group size, k the position within each group
  v[every3(n, k)]
}

v <- 1:20
# Start at element 3 and grab every 5th element after that
every4(v, 5, 3)
although for those more complicated sequences it probably just makes more sense to use seq directly to construct your indices.
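
For comparison, a seq-based version could look like the sketch below. every_seq is a hypothetical name; it assumes k is no larger than length(v), otherwise seq will complain.

```r
# Hypothetical helper: start at element k, then take every nth element.
# Assumes 1 <= k <= length(v).
every_seq <- function(v, n, k = n) {
  v[seq(from = k, to = length(v), by = n)]
}

v <- 1:20
picked <- every_seq(v, 5, 3)   # elements 3, 8, 13, 18
```

Unlike the logical-mask versions, this one doesn't rely on recycling, so it reads a bit more directly as "start here, step by n".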
 

Jake

Cookie Scientist
trinker said:
Any other approaches?
You could *apply the "[" function to the split list and specify which index you want to grab from each list item. This approach is nice when the split list is not guaranteed to be strictly rectangular.
Code:
> (dat <- paste(letters[1:10], letters[2:11], letters[3:12], sep=","))
 [1] "a,b,c" "b,c,d" "c,d,e" "d,e,f" "e,f,g" "f,g,h" "g,h,i" "h,i,j" "i,j,k" "j,k,l"
> sapply(strsplit(dat, split=","), "[", 2)
 [1] "b" "c" "d" "e" "f" "g" "h" "i" "j" "k"
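
To illustrate the non-rectangular case, here is a short sketch with a made-up ragged vector (ragged is hypothetical data, not from the example above). Items with no second piece simply come back as NA.

```r
# Made-up input where the pieces per string differ in number.
ragged <- c("a,b,c", "d", "e,f")

pieces <- strsplit(ragged, split = ",")

# "[" returns NA when asked for a position that does not exist.
second <- sapply(pieces, "[", 2)
print(second)
```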
 

bugman

Super Moderator
Although pretty basic, today I learned how useful and time-saving the

drop1(model, test="")

function is going to be for me. I am learning how to use it for an overdispersed Poisson model.
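
For anyone who wants to try it, here is a rough sketch on simulated data. Everything in it is an assumption of mine rather than bugman's actual setup: the data, the variable names, the quasipoisson family as the way to handle overdispersion, and test = "F" as the choice of test.

```r
set.seed(42)

# Simulated, deliberately overdispersed counts (negative binomial draws).
# x2 has no real effect on y.
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- rnbinom(100, mu = exp(0.5 + 0.6 * d$x1), size = 1)

# Assumption: overdispersion handled with a quasi-Poisson fit.
fit <- glm(y ~ x1 + x2, family = quasipoisson, data = d)

# One row for the full model plus one row per term that could be dropped.
tab <- drop1(fit, test = "F")
print(tab)
```

The time saving comes from getting the single-term deletion tests for every term in one call instead of refitting each reduced model by hand.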
 

bryangoodrich

Probably A Mammal
I've never thought of using a "[" function within an *apply function. That's pretty slick, actually. Will have to keep that in mind for the future! Thanks Jake.
 

Dason

Ambassador to the humans
It's a pretty nice approach that I've used quite a few times but it doesn't quite fit perfectly with the example output trinker was giving.

But on that note I'll also add that if you have a list of similar objects you can use something similar to extract information from them using "$" or "[[" as the function you're calling on each element of the list.
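
A minimal sketch of that idea, using "[[" on a small list of lm fits. The data and the names d and fits are made up for illustration.

```r
set.seed(1)

# Made-up data and two simple regressions stored in a named list.
d <- data.frame(x = 1:10, y = rnorm(10), z = rnorm(10))
fits <- list(a = lm(y ~ x, data = d), b = lm(z ~ x, data = d))

# "[[" is called on each model with the component name as its argument.
ranks <- sapply(fits, "[[", "rank")
coefs <- lapply(fits, "[[", "coefficients")

# The results can be fed straight into another *apply call.
slopes <- sapply(coefs, "[[", "x")
```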