Today I Learned: ____

Dason

Ambassador to the humans
#1
I thought it might be nice to have a place to post happy little things you learned or (re)discovered about R that make your life nicer and that you just want to share! Heck, it can even be things you really appreciate or love about R. With that...

TIL: The split function is nice! I was going through some old code and had a data table that had the form

TIL: I don't know why I don't use the stringsAsFactors=F option when using read.table more often. Most times I just want to work with a string and dealing with factors can be a pain when you don't need to...
 

bryangoodrich

Probably A Mammal
#2
I was looking up some programming stuff about R and ran across a discussion at SO about efficient "for loops" by not using for. In particular, this is when you're dealing with indexed loops (as opposed to loops over values like for (i in c("CA", "WA", "AZ")), say). You can simply do

Code:
lapply(seq(10), ... stuff here ...)
in place of

Code:
for (i in 1:10) {... stuff here ...}
It also has the benefit of not leaving the index variable behind in the environment. A leftover index is something I hate when I'm trying to keep my environment clean in my scripts.
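To make the environment point concrete, here is a minimal sketch comparing the two styles. The names squares_for, squares_apply, leaked_for and leaked_apply are made up for illustration, and vapply stands in for lapply only so the result comes back as a plain numeric vector.

```r
# for loop: the index variable i is created in the calling environment
squares_for <- numeric(10)
for (i in 1:10) {
  squares_for[i] <- i^2
}
leaked_for <- exists("i", inherits = FALSE)    # i is still around after the loop
rm(i)

# vapply: the index j exists only inside the anonymous function
squares_apply <- vapply(seq_len(10), function(j) j^2, numeric(1))
leaked_apply <- exists("j", inherits = FALSE)  # nothing named j was left behind

identical(squares_for, squares_apply)
```

Both versions compute the same values; only the loop leaves its index in the workspace.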

@Dason, I rarely use stringsAsFactors because I usually specify colClasses. That way, I have direct control over what my classes are. Of course, if I want R to handle that and just keep things as numeric or character, stringsAsFactors is definitely the way to go.
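A small sketch of the two approaches side by side, assuming a made-up three-row table passed in through the text argument of read.table. Note that from R 4.0.0 onward stringsAsFactors already defaults to FALSE, so the flag mainly matters on older versions.

```r
csv_text <- "state,pop\nCA,39\nWA,7\nAZ,7"   # made-up data for illustration

# Let R guess the types, but keep strings as character rather than factor
via_flag <- read.table(text = csv_text, header = TRUE, sep = ",",
                       stringsAsFactors = FALSE)

# Spell out every column class explicitly instead
via_classes <- read.table(text = csv_text, header = TRUE, sep = ",",
                          colClasses = c("character", "numeric"))

sapply(via_flag, class)
sapply(via_classes, class)
```

With colClasses the pop column is read as numeric because that was requested, while the stringsAsFactors version leaves the choice to R.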
 

Dason

Ambassador to the humans
#3
I was looking up some programming stuff about R and ran across a discussion at SO about efficient "for loops" by not using for. In particular, this is when you're dealing with indexed loops (as opposed to loops over values like for (i in c("CA", "WA", "AZ")), say). You can simply do

Code:
lapply(seq(10), ... stuff here ...)
in place of

Code:
for (i in 1:10) {... stuff here ...}
I've used that trick before but wasn't sure how helpful it is. I should benchmark something and see what kind of gains are possible. But there do seem to be some things you might not be able to do with that trick as easily as you can do with a for loop.
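One rough way to run that benchmark is with system.time. This is only a sketch: the workload (square roots of 1 to n) and the names n, loop_time and apply_time are arbitrary choices, and the timings will vary by machine and R version.

```r
n <- 1e5

# Indexed for loop filling a preallocated vector
loop_time <- system.time({
  out_for <- numeric(n)
  for (i in seq_len(n)) out_for[i] <- sqrt(i)
})

# The same computation without an explicit loop
apply_time <- system.time({
  out_apply <- vapply(seq_len(n), function(i) sqrt(i), numeric(1))
})

# Elapsed seconds for each approach
c(loop = loop_time[["elapsed"]], apply = apply_time[["elapsed"]])
```

A case that stays easier with a for loop is any computation where each iteration depends on the result of the previous one.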

@Dason, I rarely use stringsAsFactors because I usually specify colClasses. That way, I have direct control over what my classes are. Of course, if I want R to handle that and just keep things as numeric or character, stringsAsFactors is definitely the way to go.
I typically don't need too much control but do get annoyed when things are factors and I just want strings so stringsAsFactors is nice for me.
 

trinker

ggplot2orBust
#4
ETIL (12:43 am): I have seen the colwise function in plyr before and thought "oh ok whatever" without understanding what it does or bothering to learn. Then I'm reading ggplot2 (Springer book by Wickham) and I see it actually takes a column function and generates a new function that operates on all columns in a dataframe. How awesome is that?

Code:
library(plyr)

try(median(mtcars))  #I don't work on dataframes : ( (this errors, hence the try())
median(mtcars$hp)  #just columns : ( x 2
colwise(median)  #voodoo magic (see how it's generating a function)
DFmedian <- colwise(median)  #Here's the creation of dataframe function (Insert Dason BWHAHAHA here)
DFmedian(mtcars)  #Look ma I work on dataframes : )
Wickham also provides numcolwise and catcolwise, which work on only the numeric or only the categorical columns.
 

Dason

Ambassador to the humans
#5
It does a little bit more than just that. Otherwise using something like apply(mtcars, 2, median) would work just as well. But it does seem like it might provide a nice wrapper to stick into lapply if you have a bunch of dataframes you want to get colwise summaries for inside a list.
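The list-of-dataframes idea can be sketched in base R. This is a stand-in and not plyr's colwise itself: col_medians is a made-up helper, and split is used only to manufacture a list of data frames from mtcars.

```r
# Base-R stand-in for a column-wise summary (not plyr's colwise)
col_medians <- function(df) as.data.frame(lapply(df, median))

# split() breaks mtcars into a list of data frames, one per cylinder count
by_cyl <- split(mtcars, mtcars$cyl)

# One row of column medians for each data frame in the list
medians_by_cyl <- lapply(by_cyl, col_medians)
medians_by_cyl[["4"]]
```

Unlike apply(mtcars, 2, median), this never coerces the data frame to a matrix, so each column is summarized in its own type.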
 

trinker

ggplot2orBust
#6
Yeah, it has added niceties over apply(df, 2, function), but my example was just meant to be minimal. The specific cat and num versions are also nice. Up till now I've been using:
Code:
# drop = FALSE keeps a data frame even when only one column is numeric
NUM <- function(dataframe) dataframe[, sapply(dataframe, is.numeric), drop = FALSE]
apply(NUM(CO2), 2, median)
It also works nicely with ddply as the function argument. I think this will save time for initial data examination.
 

Dason

Ambassador to the humans
#9
Seems like whoever posted that was most likely very handsome and probably at the moment sitting next to an adorable dog.

That works for the same reason that something like mtcars[colnames(mtcars) == "cyl"] works.

Code:
colnames(mtcars) == "cyl" # is just a logical vector
# and you can index which columns or rows you want using a logical vector
mtcars[colnames(mtcars) == "cyl"]

# And R is smart in that it will recycle vectors to give you a vector
# as long as you need
mtcars[c(T,F)] # is really expanding out to mtcars[c(T,F,T,F,T,F,T,F,T,F,T)]
 

bryangoodrich

Probably A Mammal
#10
Ever since I used that recycling property in my "melt in base" function, I've been keen to notice its utility. I like that simple way of grabbing even/odd columns. I don't know when it will be useful, but it certainly shortens things up in the code. That creativity can go a long way!
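For the even/odd case specifically, a short sketch of the recycling trick on mtcars (the names odd_cols, even_cols and odd_rows are made up here):

```r
# c(TRUE, FALSE) is recycled across all 11 columns: keep columns 1, 3, 5, ...
odd_cols <- mtcars[c(TRUE, FALSE)]

# Flip the pattern to keep columns 2, 4, 6, ...
even_cols <- mtcars[c(FALSE, TRUE)]

# The same idea works on rows: keep every other row
odd_rows <- mtcars[c(TRUE, FALSE), ]

names(odd_cols)
names(even_cols)
```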
 
#11
TIL: I don't know why I don't use the stringsAsFactors=F option when using read.table more often. Most times I just want to work with a string and dealing with factors can be a pain when you don't need to...
You know that is something you can change on start-up (Rprofile).

See ?options

I, for instance, have custom entries for $defaultPackages, $help.try.all.packages, $digits, $editor, and $pdfviewer.

You could set $stringsAsFactors to FALSE if you need to.
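As a rough illustration, lines like the following could go in an Rprofile. The values are only examples, and in R 4.0.0 and later stringsAsFactors is already FALSE by default.

```r
# Illustrative start-up settings; adjust or drop to taste
options(stringsAsFactors = FALSE)  # keep strings as character in data frames
options(digits = 4)                # print fewer significant digits

# Check what is currently set
getOption("stringsAsFactors")
getOption("digits")
```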
 

Dason

Ambassador to the humans
#12
I work with so many different computers that I prefer not to modify my Rprofile. I think on one of my computers I have R display a fortune on load up but I don't want to become dependent on having anything preset for me. Although I've made an exception in the past to have it always select Iowa State as my CRAN repository, that doesn't really change how I use R or how R handles anything.
 
#13
I work with so many different computers that I prefer not to modify my Rprofile. I think on one of my computers I have R display a fortune on load up but I don't want to become dependent on having anything preset for me. Although I've made an exception in the past to have it always select Iowa State as my CRAN repository, that doesn't really change how I use R or how R handles anything.
This little line of code solved the multiple PC problem for me.

Code:
# Pick the Dropbox folder based on the operating system
if (Sys.info()[["sysname"]] == "Linux") {
  setwd("/home/ggplothater/Dropbox")
} else {
  setwd("D:/Dropbox/My Dropbox")
}
Place your customizations in that folder and you should be fine. Then all you need to change is that piece of code in the Rprofile when you install a new copy of R on a new machine (and source in all your customizations). It likely won't work on a system that won't let you install Dropbox, though.

I don't know if it will be useful for you, but it works for me: I've got R just the way I want it no matter where I am.
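A slightly more defensive variation, sketched under assumptions: load_customizations and the file name Rprofile_custom.R are hypothetical, and the two folder paths are simply the ones from the snippet above. It sources the customization file from the first folder that exists and otherwise leaves R vanilla, which also covers machines without Dropbox.

```r
# Hypothetical helper: source a customization file from the first folder found
load_customizations <- function(candidates, file_name = "Rprofile_custom.R") {
  found <- candidates[dir.exists(candidates)]
  if (length(found) == 0) return(invisible(FALSE))   # no folder: stay vanilla
  custom_file <- file.path(found[1], file_name)
  if (!file.exists(custom_file)) return(invisible(FALSE))
  source(custom_file)                                # run the customizations
  invisible(TRUE)
}

# Folder paths taken from the snippet above; the file name is an assumption
load_customizations(c("/home/ggplothater/Dropbox", "D:/Dropbox/My Dropbox"))
```

The function returns TRUE invisibly when a file was sourced and FALSE otherwise, so the Rprofile itself never fails on a machine that lacks the folder.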
 

Dason

Ambassador to the humans
#14
I definitely thought of doing something like that, but unfortunately about half the computers I use don't allow Dropbox. I've been trying to get that changed because it would make a lot of other things easier for me. But then again, I'm alright with loading R vanilla every time.

And please tell me your Linux username isn't actually ggplothater.
 

Dason

Ambassador to the humans
#15
(tiptoeing in)
a thread like this would be awesome in the education forum for all of us students....

(tiptoeing out...)
It probably would be a good thread for there. I don't have anything relevant to post there at the moment though but you should feel free to start that thread yourself.
 

Dason

Ambassador to the humans
#17
Ever run a function/simulation that takes a long time and after it's done you realize that you didn't save it to a variable so you can't process it any further?
Code:
takes.a.while <- function(){
  Sys.sleep(10)
  rnorm(20)
}

takes.a.while()
That would be really annoying if there was no way to get at that information! Oh wait there is!
Code:
# Oh no I forgot to assign it to a variable
lifesaver <- .Last.value
lifesaver
.Last.value contains the value of the last top level expression.

TIL: .Last.value is your friend.
 

bryangoodrich

Probably A Mammal
#19
While this is not something I learned today, nor does it apply specifically to R, I thought it would be worth sharing for those that are interested. I know I was jubilant about it when I discovered it.

If you have your own web page, especially a WordPress one, you can display your code in a cleanly formatted way that makes it easier to present, reference, and access online. A syntax highlighter puts your unformatted code into its own display box with line numbers and proper alignment. A good one that is an easy plug-in for WordPress is SyntaxHighlighter Evolved. It has rules for color-coding keywords in a variety of languages, and it is easy to set up. It also has a lot of options, such as specifying which line numbers to highlight in a given display, whether line numbers should be shown, or whether a given code box should be collapsed (with the option to expand) on page load; this is similar to our "spoiler" tag here.

I've started using it. I'm eventually going to create a page for my ALSM project that includes my code for each chapter, collapsed into its own box, as part of its TOC. That way, it is easy to view my code, and if someone wants the R script they can download it from a link. Alternatively, I might give each chapter its own page and have each section collapse. I haven't decided. It is just one example where a bulk presentation of code can be made very easy, and I'll probably prepare the HTML beforehand since it'll mostly be the code with parts of it trapped between [sourcecode language='r'] ... code here ... [/sourcecode] tags.

Very cool? I think so!
 

Dason

Ambassador to the humans
#20
I just tried copying some of the code from here. And the line numbers get copied as well. Is there a way to not have it copy the line numbers?