# Today I Learned: ____

#### Junes

##### Member
Thanks!

This is so weird. I can't get a stripped version of it to work. The stripped code (using a different data set) comes down to this:

Code:
library(randomForest)
dataset <- mtcars
model <- randomForest(mpg ~ ., data=dataset)
plotMargins <- function(fac) partialPlot(model, dataset, eval(fac))
lapply(c("disp", "wt", "gear"), plotMargins)
But it gives an error:

Code:
"Error in [.data.frame(pred.data, , xname) : undefined columns selected"

Basically, what I want to do: apply the partialPlot function to a list of variable names to get a selection of plots. However, partialPlot only works when I type out the names of the variables. Does that make sense?
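One possible explanation, offered as a sketch rather than a confirmed diagnosis: if partialPlot captures its x.var argument unevaluated (an assumption about its internals), it never sees the string stored in fac. The toy function pick_column below is a hypothetical stand-in that behaves that way, and do.call is shown as one way around it, because do.call evaluates the argument list before building the call.

```r
# Hypothetical stand-in for a function that captures its argument
# unevaluated; that partialPlot treats x.var this way is an assumption.
pick_column <- function(data, x.var) {
  x.var <- substitute(x.var)
  xname <- if (is.character(x.var)) x.var else deparse(x.var)
  data[, xname]
}

dataset <- mtcars
vars <- c("disp", "wt", "gear")

# Passing a variable fails: the function sees the symbol `v`, not "disp".
failed <- tryCatch(
  lapply(vars, function(v) pick_column(dataset, v)),
  error = function(e) conditionMessage(e)
)

# do.call evaluates the argument list first, so the function gets the string.
cols <- lapply(vars, function(v) {
  do.call(pick_column, list(data = dataset, x.var = v))
})
names(cols) <- vars
```

If the assumption holds, the analogous call for the original problem would be something like do.call(partialPlot, list(x = model, pred.data = dataset, x.var = fac)), which is untested here.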


#### buckshot

##### New Member
This is my first entry. It is nice being here.

I hope to learn a lot about R from you guys.

#### trinker

##### ggplot2orBust
Hi buckshot. Nice to have you. As you learn R things yourself, post them here as well.

#### rogojel

##### TS Contributor
I tried to build a step function regression model with cut, like this:

mod=lm(Y~cut(X,n))

Applying predict to the model always fails, because the model apparently does not store the cut points, only the information that there should be n of them. So in the new dataset the cut points end up in different positions and the prediction fails with the error "new factor values".

The trick seems to be to get the cut points from the cut function and then build the model explicitly, using those cut points.

Code:
BPs=GetBPs(W.tr$age,n)
mod=lm(wage~cut(age,breaks=BPs), data=W.tr)
pred=predict(mod, newdata=list(age=W.te$age))
where the function for coaxing out the list of cut points is homegrown, as the cut function does not provide this functionality by default.

Code:
GetBPs=function(v,n){
  x=gsub("\\]","",gsub("\\(", "",levels(cut(v,n))))
  len=length(x)
  numbers=numeric(length=2*len)
  for(i in 1:len){
    s=strsplit(x[i],"\\,")
    numbers[2*i-1]=as.numeric(s[[1]][1])
    numbers[2*i]=as.numeric(s[[1]][2])
  }
  return(unique(numbers))
}
Any idea how to do this better?
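One possible alternative, as a sketch: compute equal-width break points directly from the range of the training variable instead of parsing them back out of the factor labels. The helper get_breaks, the small padding at both ends, and the simulated train/test data are all assumptions made for illustration; they are not from the post.

```r
# Hypothetical helper: n equal-width bins over the range of v,
# with the outer breaks padded slightly so the extremes fall inside.
get_breaks <- function(v, n) {
  rng <- range(v, na.rm = TRUE)
  pad <- diff(rng) / 1000
  brks <- seq(rng[1], rng[2], length.out = n + 1)
  brks[c(1, n + 1)] <- c(rng[1] - pad, rng[2] + pad)
  brks
}

# Made-up data standing in for the training and test sets
set.seed(1)
train <- data.frame(age = seq(18, 80, length.out = 200))
train$wage <- 50 + 0.5 * train$age + rnorm(200)
test <- data.frame(age = seq(20, 75, length.out = 50))

# Fixed breaks give the same factor levels in training and prediction
brks <- get_breaks(train$age, 4)
mod <- lm(wage ~ cut(age, breaks = brks), data = train)
pred <- predict(mod, newdata = test)
```

New values outside the training range would still fall outside every bin and come back as NA, so the outer breaks may need widening in practice.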

#### trinker

##### ggplot2orBust
@Dason I learned about strrep from your post. I will use this often.

#### Dason

Didn't even notice that until now. I haven't paid much attention to the new functions introduced.

strrep seems to save a bit of typing though
Code:
> paste0(rep("hey", 7), collapse ="")
[1] "heyheyheyheyheyheyhey"
> strrep("hey", 7)
[1] "heyheyheyheyheyheyhey"
Although I'm more interested in the 'lengths' function. It doesn't save as much typing, but I've found myself in situations where I wanted the functionality it provides.

Code:
> x <- 1:10
> y <- rnorm(10)
> o <- lm(y ~ x)
> lengths(o)
 coefficients     residuals       effects          rank fitted.values        assign            qr   df.residual
            2            10            10             1            10             2             5             1
      xlevels          call         terms         model
            0             2             3             2
> sapply(o, length)
 coefficients     residuals       effects          rank fitted.values        assign            qr   df.residual
            2            10            10             1            10             2             5             1
      xlevels          call         terms         model
            0             2             3             2

#### trinker

##### ggplot2orBust
missed that one. Like it. No more sapply(x, length)

#### Dason

TIL: About using a vector to specify the path to the element in a list you want.

Code:
> mylist <- list(top = list(bottom  = "this is it"))
> mylist
$top
$top$bottom
[1] "this is it"

> mylist[["top"]]$bottom
[1] "this is it"

> mylist[["top"]][["bottom"]]
[1] "this is it"
> mylist[[c("top", "bottom")]]
[1] "this is it"
I didn't realize the vector would extract like that. Neat.

#### bryangoodrich

##### Probably A Mammal
Interesting that the error message, to me, reinforces this approach to drilling through a list

Code:
mylist[[c("top", "foo", "bottom")]]
Error in mylist[[c("top", "foo", "bottom")]] : no such index at level 2

#### bryangoodrich

##### Probably A Mammal
TIL entirely by accident that there is a base function for trimming whitespace. Handy!

Code:
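x <- "  Some text. "
x               # "  Some text. "
trimws(x)       # "Some text."   (both sides)
trimws(x, "l")  # "Some text. "  (left only)
trimws(x, "r")  # "  Some text." (right only)
Also, if you want to capitalize words in a string, there is a simple regex for that: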

Code:
s <- "some text"
gsub("(^|\\s+)([a-z])", "\\1\\U\\2", tolower(s), perl = TRUE)
# Some Text
Essentially, match either the beginning of the string or (|) some amount of whitespace, followed by a lowercase letter. Then substitute in the first match group (the start or the whitespace) unchanged, followed by the uppercase version of the second match group.

#### rogojel

##### TS Contributor
Capturing the p-value from a survreg object (I need it to run a simulation to calculate sample sizes) can be done by looking at the code of survival:::print.summary.survreg.

There is a mystery line, print(x$table). It turns out the summary survreg object has an internal table called "table" that contains all the coefficients, and then it is only a matter of counting to find the right value. In my case: pval=summary(surv.mod)$table[11]
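Rather than counting positions, the same cell can probably be reached by row and column name. A minimal sketch, assuming the survival package's bundled lung data and a single covariate age (both picked for illustration, not taken from the post), and assuming the p-value column of the table is named "p":

```r
library(survival)

# Illustrative model, not the one from the post
surv.mod <- survreg(Surv(time, status) ~ age, data = lung)
tab <- summary(surv.mod)$table

# Index by row and column name instead of by position
pval <- tab["age", "p"]
```

With one covariate the table has three rows (intercept, covariate, log scale) and four columns, so position 11 in column-major order is the same cell. Indexing by name should keep working if more covariates are added.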


#### Dason

Code:
> embed(1:20, 3)
      [,1] [,2] [,3]
 [1,]    3    2    1
 [2,]    4    3    2
 [3,]    5    4    3
 [4,]    6    5    4
 [5,]    7    6    5
 [6,]    8    7    6
 [7,]    9    8    7
 [8,]   10    9    8
 [9,]   11   10    9
[10,]   12   11   10
[11,]   13   12   11
[12,]   14   13   12
[13,]   15   14   13
[14,]   16   15   14
[15,]   17   16   15
[16,]   18   17   16
[17,]   19   18   17
[18,]   20   19   18
I pretty much use zoo::rollapply for 'rolling' type calculations if there isn't a more elegant way to do it, and I probably won't switch over to using embed, but it's an interesting function to keep in mind.
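For reference, a small sketch of how embed could stand in for a rolling calculation in base R. Each row of the embedded matrix is one window, so row-wise summaries give the rolling statistic. The window size k is just an example value.

```r
x <- 1:20
k <- 3  # window size, chosen for illustration

# Each row of embed(x, k) holds one window, most recent value first
windows <- embed(x, k)

# Rolling mean and rolling max over windows of length k
roll_mean <- rowMeans(windows)
roll_max <- apply(windows, 1, max)
```

The result has length(x) - k + 1 values, one per complete window.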

#### bryangoodrich

##### Probably A Mammal
Interesting function. I'm not entirely sure where I would make use of it. At least it makes sense to me in the vector case. Embedding a matrix is odd. It makes sense, but it just seems complicated with how you would use the matrix version.

Code:
embed(matrix(1:12, ncol=3), 2)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    2    6   10    1    5    9
# [2,]    3    7   11    2    6   10
# [3,]    4    8   12    3    7   11

#### trinker

##### ggplot2orBust
TIL how to detect the number of R GUI instances running on Windows (how would this generalize to other platforms?):

Code:
length(grep("r(gui|studio)\\.exe", system("tasklist", intern = TRUE), ignore.case = TRUE))

#### Dason

Are you asking how to generalize it? For Linux, I imagine a system call using pgrep could be useful. A modified version of that might work on a Mac.

#### jamesmartinn

##### Member
TIL:
- How to scrape data into R from the web using rvest
- Use Shiny to scrape data using rvest every 5 minutes and plot a graph of the results indexed by time stamp (i.e. data streaming?)
- Send indexed time data that is scraped to an external database (googlesheets) to log results for future analysis

#### bryangoodrich

##### Probably A Mammal
Any code to share? I've never used rvest. Curious to see how such a deployment works in Shiny. It makes sense, I've just never seen it, and that's pretty cool for a real-time monitor.