Today I Learned: ____

Thanks!

This is so weird. I can't get a stripped version of it to work. The stripped code (using a different data set) comes down to this:

Code:
library(randomForest)
dataset <- mtcars 
model <- randomForest(mpg ~ ., data=dataset)
plotMargins <- function(fac) partialPlot(model, dataset, eval(fac))
lapply(c("disp", "wt", "gear"), plotMargins)
But it gives an error:

Code:
"Error in `[.data.frame`(pred.data, , xname) : undefined columns selected"
The problem I had before.

Basically, what I want to do: apply the partialPlot function to a list of variable names to get a selection of plots. However, partialPlot only works when I type out the names of the variables. Does that make sense?
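A possible explanation, offered as a sketch rather than a confirmed diagnosis: partialPlot appears to capture its x.var argument unevaluated (via substitute), so inside a wrapper it sees the expression you typed rather than the string it holds. The toy function pick_column below is hypothetical and only mimics that behaviour with base R; it shows why the direct wrapper fails and how do.call gets around it by putting the evaluated string into the call.

```r
# Hypothetical stand-in for a function that captures its argument
# unevaluated, the way partialPlot appears to treat x.var.
pick_column <- function(data, x.var) {
  x.var <- substitute(x.var)
  xname <- if (is.character(x.var)) x.var else deparse(x.var)
  data[, xname]
}

# Fails: pick_column sees the symbol `fac`, so it looks for a column "fac".
wrapper_direct <- function(fac) pick_column(mtcars, fac)

# Works: do.call builds the call from already-evaluated arguments,
# so the captured expression is the string itself.
wrapper_docall <- function(fac) {
  do.call(pick_column, list(data = mtcars, x.var = fac))
}

res <- lapply(c("disp", "wt", "gear"), wrapper_docall)

# The analogous change for the real case (untested here) would be:
# plotMargins <- function(fac)
#   do.call("partialPlot", list(x = model, pred.data = dataset, x.var = fac))
```

If that assumption about partialPlot holds, the commented-out plotMargins at the bottom should be the only change needed in the stripped example.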
 

rogojel

TS Contributor
I tried to build a step-function regression model with cut, like this:

mod=lm(Y~cut(X,n))

Applying predict to the model always fails, because the model apparently does not store the cut points, only the information that there need to be n of them. In the new dataset the cut points therefore end up in different positions, and the prediction fails with the error "new factor values".

The trick seems to be to get the cut points from the cut function and then build the model explicitly, using those cut points.

Code:
  BPs=GetBPs(W.tr$age,n)
  
  mod=lm(wage~cut(age,breaks=BPs), data=W.tr)
  pred=predict(mod, newdata=list(age=W.te$age))
where the function for extracting the list of cut points is homegrown, as the cut function does not provide this functionality by default.

Code:
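# Recover the break points chosen by cut(v, n) by parsing its level labels,
# e.g. "(17.9,38.7]". Labels are rounded for display, so values are approximate.
GetBPs=function(v,n){
  # strip the "(" and "]" delimiters from each label
  x=gsub("\\]","",gsub("\\(", "",levels(cut(v,n))))
  len=length(x)
  numbers=numeric(length=2*len)
  for(i in 1:len){
    s=strsplit(x[i],"\\,")
    numbers[2*i-1]=as.numeric(s[[1]][1])  # lower bound of interval i
    numbers[2*i]=as.numeric(s[[1]][2])    # upper bound of interval i
  }
  # adjacent intervals share a bound, so drop the duplicates
  return(unique(numbers))
}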
Any idea how to do this better?
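One option, as a sketch: when cut is given a single number of intervals it seems to use equal-width breaks over the range of the data, widened by 0.1% of the range at each end. Computing those breaks directly avoids parsing the (rounded) labels. The helper name get_breaks and the simulated data are made up for illustration, and the helper assumes v has more than one distinct value.

```r
# Hypothetical helper: reproduce the breaks cut(v, n) uses when n is a
# single number of intervals. Assumes v has more than one distinct value.
get_breaks <- function(v, n) {
  rx <- range(v, na.rm = TRUE)
  dx <- diff(rx)
  breaks <- seq.int(rx[1], rx[2], length.out = n + 1)
  # widen the outer breaks slightly so the extreme values fall inside
  breaks[c(1, n + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)
  breaks
}

# Simulated stand-ins for the training and test sets
set.seed(1)
train <- data.frame(age = runif(200, 18, 80))
train$wage <- 50 + 0.5 * train$age + rnorm(200)
test <- data.frame(age = runif(50, 25, 70))

bps <- get_breaks(train$age, 4)
mod <- lm(wage ~ cut(age, breaks = bps), data = train)
pred <- predict(mod, newdata = test)
```

Test values outside the training range fall outside every interval and come back as NA, so that case still needs separate handling.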
 

Dason

Ambassador to the humans
@Dason learned about strrep from your post :) I will use this often
Didn't even notice that until now. I haven't paid much attention to the new functions introduced.

strrep seems to save a bit of typing though
Code:
> paste0(rep("hey", 7), collapse ="")
[1] "heyheyheyheyheyheyhey"
> strrep("hey", 7)
[1] "heyheyheyheyheyheyhey"
although I'm more interested in the 'lengths' function. It doesn't save as much typing, but I've been in situations before where I wanted the functionality it provides.

Code:
> x <- 1:10
> y <- rnorm(10)
> o <- lm(y ~ x)
> lengths(o)
 coefficients     residuals       effects          rank fitted.values        assign            qr   df.residual 
            2            10            10             1            10             2             5             1 
      xlevels          call         terms         model 
            0             2             3             2 
> sapply(o, length)
 coefficients     residuals       effects          rank fitted.values        assign            qr   df.residual 
            2            10            10             1            10             2             5             1 
      xlevels          call         terms         model 
            0             2             3             2
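A couple more small cases, as a sketch, since both functions are vectorised: strrep recycles over the strings and the repeat counts, and lengths is handy on ragged lists such as strsplit output.

```r
# strrep is vectorised over both arguments
reps <- strrep("ab", 1:3)        # "ab" "abab" "ababab"
pairs <- strrep(c("a", "b"), 2)  # "aa" "bb"

# lengths gives the length of each list element in one call
words <- strsplit(c("a b c", "d e"), " ")
n_words <- lengths(words)        # 3 2
```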
 

Dason

Ambassador to the humans
TIL: About using a vector to specify the path to the element in a list you want.

Code:
> mylist <- list(top = list(bottom  = "this is it"))
> mylist
$top
$top$bottom
[1] "this is it"


> mylist[["top"]]
$bottom
[1] "this is it"

> mylist[["top"]][["bottom"]]
[1] "this is it"
> mylist[[c("top", "bottom")]]
[1] "this is it"
I didn't realize the vector would extract like that. Neat.
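A small sketch of the same idea: the path can be kept in a variable, positions work as well as names, and the result matches stepping through the list one level at a time. The variable names here are just for illustration.

```r
mylist <- list(top = list(bottom = "this is it"))

path <- c("top", "bottom")
by_name <- mylist[[path]]          # recursive indexing by name
by_position <- mylist[[c(1, 1)]]   # positions work the same way

# Equivalent to applying [[ once per level
stepwise <- Reduce(function(x, key) x[[key]], path, mylist)

# Single brackets behave differently: mylist[path] looks for elements
# named "top" and "bottom" at the top level instead of drilling down.
```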
 

bryangoodrich

Probably A Mammal
Interesting that the error message, to me, reinforces this approach to drilling through a list

Code:
mylist[[c("top", "foo", "bottom")]]
Error in mylist[[c("top", "foo", "bottom")]] : no such index at level 2
 

bryangoodrich

Probably A Mammal
TIL entirely by accident that there is a base function for removing whitespace. Handy!

Code:
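x <- "  Some text. "
x                # "  Some text. "
trimws(x)        # "Some text."   (both sides)
trimws(x, "l")   # "Some text. "  (left only)
trimws(x, "r")   # "  Some text." (right only)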
Also, if you want to capitalize words in a string, there is a simple regex for that

Code:
s <- "some text"
gsub("(^|\\s+)([a-z])", "\\1\\U\\2", tolower(s), perl = TRUE) 
# Some Text
Essentially, this matches the start of the string or (|) a run of whitespace, followed by a lowercase letter. It then substitutes the first match group unchanged (the start or the whitespace) and replaces the second match group with its uppercase version (\U, which needs perl = TRUE).
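Wrapped in a function (the name capitalize_words is made up here), the same regex works on a whole character vector. Because it only capitalizes letters that follow the start of the string or whitespace, a letter after an apostrophe is left alone.

```r
# Hypothetical wrapper around the regex from the post
capitalize_words <- function(s) {
  gsub("(^|\\s+)([a-z])", "\\1\\U\\2", tolower(s), perl = TRUE)
}

out <- capitalize_words(c("some text", "don't STOP"))
# "Some Text" "Don't Stop"
```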
 

rogojel

TS Contributor
Capturing the p value from a survreg object - I need it to run a simulation to calculate sample sizes - can be done by looking at the code from survival:::print.summary.survreg.

There is a mystery line print(x$table) - turns out the summary survreg object has an internal table called "table" that contains all the coefficients and then it is only a matter of counting to find the right value.

In my case: pval=summary(surv.mod)$table[11]
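A sketch of indexing that table by name instead of counting positions, which should survive adding or removing covariates. It uses the lung data that ships with survival as a stand-in for the real model, and it assumes the p-value column is called "p"; check colnames(tab) on your version.

```r
library(survival)

# Stand-in model on the lung data, not the model from the post
fit <- survreg(Surv(time, status) ~ age, data = lung)

tab <- summary(fit)$table
# Assumption: the p-value column is named "p"
pval <- tab["age", "p"]
```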
 

Dason

Ambassador to the humans
TIL: About the embed function.

Code:
> embed(1:20, 3)
      [,1] [,2] [,3]
 [1,]    3    2    1
 [2,]    4    3    2
 [3,]    5    4    3
 [4,]    6    5    4
 [5,]    7    6    5
 [6,]    8    7    6
 [7,]    9    8    7
 [8,]   10    9    8
 [9,]   11   10    9
[10,]   12   11   10
[11,]   13   12   11
[12,]   14   13   12
[13,]   15   14   13
[14,]   16   15   14
[15,]   17   16   15
[16,]   18   17   16
[17,]   19   18   17
[18,]   20   19   18
I pretty much always use zoo::rollapply for 'rolling' type calculations if there isn't a more elegant way to do it, and I probably won't switch over to using embed, but it's an interesting function to keep in mind.
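For what it's worth, a minimal sketch of a rolling calculation built on embed: each row of the result is one window (newest value first), so row-wise summaries give the rolling statistic.

```r
x <- c(2, 4, 6, 8, 10)

# Each row is one window of 3 consecutive values, newest first
windows <- embed(x, 3)

roll_mean <- rowMeans(windows)       # 4 6 8
roll_max <- apply(windows, 1, max)   # 6 8 10
```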
 

bryangoodrich

Probably A Mammal
Interesting function. I'm not entirely sure where I would make use of it. At least it makes sense to me in the vector case. Embedding a matrix is odd. It makes sense, but it just seems complicated with how you would use the matrix version.

Code:
embed(matrix(1:12, ncol=3), 2)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    2    6   10    1    5    9
# [2,]    3    7   11    2    6   10
# [3,]    4    8   12    3    7   11
 

trinker

ggplot2orBust
TIL how to detect the number of R GUI instances running on Windows. How can this be generalized to other platforms?

Code:
length(grep("r(gui|studio)\\.exe", system("tasklist", intern = TRUE), ignore.case = TRUE))
 

Dason

Ambassador to the humans
Are you asking how to generalize it? For Linux I imagine a system call utilizing pgrep could be useful. A modified version of that might work on a Mac.
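A rough sketch of one way to make it portable, using ps rather than pgrep: keep the counting in a small function that takes a vector of process names, and feed it whatever listing the platform provides. The helper name count_r_guis and the process names in the pattern are assumptions, not something checked on every platform.

```r
# Hypothetical helper: count entries in a process listing that look like
# an R front end. The names in the pattern are assumptions.
count_r_guis <- function(process_names,
                         pattern = "^(rgui|rstudio)(\\.exe)?$") {
  sum(grepl(pattern, basename(process_names), ignore.case = TRUE))
}

# On Linux or macOS one might feed it something like:
#   count_r_guis(system("ps -e -o comm=", intern = TRUE))
n <- count_r_guis(c("/usr/lib/rstudio/bin/rstudio", "bash", "Rgui.exe"))
```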
 
TIL:
- How to scrape data into R from the web using rvest
- Use Shiny to scrape data using rvest every 5 minutes and plot a graph of the results indexed by time stamp (i.e. data streaming?)
- Send indexed time data that is scraped to an external database (googlesheets) to log results for future analysis
 

bryangoodrich

Probably A Mammal
Any code to share? I've never used rvest. Curious to see how such a deployment works in Shiny. Makes sense, I've just never seen it, and that's pretty cool for a real-time monitor.