Today I Learned: ____

vinux

Dark Knight
Logical. Never tried to hack R in this way.
Thanks Trinker

TIL what it means for R to return a function:

Code:
make.power <- function(n) {
    pow <- function(x) {
        x^n
    }
    pow
}

sqrd <- make.power(2)
sqrd(3)
cubd <- make.power(3)
cubd(3)
Not sure yet how I'd use this but cool none the less.
 

Dason

Ambassador to the humans
TIL Scoping rules:

Will g(10) return 30 or 50? Why?

Code:
y <- 3
g <- function(x) {
    x * y
}

f <- function(z) {
    y <- 5
    g(z)
}

g(10)
If you can't answer why here's a great video: http://www.youtube.com/watch?list=P...jPkng6B&v=WCu1Xw4h_n8&feature=player_embedded
I'm going with 30. But then again I think you meant for it to be more tricky and call f(10) instead. Either way it should be 30.

...

And I was right. This boils down to understanding what environment is getting pulled from when you're referencing variables.
 

ledzep

Point Mass at Zero
Pretty cool. I was looking for Hosmer Lemeshow test for logistic regression and didn't knew if it exisited..Here's what I did..

Code:
library(sos)
findFn("hosmer lemeshow")
Ever wonder what Dason's been upto lately...You can find him here :)
Code:
findFn("Dason")
## Shows two functions from dbConnect package..

call: findFn(string = "Dason")
Id Count MaxScore TotalScore Package Function Date Score Description and Link
1 2 1 2 dbConnect DatabaseConnect 2012-02-22 19:29:49 1 Connect to a database and view it...
2 2 1 2 dbConnect sqlToR 2012-02-22 19:29:49 1 Convert data types from SQL to R...
 
I'm using a mac with rstudio. Got the wrap function working but got humbugged on the unwrap. The wrap just needed some minor changes. Don't forget to close() the connection.
 

ledzep

Point Mass at Zero
TIL: Viewports and how to embed a plot inside an existing plot

Was reading the headline article on New England Journal and saw a cool graphics where a plot was embedded inside another plot
Figure 2 from http://www.nejm.org/doi/full/10.1056/NEJMoa1211801?query=featured_home

Did a bit of research and learnt how to do it using viewport

Code:
## General layout
pushViewport(viewport())
 [COLOR="red"]code.to.plot[/COLOR]
pushViewport(viewport(x=0.85,y=.70,width=.6,height=.4,just=c("right","top"))) ## customise
grid.rect()
par(plt = gridPLT(), new=TRUE)
 [COLOR="red"]code.to.plot[/COLOR]
Tried my hands on some inbuilt dataset in R
Code:
library(lattice);library(gridBase);library(survival);

# Main Plot
pushViewport(viewport())
fit <- survfit(Surv(time, status) ~ 1, data = aml)  ## generate K-M
plot(fit,conf.int=FALSE,ylim=c(0,1),xlim=c(0,9),
    ylab=" Survival", xlab="Days", main="K-M Estimates", col="darkred")

# Embed a plot in an existing plot using viewport
pushViewport(viewport(x=0.85,y=.70,width=.6,height=.4,just=c("right","top"))) ## custom
grid.rect()
par(plt = gridPLT(), new=TRUE)
plot(fit, ylim=c(0.8,1),xlim=c(0,9),conf.int=FALSE,col="darkred")
Pretty Cool :)
 
Today I learned about self-calling functions. There can be some very interesting applications of this property, but for brevity, I'll show this countdown example. (I know full well this is done much easier by other methods.)

Code:
cdown <- function(x, temp = vector(mode='integer')){
	if(x >= 0){
		temp <- append(temp, x)
		return(cdown(x = x-1, temp))
	}
	temp
}

cdown(10)

#  [1] 10  9  8  7  6  5  4  3  2  1  0
 

Dason

Ambassador to the humans
So you finally learned about recursion eh? It doesn't get used a lot in R (for good reason) but it can be useful. I'll warn you that if you do want to use recursion it can be safer to use the 'Recall' function to refer to the calling function inside of the function. This allows you to rename your function without worry. Here's an example

Code:
> fact <- function(n){if(n<=1) 1 else n*fact(n-1)}
> fact(5)
[1] 120
> newfact <- fact
> rm(fact)
> newfact(5)
Error in newfact(5) : could not find function "fact"
> fact <- function(n){if(n<=1) 1 else n*Recall(n-1)}
> fact(5)
[1] 120
> newfact <- fact
> rm(fact)
> newfact(5)
[1] 120
In the first example since we renamed fact to newfact and then removed fact itself - inside the function definition it doesn't know where to find fact anymore! But in the second example since we used Recall to refer to the function that we are currently inside we don't need to worry about the name of the function itself and we're perfectly safe renaming the function.
 

Dason

Ambassador to the humans
TIL: ggplot2 calls the "sample" function in it's .onAttach() method. What does this mean for the user? Make sure that if you want reproducible results that you load ggplot2 before making your call to set.seed.

Code:
> set.seed(1)
> library(ggplot2)
> rnorm(3)
[1] -0.3262334  1.3297993  1.2724293
> set.seed(1)
> rnorm(3)
[1] -0.6264538  0.1836433 -0.8356286
> set.seed(1)
> rnorm(3)
[1] -0.6264538  0.1836433 -0.8356286
 

Dason

Ambassador to the humans
Would this effect jitter?
What do you mean? It mainly just makes a difference if you put the set.seed call before loading ggplot2 and then you also rerun the script (because it wouldn't reattach ggplot2 so any calls that require random outputs would be slightly different.
 

TheEcologist

Global Moderator
TIL: ggplot2 calls the "sample" function in it's .onAttach() method. What does this mean for the user? Make sure that if you want reproducible results that you load ggplot2 before making your call to set.seed.

Code:
> set.seed(1)
> library(ggplot2)
> rnorm(3)
[1] -0.3262334  1.3297993  1.2724293
> set.seed(1)
> rnorm(3)
[1] -0.6264538  0.1836433 -0.8356286
> set.seed(1)
> rnorm(3)
[1] -0.6264538  0.1836433 -0.8356286
I have a better solution to this problem, a final solution or an endlosung, pm me if your interested in more a permanent solution!
 

Dason

Ambassador to the humans
That is actually a pretty decent way to do that. It's much more naughty to do this:

Code:
fun <- function(x) {
    cat("Hi")
    x + 5
}

 cat<-function(X){}

f(3)
Especially since that will only work for functions that you have defined. Any functions from packages or in base R will use base::cat and not your version so redefining cat won't work in most situations where you want output generated by cat to go away.
 

gianmarco

TS Contributor
Re:Today I Learned: calculate the significance of Correspondence Analysis' dimensions

Even though in Correspondence Analysis (CA) exploratory stage and testing for significance are two different sides of the coin (at least from the French/European scholarly tradition tracing back to the early work of Benzecri), I habe been trying to calculate the significance of the CA's dimensions following the procedure described by Greenacre M., Correspondence Analysis in Practice (2007) (Amazon link) (pages. 198-199).

Fortunately enough, I came accross an R package, named 'InPosition', which has finally allowed me to put Greenacre's method to work. Given a crosstabulation, the package is able to calculate the dimensions, as well as the total inertia, of n simulated (i.e., permutated) tables.

In what follows, I will describe:
-how to plot the permutated explained inertia of CA's first two dimensions and calculate their significance (Steps 1-3)
-how plot and calcultate the significance of the CA total inertia (Steps 4-5).

For comparability's sake, I will use the dataset used by Greenacre (p. 199, Exhibit 25.4). It is shown below:
Code:
	C1	C2	C3
E1	5	7	2
E2	18	46	20
E3	19	29	39
E4	12	40	49
E5	3	7	16
Step 1 (entering the data)
Code:
[I]#load the InPosition package[/I]
library(InPosition)

[I]#read the table selecting the file saved on the PC[/I]
dat <- read.table(file=file.choose(), header=TRUE)

[I]# use the apposite command to simulate [B]999 (permutated) contingency tables[/B]. It takes about 15 secs[/I]
res<-epCA.inference.battery(dat, test.iters=999)
Step 2 (plotting permutated inertia of dimensions' 1 and 2)
Code:
[I]#plot a scatterplot of the permutated inertia of the first two dimensions[/I]
plot(res$Inference.Data$components$eigs.perm[,1], res$Inference.Data$components$eigs.perm[,2], main=" Scatterplot of permutated Dimensions' inertia", sub="solid line: obs. inertia; dashed line: 95 percentile of the permut. distrib. (=alpha 0.05 threshold)", xlab="inertia of permutated 1 Dim", ylab="inertia of permutated 2 Dim")

[I]##add reference lines[/I]
[I]#reference lines indicating the 95 percentile of the permutated distribution of the first and second dimensions' inertia[/I]
abline(v=quantile(res$Inference.Data$components$eigs.perm[,1], c(0.95)), lty=2, col="blue")
abline(h=quantile(res$Inference.Data$components$eigs.perm[,2], c(0.95)), lty=2, col="blue")

[I]#reference lines indicating the observed inertia of the first two dimensions[/I]
abline(v=res$Inference.Data$components$eigs[1])
abline(h=res$Inference.Data$components$eigs[2])
The above code produces the attached Plot_n1. The dashed references lines indicate the threshold of the 95 percentile of the permutated distribution of the first two dimensions' inertia. Observed inertia lying beyond that threshold is signifcant at alpha 0.05

Step 3 (calculating the significance of the 1 and 2 dimensions)
Code:
[I]#count the number of permutated eigenvalues of the first dimensions that are greater than the observed eigenvalue of the same dimension[/I]
first.dim.ratio<-length(which(res$Inference.Data$components$eigs.perm[,1] > res$Inference.Data$components$eigs[1])) 

[I]#compute the p value of the first dimension (p=0.003) [/I]
pvalue.first.dim<-first.dim.ratio/1000 

[I]#the same applies for the second dimension (p=0.07)[/I]
second.dim.ratio<-length(which(res$Inference.Data$components$eigs.perm[,2] > res$Inference.Data$components$eigs[2])) 
pvalue.second.dim<-second.dim.ratio/1000
Step 4 (plotting the distribution of the permutated total inertia)
Code:
[I]#plot the distribution of the permutated total inertia[/I]
d <- density(res$Inference.Data$omni$inertia.perm)
plot(d, main="Kernel density of CA permutated total inertia", sub="solid line: obs. inertia; dashed line: 95 percentile of the permut. distrib. (=alpha 0.05 threshold)")
polygon(d, col="red", border="blue")

[I]##add reference lines[/I]
[I]#add reference line indicating the observed total inertia[/I]
abline(v=sum(res$Fixed.Data$ExPosition.Data$eigs))

[I]#add reference line indicating the 95 percentile of the permutated total inertia[/I]
abline(v=quantile(res$Inference.Data$omni$inertia.perm, c(0.95)), lty=2, col="blue")
The above code produces the attached Plot_n2.

Step 5 (calculating the significance of the the total inertia)
Code:
[I]##calculate the significance of the total inertia[/I]
[I]#count the number of time the permutated total inertia is greater than the observed total inertia[/I]
inertia.ratio<-length(which(res$Inference.Data$omni$inertia.perm> sum(res$Fixed.Data$ExPosition.Data$eigs)))

[I]#compute the significance of the observed total inertia (p=0.002)[/I]
pvalue.inertia<-inertia.ratio/1000

Final notes:
-for the time being, I use the basic plotting facility of R.
-InPosition natively returns the significance of the dimensions and of the total inertia. However, I am not particularly happy with it since the way in which those figures are calculated is not obvious and is not explained in the accompanying documentation. So, I believe that tackling the issue from the long way as done above can be more useful, for one thing, from a didactical standpoint.

Hope this helps.
Regards
Gm
 

trinker

ggplot2orBust
TIL an alternate way to force a list into a data.frame object using I as seen here:

Code:
lst <- list(a=1:5, b=letters[1:3])
dat2 <- dat <- data.frame(x = 1:2, y=rnorm(2))

[COLOR="#696969"]## My old way[/COLOR]
dat2$new <- lst
dat2

[COLOR="#696969"]## New way[/COLOR]
data.frame(dat, new = I(lst))