TIL Scoping rules:

Will f(10) return 30 or 50? Why?

If you can't answer why, here's a great video: http://www.youtube.com/watch?list=P...jPkng6B&v=WCu1Xw4h_n8&feature=player_embedded

Code:

```
y <- 3
g <- function(x) {
  x * y
}
f <- function(z) {
  y <- 5
  g(z)
}
f(10)
```

...

And I was right. It boils down to which environment a variable is looked up in when you reference it: R uses lexical scoping, so a function resolves its free variables in the environment where it was defined, not where it is called.
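A minimal sketch of the difference (f2 and g2 are my own names): the inner function only sees the local y when it is defined inside the function that creates that y.

```r
y <- 3
g <- function(x) x * y        # y is free: resolved in g's defining environment (global)
f <- function(z) {
  y <- 5                      # local to f; invisible to g
  g(z)
}
f(10)                         # 30: g still finds the global y

f2 <- function(z) {
  y <- 5
  g2 <- function(x) x * y     # defined inside f2, so y resolves to f2's local y
  g2(z)
}
f2(10)                        # 50
```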

Code:

```
library(sos)                # sos searches the help pages of all CRAN packages
findFn("hosmer lemeshow")   # find functions related to the Hosmer-Lemeshow test
```

Code:

```
> findFn("Dason")

call: findFn(string = "Dason")

Id Count MaxScore TotalScore Package   Function        Date                Score Description and Link
 1     2        1          2 dbConnect DatabaseConnect 2012-02-22 19:29:49     1 Connect to a database and view it...
 2     2        1          2 dbConnect sqlToR          2012-02-22 19:29:49     1 Convert data types from SQL to R...
```

Was reading the headline article in the New England Journal of Medicine and saw a cool graphic in which one plot was embedded inside another plot.

Figure 2 from http://www.nejm.org/doi/full/10.1056/NEJMoa1211801?query=featured_home

Did a bit of research and learnt how to do it using viewports (from the grid package):

Code:

```
library(grid)      # supplies viewport(), pushViewport(), grid.rect()
library(gridBase)  # supplies gridPLT() to align base graphics with grid

## General layout
pushViewport(viewport())
## <code for the main base-graphics plot goes here>
pushViewport(viewport(x = 0.85, y = 0.70, width = 0.6, height = 0.4,
                      just = c("right", "top")))  ## customise position/size
grid.rect()
par(plt = gridPLT(), new = TRUE)
## <code for the embedded base-graphics plot goes here>
```

Code:

```
library(grid)
library(gridBase)
library(survival)
# Main plot
pushViewport(viewport())
fit <- survfit(Surv(time, status) ~ 1, data = aml)  ## generate K-M estimates
plot(fit, conf.int = FALSE, ylim = c(0, 1), xlim = c(0, 9),
     ylab = "Survival", xlab = "Days", main = "K-M Estimates", col = "darkred")
# Embed a plot in the existing plot using a viewport
pushViewport(viewport(x = 0.85, y = 0.70, width = 0.6, height = 0.4,
                      just = c("right", "top")))  ## customise
grid.rect()
par(plt = gridPLT(), new = TRUE)
plot(fit, ylim = c(0.8, 1), xlim = c(0, 9), conf.int = FALSE, col = "darkred")
```

Code:

```
cdown <- function(x, temp = vector(mode = 'integer')) {
  if (x >= 0) {
    temp <- append(temp, x)
    return(cdown(x = x - 1, temp))
  }
  temp
}
cdown(10)
# [1] 10  9  8  7  6  5  4  3  2  1  0
```

Code:

```
> fact <- function(n){if(n<=1) 1 else n*fact(n-1)}
> fact(5)
[1] 120
> newfact <- fact
> rm(fact)
> newfact(5)
Error in newfact(5) : could not find function "fact"
> fact <- function(n){if(n<=1) 1 else n*Recall(n-1)}
> fact(5)
[1] 120
> newfact <- fact
> rm(fact)
> newfact(5)
[1] 120
```

Code:

```
> set.seed(1)
> library(ggplot2)
> rnorm(3)
[1] -0.3262334 1.3297993 1.2724293
> set.seed(1)
> rnorm(3)
[1] -0.6264538 0.1836433 -0.8356286
> set.seed(1)
> rnorm(3)
[1] -0.6264538 0.1836433 -0.8356286
```

The first rnorm(3) gives different numbers because loading ggplot2 itself draws on the random number generator (at least in the version shown here), advancing the seed state; re-seeding after the package is loaded makes the draws reproducible again.
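A safe pattern, assuming you want identical draws regardless of what packages do at load time: attach everything first, then set the seed (stats stands in for any package here).

```r
library(stats)   # load all packages before seeding
set.seed(1)
a <- rnorm(3)
set.seed(1)      # re-seed: nothing has touched the RNG in between
b <- rnorm(3)
identical(a, b)  # TRUE
```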


That is actually a pretty decent way to do that. It's much naughtier to do this:

Code:

```
fun <- function(x) {
  cat("Hi")
  x + 5
}
cat <- function(...) {}  # silently masks base::cat
fun(3)                   # now prints nothing and returns 8
```
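If a base function has been masked this way, rm() removes the masking copy from the global environment, and the base version becomes visible again:

```r
cat <- function(...) {}                          # masks base::cat
cat("you won't see this\n")                      # silently does nothing
base::cat("the namespaced call still works\n")   # explicit namespace bypasses the mask
rm(cat)                                          # drop the mask from the global environment
cat("back to normal\n")                          # base::cat is found again
```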

Even though in Correspondence Analysis (CA) the exploratory stage and testing for significance are two different sides of the same coin (at least in the French/European scholarly tradition tracing back to the early work of Benzécri), I have been trying to calculate the significance of the CA dimensions following the procedure described by Greenacre.

Fortunately enough, I came across an R package named 'InPosition', which has finally allowed me to put Greenacre's method to work. Given a cross-tabulation, the package can calculate the dimensions, as well as the total inertia, of n simulated (i.e., permuted) tables.

In what follows, I will describe:

- how to plot the permuted explained inertia of CA's first two dimensions and calculate their significance (Steps 1-3);

- how to plot and calculate the significance of the CA total inertia (Steps 4-5).

For comparability's sake, I will use the dataset used by Greenacre (p. 199, Exhibit 25.4). It is shown below:

Code:

```
C1 C2 C3
E1 5 7 2
E2 18 46 20
E3 19 29 39
E4 12 40 49
E5 3 7 16
```

Step 1 (loading the package, reading the data, and simulating the permuted tables)

Code:

```
# load the InPosition package
library(InPosition)
# read the table, selecting the file saved on the PC
dat <- read.table(file = file.choose(), header = TRUE)
# simulate 999 permuted contingency tables (takes about 15 secs)
res <- epCA.inference.battery(dat, test.iters = 999)
```

Step 2 (plotting the permuted inertia of the first two dimensions)

Code:

```
# scatterplot of the permuted inertia of the first two dimensions
plot(res$Inference.Data$components$eigs.perm[, 1],
     res$Inference.Data$components$eigs.perm[, 2],
     main = "Scatterplot of permuted dimensions' inertia",
     sub = "solid line: obs. inertia; dashed line: 95th percentile of the permuted distrib. (= alpha 0.05 threshold)",
     xlab = "inertia of permuted Dim 1", ylab = "inertia of permuted Dim 2")
## add reference lines
# 95th percentile of the permuted distribution of each dimension's inertia
abline(v = quantile(res$Inference.Data$components$eigs.perm[, 1], 0.95), lty = 2, col = "blue")
abline(h = quantile(res$Inference.Data$components$eigs.perm[, 2], 0.95), lty = 2, col = "blue")
# observed inertia of the first two dimensions
abline(v = res$Inference.Data$components$eigs[1])
abline(h = res$Inference.Data$components$eigs[2])
```

Step 3 (calculating the significance of the first and second dimensions)

Code:

```
# count the permuted eigenvalues of the first dimension that exceed
# the observed eigenvalue of that dimension
first.dim.ratio <- length(which(res$Inference.Data$components$eigs.perm[, 1] >
                                res$Inference.Data$components$eigs[1]))
# p-value of the first dimension (p = 0.003)
pvalue.first.dim <- first.dim.ratio / 1000
# the same for the second dimension (p = 0.07)
second.dim.ratio <- length(which(res$Inference.Data$components$eigs.perm[, 2] >
                                 res$Inference.Data$components$eigs[2]))
pvalue.second.dim <- second.dim.ratio / 1000
```

Step 4 (plotting the distribution of the permuted total inertia)

Code:

```
# plot the distribution of the permuted total inertia
d <- density(res$Inference.Data$omni$inertia.perm)
plot(d, main = "Kernel density of CA permuted total inertia",
     sub = "solid line: obs. inertia; dashed line: 95th percentile of the permuted distrib. (= alpha 0.05 threshold)")
polygon(d, col = "red", border = "blue")
## add reference lines
# observed total inertia
abline(v = sum(res$Fixed.Data$ExPosition.Data$eigs))
# 95th percentile of the permuted total inertia
abline(v = quantile(res$Inference.Data$omni$inertia.perm, 0.95), lty = 2, col = "blue")
```

Step 5 (calculating the significance of the total inertia)

Code:

```
## calculate the significance of the total inertia
# count how many times the permuted total inertia exceeds the observed total inertia
inertia.ratio <- length(which(res$Inference.Data$omni$inertia.perm >
                              sum(res$Fixed.Data$ExPosition.Data$eigs)))
# significance of the observed total inertia (p = 0.002)
pvalue.inertia <- inertia.ratio / 1000
```
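The ratio computed in Steps 3 and 5 is the generic permutation-test recipe: the p-value is the proportion of permuted statistics at least as extreme as the observed one. A self-contained sketch with made-up numbers (note that many texts add 1 to both numerator and denominator, counting the observed value among the permutations so the p-value is never exactly zero, a convention the steps above do not use):

```r
set.seed(123)
perm.stats <- rnorm(999, mean = 0, sd = 1)  # stand-in for 999 permuted inertias
obs.stat   <- 2.5                           # stand-in for the observed inertia
# proportion of permuted values >= observed, counting the observed value itself
p.value <- (sum(perm.stats >= obs.stat) + 1) / (length(perm.stats) + 1)
p.value
```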

Final notes:

- for the time being, I use the basic plotting facilities of R.

- InPosition natively returns the significance of the dimensions and of the total inertia. However, I am not entirely happy with that, since the way those figures are calculated is not obvious and is not explained in the accompanying documentation. So I believe that tackling the issue the long way, as done above, can be more useful, if only from a didactic standpoint.

Hope this helps.

Regards

Gm