Code:

```
with(warpbreaks, data.frame(tension, breaks,
mean_breaks=ave(breaks, tension, FUN=sd)))
```

Code:

```
tension breaks mean_breaks
1 L 26 16.446487
2 L 30 16.446487
3 L 54 16.446487
4 L 25 16.446487
5 L 70 16.446487
6 L 52 16.446487
7 L 51 16.446487
8 L 26 16.446487
9 L 67 16.446487
10 M 18 9.121009
11 M 21 9.121009
12 M 29 9.121009
13 M 17 9.121009
14 M 12 9.121009
15 M 18 9.121009
16 M 35 9.121009
17 M 30 9.121009
18 M 36 9.121009
19 H 36 8.352527
20 H 21 8.352527
21 H 24 8.352527
22 H 18 8.352527
23 H 10 8.352527
24 H 43 8.352527
25 H 28 8.352527
26 H 15 8.352527
27 H 26 8.352527
28 L 27 16.446487
29 L 14 16.446487
30 L 29 16.446487
31 L 19 16.446487
32 L 29 16.446487
33 L 31 16.446487
34 L 41 16.446487
35 L 20 16.446487
36 L 44 16.446487
37 M 42 9.121009
38 M 26 9.121009
39 M 19 9.121009
40 M 16 9.121009
41 M 39 9.121009
42 M 28 9.121009
43 M 21 9.121009
44 M 39 9.121009
45 M 29 9.121009
46 H 20 8.352527
47 H 21 8.352527
48 H 24 8.352527
49 H 17 8.352527
50 H 13 8.352527
51 H 15 8.352527
52 H 15 8.352527
53 H 16 8.352527
54 H 28 8.352527
```

Code:

`merge(warpbreaks, aggregate(breaks ~ tension, warpbreaks, sd), by = "tension")`

Code:

```
tension breaks.x wool breaks.y
1 H 21 A 8.352527
2 H 24 A 8.352527
3 H 18 A 8.352527
4 H 10 A 8.352527
5 H 43 A 8.352527
6 H 28 A 8.352527
7 H 15 A 8.352527
8 H 26 A 8.352527
9 H 36 A 8.352527
10 H 20 B 8.352527
11 H 21 B 8.352527
12 H 24 B 8.352527
13 H 17 B 8.352527
14 H 13 B 8.352527
15 H 15 B 8.352527
16 H 15 B 8.352527
17 H 16 B 8.352527
18 H 28 B 8.352527
19 L 26 A 16.446487
20 L 30 A 16.446487
21 L 54 A 16.446487
22 L 25 A 16.446487
23 L 70 A 16.446487
24 L 52 A 16.446487
25 L 51 A 16.446487
26 L 26 A 16.446487
27 L 67 A 16.446487
28 L 44 B 16.446487
29 L 27 B 16.446487
30 L 14 B 16.446487
31 L 29 B 16.446487
32 L 19 B 16.446487
33 L 29 B 16.446487
34 L 31 B 16.446487
35 L 41 B 16.446487
36 L 20 B 16.446487
37 M 21 A 9.121009
38 M 29 A 9.121009
39 M 17 A 9.121009
40 M 12 A 9.121009
41 M 18 A 9.121009
42 M 42 B 9.121009
43 M 26 B 9.121009
44 M 19 B 9.121009
45 M 16 B 9.121009
46 M 18 A 9.121009
47 M 35 A 9.121009
48 M 30 A 9.121009
49 M 36 A 9.121009
50 M 29 B 9.121009
51 M 39 B 9.121009
52 M 28 B 9.121009
53 M 21 B 9.121009
54 M 39 B 9.121009
```

I was revising my ALSM project. Chapter 8 deals with qualitative predictors. The approach I used previously was terrible, largely due to my ignorance 2+ years ago. I wanted to know if there's a better way to color code and plot regression lines at different factor levels. There is!

The color coding is ridiculously easy. Previously I did two separate plots by subsetting and coloring each point overlay. That is as cumbersome as it sounds! Solution:

Code:

`plot(Y ~ X1, df, col = X2)`

Now, when it came to drawing the two lines for each level, I turned to

Code:

```
coefs = coef(fit)
abline(coefs[1], coefs[2]); abline(coefs[1] + coefs[3], coefs[2])
```

I wanted to know if there is a more direct way to handle this. I stumbled across this blog on a quick Google search.

The alternative to my approach above is to fit separate regressions for each of the data subsets. Then I can just do abline on each of those fitted models. It's much easier, but requires more models. Then it finishes with a way to do both models at once!

Code:

`lm(Y ~ X2/X1 - 1, df)`

Code:

```
# Populate the required data
df <- structure(list(Y = c(218L, 248L, 360L, 351L, 470L, 394L, 332L,
321L, 410L, 260L, 241L, 331L, 275L, 425L, 367L, 140L, 277L, 384L,
341L, 215L, 180L, 260L, 361L, 252L, 422L, 273L, 410L), X1 = c(100L,
125L, 220L, 205L, 300L, 255L, 225L, 175L, 270L, 170L, 155L, 190L,
140L, 290L, 265L, 105L, 215L, 270L, 255L, 175L, 135L, 200L, 275L,
155L, 320L, 190L, 295L), X2 = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Line1", "Line2"), class = "factor")), .Names = c("Y",
"X1", "X2"), row.names = c(NA, -27L), class = "data.frame")
# Define the models
fit <- lm(Y ~ X1 * X2, df)
fit1 <- lm(Y ~ X1, subset(df, X2 == "Line1"))
fit2 <- lm(Y ~ X1, subset(df, X2 == "Line2"))
fitt <- lm(Y ~ X2/X1 -1, df)
par(mfrow = c(2,2))
# Subset Models Approach
plot (Y ~ X1, subset(df, X2 == "Line1"), pch = 19, xlim=c(100, 350), ylim=c(100, 500)); abline(fit1)
points(Y ~ X1, subset(df, X2 == "Line2"), col = "red", pch = 19); abline(fit2, col = "red")
# Manual Approach
plot(Y ~ X1, df, col = X2, pch = 19, xlim=c(100, 350), ylim = c(100, 500))
abline(coef(fit)[[1]], coef(fit)[[2]])
abline(coef(fit)[[1]] + coef(fit)[[3]], coef(fit)[[2]] + coef(fit)[[4]], col = "red")
# New Approach
plot(Y ~ X1, df, col = X2, pch = 19, xlim=c(100, 350), ylim = c(100, 500))
abline(coef(fitt)[c(T, F)]) # I could be explicit here, but odds and
abline(coef(fitt)[c(F, T)], col = "red") # even refer to the 2 different models
```

Code:

```
model.matrix(fitt)
X2Line1 X2Line2 X2Line1:X1 X2Line2:X1
1 0 1 0 100
2 0 1 0 125
3 0 1 0 220
4 0 1 0 205
5 0 1 0 300
6 0 1 0 255
7 0 1 0 225
8 0 1 0 175
9 0 1 0 270
10 0 1 0 170
11 0 1 0 155
12 0 1 0 190
13 0 1 0 140
14 0 1 0 290
15 0 1 0 265
16 1 0 105 0
17 1 0 215 0
18 1 0 270 0
19 1 0 255 0
20 1 0 175 0
21 1 0 135 0
22 1 0 200 0
23 1 0 275 0
24 1 0 155 0
25 1 0 320 0
26 1 0 190 0
27 1 0 295 0
attr(,"assign")
[1] 1 1 2 2
attr(,"contrasts")
attr(,"contrasts")$X2
[1] "contr.treatment"
```

Code:

```
fit <- lm(Y ~ X1*X2 - 1, df)
coefs(fit)
# X1 X2Line1 X2Line2 X1:X2Line2
# 1.3220488 7.5744646 97.9653278 -0.1766614
```

Is there a way to recreate the Y ~ X2/X1 -1 model with the formula notation I'm familiar with? It seems like I need a way to specify "remove the intercept term" for the interaction X1:X2 analogous to how we do it with X2 alone--i.e., if the intercept were in there, then X2Line1 would just be the intercept value and X2Line2 would be that intercept amount less. So we remove the intercept so each factor level has a represented mean value in this model. The difference in coefficients I'm observing between the interaction model I defined above and the X2/X1 notation seems to be analogous in some ways to this. The X1 value is basically the "X2Line1:X1" and I want an X2Line2:X1. Instead, the interaction model has X1:X2Line2, which shows how much difference we have at the Line2 level from X1 at the Line1 level. Make sense? So in my mind, I'm perceiving this X2/X1 notation to basically be that "remove the intercept term" sort of maneuver. It lets the coefficients for those variables be fully represented, not represented as their difference (requiring summation). Make sense?

So my guess is there is no way to recreate the model using the +, *, or : notations I know you can use in formulas?

Code:

```
lm(Y ~ 0 + X2 + X2:X1, df)
# Call:
# lm(formula = Y ~ 0 + X2 + X2:X1, data = df)
#
# Coefficients:
# X2Line1 X2Line2 X2Line1:X1 X2Line2:X1
# 7.574 97.965 1.322 1.145
```

Writing out the details of this model it makes sense how it works as containing the two separate models Y ~ X1 when X2 is Line1 and Y ~ X1 when X2 is Line2 (having different intercepts). The full expression is

\(Y = b_0 X_{2,1} + b_1 X_{2,2} + b_2 X_{2,1}X_1 + b_3 X_{2,2}X_1\)

So when \(X_2 = 1\) we get

\(Y = b_0 + b_2 X_1\)

and when \(X_2 = 2\) we get

\(Y = b_1 + b_3 X_1\)

The coefficients on the separate X2 levels are the separate intercept terms, and depending on which factor there is will determine which corresponding coefficient will remain on the appropriate X1. Neat.

Typically this wouldn't cause any shock because R works copies but at some point the assignment needs to take place. For example:

Code:

```
> x <- 1:10
> "[<-"(x, 2, 0)
[1] 1 0 3 4 5 6 7 8 9 10
> x # no change...
[1] 1 2 3 4 5 6 7 8 9 10
> # I was expecting the same results as:
> x[2] <- 0
> x
[1] 1 0 3 4 5 6 7 8 9 10
> x <- 1:10
> # This works though...
> "<-"("["(x, 2), 0)
> x
[1] 1 0 3 4 5 6 7 8 9 10
> # But I'm guessing this is what's actually going
> # on when you do something like x[2] <- 0
> x <- 1:10
> "<-"(x, "[<-"(x, 2, 0))
> x
[1] 1 0 3 4 5 6 7 8 9 10
```

If we think of assignment as A and index as B, then the notation makes perfect sense

A(B(x, 2), 0)

We index 'x' by 2. Thus, by function composition we pass that to the assignment statement, assigning that index of x the value of 0.

My guess is that "[<-" was sort of like calling a nested function. While the behavior took place for you to see, it was within the nested scope, and so nothing actually changed.

This made me think of trying "[<<-", to see if it would break out of the nested reference, but it says the function doesn't exist. So then "[<-" is actually its own function?? Still odd.

Code:

```
get("[<-")
get("[<-.data.frame")
```

Code:

```
> x <- 2
> get("class<-")(x, "hey")
[1] 2
attr(,"class")
[1] "hey"
> x
[1] 2
attr(,"class")
[1] "hey"
> "<-"(x, get("class<-")(x, "hey"))
> x
[1] 2
attr(,"class")
[1] "hey"
```

Code:

```
(x = seq(10))
# [1] 1 2 3 4 5 6 7 8 9 10
"[<-"(x,2,0) # No change
# [1] 1 0 3 4 5 6 7 8 9 10
x
# [1] 1 2 3 4 5 6 7 8 9 10
x[3] <- 0
x
# [1] 1 2 0 4 5 6 7 8 9 10
"[<-"(x,2,0) # Change!
# [1] 1 0 0 4 5 6 7 8 9 10
x
# [1] 1 0 0 4 5 6 7 8 9 10
```

TIL how to scrap the web on my own.

When i say scrape I mean go to multiple pages and extract information.

I pulled emoticons from a website. here's on page from that site: http://www.lingo2word.com/lists/emoticon_listH.html

This is a baby example as I'm thinking of including an emoticon data set with my qdap package. It's a baby example (as everything is straight forward and doesn't require the use of RCurl) and I'm sure BG and Dason could whip something up even better but it was my first crack at it.

When i say scrape I mean go to multiple pages and extract information.

I pulled emoticons from a website. here's on page from that site: http://www.lingo2word.com/lists/emoticon_listH.html

This is a baby example as I'm thinking of including an emoticon data set with my qdap package. It's a baby example (as everything is straight forward and doesn't require the use of RCurl) and I'm sure BG and Dason could whip something up even better but it was my first crack at it.

Code:

```
y <- readLines(url("http://www.lingo2word.com/lists/")) [COLOR="gray"]#get the list of all the site's web pages[/COLOR]
y <- strsplit(y[-c(1:8, 942:944)], '"')[COLOR="gray"]#eliminate rows that are nothing and split on "[/COLOR]
y <- do.call(rbind, y) [COLOR="gray"]#bind that stuff up[/COLOR]
a <- "emoticon"
b <- y[, 2][substring(y[, 2], 1, nchar(a))==a] [COLOR="gray"]#get only the emoticon list[/COLOR]
[COLOR="gray"]#b is the ending to all urls[/COLOR]
require(XML)
urla <- "http://www.lingo2word.com/lists/" [COLOR="gray"]#begining of all the urls[/COLOR]
z <- lapply(1:length(b), function(i) { [COLOR="gray"]#scrape that data[/COLOR]
readHTMLTable(paste0(urla, b[i]), which=2)
}
)
z2 <- lapply(z, function(x) x[, c(2, 4)]) [COLOR="gray"]#take only the 2nd and 4th item[/COLOR]
d <- do.call(rbind, z2) [COLOR="gray"]#bind the dataframe[/COLOR]
rownames(d) <- NULL [COLOR="gray"]#rename sequentialy[/COLOR]
names(d) <-c("emoticon", "meaning") [COLOR="gray"]#rename columns[/COLOR]
d[, 1] <- gsub("^\\s+", "", d[, 1]) [COLOR="gray"]#find excessive white space[/COLOR]
d[d[, 1]=="", 1] <- NA [COLOR="gray"]#if a cell is empty give it an NA[/COLOR]
d <- na.omit(d) [COLOR="gray"]#Die missing values Die[/COLOR]
d
```

Last edited:

Bioconductor:

Packages must satisfy the following checklist:

Pass R CMD build and R CMD check on all supported platforms (Windows, Macintosh, Linux) with no errors or warnings, using a recent R-devel. The result of R CMD build must be less than 2MB; R CMD check must complete within 5 minutes.

Contain a DESCRIPTION file with valid contact information, an informative title and description, correct license specification, appropriate biocViews terms, valid version number.

Contain a NAMESPACE that imports all symbols used in the package, and exports just those symbols the package author identifies as appropriate. Use of a NAMESPACE implies that appropriate packages are mentioned in the Imports: field of the DESCRIPTION file.

Contain a Sweave-style vignette that illustrates the major uses of the package. The vignette must be evaluated during package installation; a static vignette (e.g., PDF) is not acceptable.

Contain complete help pages. This includes accurate description of function parameter and return values, and meaningful examples.

Make use of appropriate existing packages (e.g., biomaRt, AnnotationDbi, Biostrings) and classes (e.g., ExpressionSet, AnnotatedDataFrame, RangedData, Rle, DNAStringSet) to avoid duplication of functionality available in other Bioconductor packages.

Contain no extraneous files (e.g., '.DS_Store', '.project', '.svn', etc.), files with invalid names (e.g., differing only in case), or code that cannot be distributed under the license specified by the author.

Packages must not already be available on CRAN.

Packages should have a descriptive name that is not already in use. See if it is by running biocLite("myPackageName"). You cannot have a package name that is case-insensitively equal to an existing package name in CRAN or Bioconductor.

Packages should also conform to the following:

Use S4 classes and methods.

Include an inst/NEWS file for providing users with information on package updates.

Authors receive initial feedback in 1 to 3 weeks. Packages will be checked for adherence to the Bioconductor package guidelines and the checklist below by a member of Bioconductor team. Package developers should consult the full Bioconductor Package Guidelines. Submission implies commitment to package maintenance across multiple release cycles.

And this is essentially what CRAN requires:

Does it pass R CMD check? K - you're good.

Code:

```
A = data.frame( a = c(1:10), b = c(11:20) )
B = data.frame( a = c(101:110), b = c(111:120) )
C = data.frame( a = c(5:8), b = c(55:58) )
L = list(list(B,C),list(A),list(C,A),list(A,B,C),list(C))
unlist(L,recursive=F)
```