[R Graphics] Beautiful graphics thread

TheEcologist

Global Moderator
#1
Hi Everyone,

I thought I might start a thread which showcases the graphics abilities of R. I'll start it off with one I just made.

This thread was also inspired by bg and trinker claiming that this could never be done in R.
Of course it can you just won't be able 1) to showcase it in R but it works nice with any non IE browser or 2) do it in ggplot. This is SVG, and its enormously powerful!

(fyi Combining R with SVG and XML can put you in the position to make amazing presentations <- that is sozi, its good, but you can also use jezzyink)

To make graphs like the above and the below, you need to get down and dirty with all the functions at the base of ggplot and beyond (head over to the "darkside").

So I'll start off with this example, I was inspired to improve on it after I saw someone's attempt to do it in ggplot.

Hopefully this hits off and we can share some amazing graphics and code with each other. My next goal will be do show how you can do this in R.

So here is my attempt at making a more realistic "Earth at Night in R, using only base graphics" ;



For some reason it looks much better in R, so just copy paste the following;

Code:
# Earth at night in R
# took about 30 minutes to code

require(maps)
data(world.cities)
#subset minor and major cities
minmajdat=world.cities[world.cities$pop>10000,]

#subset major cities
majdat=world.cities[world.cities$pop>100000,]

#subset huge cities
hdat=world.cities[world.cities$pop>1000000,]

ranker=rank(world.cities$pop)

par(bg=rgb(0,0,0.025),mex=0.5,mar=c(0,0,0,0))

map("world",col=rgb(0,0,0.55,
alpha=0.05),interior=F,fill=T)

#plot huge cities
points(hdat$lat[ranker]~hdat$long[ranker],col=rgb(0.25,0.25,0,alpha=0.05)
,pch='.',cex=(hdat$pop)^c(1/100))

# illumate the night

	for(i in 1:5){
points(jitter(world.cities$lat[ranker],1)~jitter(world.cities$long[ranker],1),
col=rgb(230/255,232/255, 250/255,alpha=0.0075),pch='.',cex=(world.cities$pop)^c(1/100))
	}

#create light haze arround minor & major cities
	for(i in 1:5){
icol=rgb(0.7,0.7,0.7,alpha=0.02)
points(jitter(minmajdat$lat[ranker],10)~
jitter(minmajdat$long[ranker],10),
col=icol,pch='.',cex=(minmajdat$pop)^c(1/100))
	}

#create yellow light haze arround major cities
	for(i in 1:6){
icol=rgb(1,1,0,alpha=0.02)
points(jitter(majdat$lat[ranker],10)~
jitter(majdat$long[ranker],10),
col=icol,pch='.',cex=(majdat$pop)^c(1/100))
	}

#create bright light haze arround huge cities
	for(i in 1:6){
icol=rgb(1,1,1,alpha=0.02)
points(jitter(hdat$lat[ranker],10)~
jitter(hdat$long[ranker],10),
col=icol,pch='.',cex=(hdat$pop)^c(1/100))
	}
Looking forward to seeing your contributions.

TE
 

Dason

Ambassador to the humans
#2
I just have one thing to add

THE ENTIRE WORLD HAS GONE DARK! THE SUN IS DEAD! WE SURELY WILL NOT SURVIVE.
 

vinux

Dark Knight
#3
Here is one on TS chat data. There is nothing new in graphical side. Here is the chat summary (I have removed the chat contents for confidential reason) Dataset. It is an intraday analysis of chat data. I have taken IST as the reference. I thought of adding one more scale. But I don't wanted to confuse with day light saving system. Some of the US folks they just need to change am to pm and pm to am.




Here you can find the code.

Code:
## Reading the chat data set
ts <- read.csv(insert_url_for_dropbox_csv_here) # Send a message if you need to know what the file is.
ts$dt <- strptime(ts$Date, "%m/%d %H:%M") 
ts$dttime <- as.POSIXlt(ts$dt, "IST") 

## grouping the users not the top ten best chatters
nTop10bUsers <- ts$User.Name %in% c("bugman", "CowboyBear",  "Dragan",
                                    "hlsmith",  "Lazar", "ledzep", "Link", "noetsi",
                                    "quark", "SiBorg", "SmoothJohn" )
ts$Chatter <- ts$User.Name
ts$Chatter[nTop10bUsers] <- "Others"


chat.colour <- c("bryangoodrich"="limegreen", "Dason"="red", "GretaGarbo"="green", "Jake"="brown", "jimmy brooks"="gold", "Others"="grey", "spunky"="cyan", "TheEcologist"="darkred", "trinker"="indianred3", "victorxstc"="yellow", "vinux"="blue")



nf <- layout(matrix(c(2, 5, 1, 3, 4, 0), 3, 2, byrow=TRUE), c(9, 3), c(3, 9, 2), TRUE)

par(mar=c(1, 4, 1, 1))
cdplot( as.factor(ts$Chatter) ~ ts$dttime$hour, col=chat.colour, xlab="Time", ylab="Users")
## axis(side=1,at=1:24, labels=c("Midnight", paste(1:11), "Noon",paste(1:11)),srt=90)

par(mar=c(0, 4, 0, 0))
chat.t <- barplot( tapply(ts$Chatter, ts$dttime$hour, length), col=gray(c(seq(0.1, 1, length=12),seq(1, 0.1, length=12))))
text(y=100, x=chat.t, label=c("Midnight", paste(1:11, "am"), "Noon",paste(1:11, "pm")), srt=90)

par(mar=c(0, 0, 0, 0))
chat.s <- table(ts$Chatter)
chat.sbar <- barplot(chat.s, col=chat.colour, yaxt="n", horiz=TRUE)
text(y=chat.sbar, x=chat.s*.5, label=names(chat.s))

par(mar=c(4, 4, 1, 1))
x <- seq(-pi, pi, len=24)
y=0
r <- outer(x, y, "+")
image(z = z <- 1-sin(r/2)^2,x,y, col=gray((0:32)/32),xlab="Time (Intraday)",xaxs="i",xaxt="n",yaxt="n",ylab="")

par(mar=rep(0, 4), cex=0.5)
plot.new()
plot.window(c(0,1), 0:1)
par(family="sans", lheight=4.5)
text(.35, .75, "TalkStats",font=4,cex=1.5)
text(.35, .55, "Intraday Analysis",font=2) 
text(.35, .35, "Top 10 best Chatters",cex=.85)
I tried to automate the color from the chat. I will try this next time. Now some findings. You can identify the sleeping/working time. What I can observe is all humans sleep at least 2-3 hours. Rest is left to you.

EDIT: Changed Chron functions
 

TheEcologist

Global Moderator
#4
Can we now conclude that Dason doesn't sleep? (and who also doesn't sleep children?)

@vinux, does your code need a require(chron)? - Oh and great graphic!
 

vinux

Dark Knight
#5
Can we now conclude that Dason doesn't sleep? (and who also doesn't sleep children?)

@vinux, does your code need a require(chron)?
Unfortunately yes. I thought of removing that.

My stupidity. I could have used ts$dttime$hour.


EDIT: I have removed chron function now. It is purely a base R example.
 

TheEcologist

Global Moderator
#6
Oke, as discussed in the chatbox.. I slight improvement to the (already incredible) design using a visually more stimulating palette (I strongly recommend using palette's in all your colour graphs - it is more appealing to the human brain - below find my homemade mineral palette)

Code:
#Mineral palette - HEX CODES FROM:http://rapid-tools.net/online-color-picker/
mineral.colors=colorRampPalette(c("#d67646","#943d2c","#474243","#e6bc78",
"#decfba")) 

#APPLIED TO VINUX's great graphic

## Reading the chat data set
ts <- read.csv(insert_url_for_dropbox_csv_here) # Send a message if you need to know what the file is.
ts$dt <- strptime(ts$Date, "%m/%d %H:%M") 
ts$dttime <- as.POSIXlt(ts$dt, "IST") 

## grouping the users not the top ten best chatters
nTop10bUsers <- ts$User.Name %in% c("bugman", "CowboyBear",  "Dragan",
                                    "hlsmith",  "Lazar", "ledzep", "Link", "noetsi",
                                    "quark", "SiBorg", "SmoothJohn" )
ts$Chatter <- ts$User.Name
ts$Chatter[nTop10bUsers] <- "Others"


chat.colour=mineral.colors(length(unique(ts$Chatter)))
names(chat.colour)=unique(ts$Chatter)



nf <- layout(matrix(c(2, 5, 1, 3, 4, 0), 3, 2, byrow=TRUE), c(9, 3), c(3, 9, 2), TRUE)

par(mar=c(1, 4, 1, 1))
cdplot( as.factor(ts$Chatter) ~ ts$dttime$hour, col=chat.colour, xlab="Time", ylab="Users")
## axis(side=1,at=1:24, labels=c("Midnight", paste(1:11), "Noon",paste(1:11)),srt=90)

par(mar=c(0, 4, 0, 0))
chat.t <- barplot( tapply(ts$Chatter, ts$dttime$hour, length), col=gray(c(seq(0.1, 1, length=12),seq(1, 0.1, length=12))))
text(y=100, x=chat.t, label=c("Midnight", paste(1:11, "am"), "Noon",paste(1:11, "pm")), srt=90)

par(mar=c(0, 0, 0, 0))
chat.s <- table(ts$Chatter)
chat.sbar <- barplot(chat.s, col=chat.colour, yaxt="n", horiz=TRUE)
text(y=chat.sbar, x=chat.s*.5, label=names(chat.s))

par(mar=c(4, 4, 1, 1))
x <- seq(-pi, pi, len=24)
y=0
r <- outer(x, y, "+")
image(z = z <- 1-sin(r/2)^2,x,y, col=gray((0:32)/32),xlab="Time (Intraday)",xaxs="i",xaxt="n",yaxt="n",ylab="")

par(mar=rep(0, 4), cex=0.5)
plot.new()
plot.window(c(0,1), 0:1)
par(family="sans", lheight=4.5)
text(.35, .75, "TalkStats",font=4,cex=1.5)
text(.35, .55, "Intraday Analysis",font=2) 
text(.35, .35, "Top 10 best Chatters",cex=.85)

It now looks like this:


What do you think Vinux?
 

hlsmith

Not a robit
#7
It would be interesting to incorporate or consider the inter-relationship between chatters. Obviously some individuals co-mingle more and the chat is more lively at certain times. Is the time based on the international date line and medians over a week?
 

TheEcologist

Global Moderator
#8
Oke, because it is autumn. I thought I would add my autumn R palette :)

Code:
# Color Palette

#Autumn palette
indiansummer.colors=colorRampPalette(c("gold","#FFB11F", "#FC4D04","#755E02", "#191E01","#B0050A","red")) 

#APPLIED TO VINUX's great graphic

## Reading the chat data set
ts <- read.csv(insert_url_for_dropbox_csv_here) # Send a message if you need to know what the file is.
ts$dt <- strptime(ts$Date, "%m/%d %H:%M") 
ts$dttime <- as.POSIXlt(ts$dt, "IST") 

## grouping the users not the top ten best chatters
nTop10bUsers <- ts$User.Name %in% c("bugman", "CowboyBear",  "Dragan",
                                    "hlsmith",  "Lazar", "ledzep", "Link", "noetsi",
                                    "quark", "SiBorg", "SmoothJohn" )
ts$Chatter <- ts$User.Name
ts$Chatter[nTop10bUsers] <- "Others"


chat.colour=indiansummer.colors(length(unique(ts$Chatter)))
names(chat.colour)=unique(ts$Chatter)



nf <- layout(matrix(c(2, 5, 1, 3, 4, 0), 3, 2, byrow=TRUE), c(9, 3), c(3, 9, 2), TRUE)

par(mar=c(1, 4, 1, 1))
cdplot( as.factor(ts$Chatter) ~ ts$dttime$hour, col=chat.colour, xlab="Time", ylab="Users")
## axis(side=1,at=1:24, labels=c("Midnight", paste(1:11), "Noon",paste(1:11)),srt=90)

par(mar=c(0, 4, 0, 0))
chat.t <- barplot( tapply(ts$Chatter, ts$dttime$hour, length), col=gray(c(seq(0.1, 1, length=12),seq(1, 0.1, length=12))))
text(y=100, x=chat.t, label=c("Midnight", paste(1:11, "am"), "Noon",paste(1:11, "pm")), srt=90)

par(mar=c(0, 0, 0, 0))
chat.s <- table(ts$Chatter)
chat.sbar <- barplot(chat.s, col=chat.colour, yaxt="n", horiz=TRUE)
text(y=chat.sbar, x=chat.s*.5, label=names(chat.s))

par(mar=c(4, 4, 1, 1))
x <- seq(-pi, pi, len=24)
y=0
r <- outer(x, y, "+")
image(z = z <- 1-sin(r/2)^2,x,y, col=gray((0:32)/32),xlab="Time (Intraday)",xaxs="i",xaxt="n",yaxt="n",ylab="")

par(mar=rep(0, 4), cex=0.5)
plot.new()
plot.window(c(0,1), 0:1)
par(family="sans", lheight=4.5)
text(.35, .75, "TalkStats",font=4,cex=1.5)
text(.35, .55, "Intraday Analysis",font=2) 
text(.35, .35, "Top 10 best Chatters",cex=.85)
So now with Indian summer colours -->



Again, thanks for the wonderful graphic vinux!
 

vinux

Dark Knight
#9
It would be interesting to incorporate or consider the inter-relationship between chatters. Obviously some individuals co-mingle more and the chat is more lively at certain times. Is the time based on the international date line and medians over a week?
Here is first level of inter-relationship between chatters. You can ignore the diagonal part.


Code:
## Reading the chat data set
ts <- read.csv(insert_url_for_dropbox_csv_here) # Send a message if you need to know what the file is.

## Not in the top ten best chatters
nTop10bUsers <- ts$User.Name %in% c("bugman", "CowboyBear",  "Dragan",
                                    "hlsmith",  "Lazar", "ledzep", "Link", "noetsi",
                                    "quark", "SiBorg", "SmoothJohn" )

ts$Chatter <- ts$User.Name
ts$Chatter[nTop10bUsers] <- "Others"

## chat.table <- table(ts$Chatter[1:nrow(ts)-1], ts$Chatter[2:nrow(ts)])
chat.table <- table(ts$Chatter[1:(nrow(ts)-1)], ts$Chatter[2:nrow(ts)])
## This may not make sense
assocplot(chat.table)
chat.matrix <- as.matrix(chat.table)
diag(chat.matrix) <- 0
par(mar=c(6, 6, 1, 1), las=2)
assocplot(chat.matrix, space=0.1, main="Chat (t) vs Chat (t+1)")
Black shows the positive association. You can see the clusters. Victor-greta, trinker-Dason, VInux-TE, jake-Dason. Dason is positively associated with most of the users.

One could also try mosaicplot.

EDIT: Incorporated Dason's correction
 

trinker

ggplot2orBust
#10
To some extend this reveals where we live. People in the states have similar shapes where as people on the other side of the world have the opposite shapes. Lovin' this thread.
 

trinker

ggplot2orBust
#11
gotta tear this association plot apart. I'm wondering if the association is based on proximity or mention of someone's name (ie @vinux) but haven't run the code.
 
#12
You also have to remember that victorxstc sometimes thinks outloud and will post dozens of posts without really communicating with anyone in particular! Perhaps someone in the social sciences or psych can better make the appropriate consideration for this anomaly :)
 

Dason

Ambassador to the humans
#13
It doesn't change anything since using an index of 0 essentially is ignored but...
Code:
#in this line:
chat.table <- table(ts$Chatter[1:nrow(ts)-1], ts$Chatter[2:nrow(ts)])
# we have
ts$Chatter[1:nrow(ts)-1]
# but it should be
ts$Chatter[1:(nrow(ts)-1)]
# but this is even better
ts$Chatter[seq(nrow(ts)-1)]
 
#14
Can we sticky this thread?

Also, from the chatbox statistics, I think we can say that Greta and Victor are very likely to be starting up a relationship (p<0.0001).
 

trinker

ggplot2orBust
#17
I wanted to point out the talk stats library I bundled up using Dason and bryangoodrich's collective genius that downloads the chat box.

To download package:
Code:
# install.packages("devtools")
library(devtools)
install_github("talkstats", "trinker")
To download chats (I think it tries to get your user name password and store it the first time you use it but you can manually add to arguments each time):
Code:
library(talkstats)
x <- ts_chatbox(splitDate = FALSE)
 

trinker

ggplot2orBust
#18
We were looking at association of chatters. I'm working on an average distance measure function in R that goes with the visual representation as a gantt plot as seen below.

The code is pretty simple but relies on the qdap package that you'd have to download from my github. Also you'll need the talkstats package from there as well:

Getting qdap and talkstats
Code:
# install.packages("devtools")
library(devtools)
install_github("talkstats", "trinker")
nstall_github("qdap", "trinker")
Making gantt plots
Code:
library(qdap); library(talkstats)

dat <- ts_chatbox()

[COLOR="gray"]#plot 1 colored[/COLOR]
x <- with(dat, gantt_plot(dialogue, person))

[COLOR="gray"]#plot 2 black[/COLOR]
x + scale_color_manual(values=rep("black", length(levels(dat$person))))
#or
with(dat, gantt_plot(dialogue, person, bar.color="black"))

[COLOR="gray"]#plot 3 faceted[/COLOR]
with(dat, gantt_plot(dialogue, person, date, space = "free"))
To get a pdf of the graphics together -click here-

Plot 1

Plot2

Plot2
Too big to display. -click here- instead.

The level of detail in the pdf is well worth it. png tends to lose some smaller time durations.

The next step for this is to finish working on the distance function by first properly using the math annotation to describe what's going on and also use with outer and Vectorize to produce a distance matrix of average distances between users. If anyone wants to help here's that thread (LINK).
 

trinker

ggplot2orBust
#20
My apologies I thought they were self explanatory probably because I've been working with them so much lately. I was hoping they were because that's the mark of a good plot. Think of the y axis as time (unit of measure though is words). Where you have a color strip you were conversing in the chat box.

In the unfaceted plots we have time (days) as one big continuum from left to right. In the faceted I broke up the days. I could have gotten fancier with plotting the background colors by day but was lazy. To some extent then we can assume that people clustered in close proximity to each other were more conversant with on another. This distance measure I'm working on may capture this even better.

EDIT: I just realized that the xlab is set to a funky default. I changed that behavior but don't feel like fixing the graphics (lazy).