Help a novice day. Have you helped a novice today?

bugman

Super Moderator
#1
So anyway,

this plot is awesome.

http://www.datapointed.net/2012/02/san-francisco-rain-year-before-after-valentines-day/

I emailed the author and boldly asked for his script, in case he had plotted this in R. Turns out he wrote it in Java. I know more Vogon than Java.

Anyway, can anyone give me some tips on how I would approach reproducing this?


I was thinking of toying around with dotchart or a bubble plot, or a combination of both. I'm not sure how to get the pyramid plot on the right-hand side, showing the change before and after the pre-specified date.

Anyway, tips, tricks, help anyone?
 

trinker

ggplot2orBust
#2
This is not bad at all in ggplot2. I'd make 2 separate graphs and merge them with gridExtra's grid.arrange.

The first one is simply a Cleveland dot plot: http://stackoverflow.com/a/5545108/1000343 with multiple observations at each y axis group. The aesthetics alpha, size, and fill have been used as well to show various things. When size is used to show a third dimension this is often called a bubble plot, but that doesn't really affect ggplot's underlying grammar.

The second graph uses stacked bars with the same 0 point; this is sometimes called a back-to-back bar plot, as seen here: http://www.r-bloggers.com/ggplot2-back-to-back-bar-charts/ Your image makes use of text here as well, so you'd want geom_text.

To make it pretty you'd want to fool with theme.

Last, you merge the 2 plots with grid.arrange. Note you may need to align them as well.

PS I liked some of the design aspects of this chart. Thanks for sharing.

The alignment part will/may require some work though, and may not be worth it if you aren't using the plot to publish/present (i.e. it is simply for analysis purposes).
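A rough sketch of the two-plot approach described above, in case it helps. The data are made up for illustration (not the San Francisco record), the cutoff day of 235 and all object names are placeholders, and the styling is nowhere near the original chart.

```r
library(ggplot2)
library(gridExtra)

# Made-up data for illustration only
set.seed(1)
daily <- data.frame(
  year = rep(c("2010-2011", "2011-2012", "2012-2013"), each = 20),
  day  = rep(seq(5, 347, by = 18), times = 3),
  rain = round(runif(60, 1, 30), 1)
)
cutoff <- 235  # hypothetical split day
daily$period <- ifelse(daily$day < cutoff, "before", "after")

# Totals per year and period; "before" bars point left by flipping their sign
totals <- aggregate(rain ~ year + period, data = daily, FUN = sum)
totals$signed <- ifelse(totals$period == "before", -totals$rain, totals$rain)

# Left panel: dot plot with several observations per year, size showing the amount
dots <- ggplot(daily, aes(x = day, y = year, size = rain)) +
  geom_point(alpha = 0.5, colour = "steelblue") +
  geom_vline(xintercept = cutoff) +
  theme(legend.position = "none")

# Right panel: back-to-back bars around a shared zero line, with text labels
bars <- ggplot(totals, aes(x = year, y = signed)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.6) +
  geom_text(aes(label = round(rain)), size = 3) +
  coord_flip()

# Merge the two panels side by side; the dot plot gets three times the width
grid.arrange(dots, bars, ncol = 2, widths = c(3, 1))
```

grid.arrange only places the plots next to each other; it does not force the panels to share the same height, which is the alignment caveat mentioned above.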
 

bugman

Super Moderator
#4
I have made some progress, for those interested (remember that I am trying to duplicate the plot in the first post).

The first plot (called rainfall) was fairly straightforward using the following data (day is numbered 1-365).

Code:
n.  year     day     Rainfall
1 2010-2011   1       NA
2 2010-2011   2       NA
3 2010-2011   3       NA
4 2010-2011   4       NA
5 2010-2011   5       NA
6 2010-2011   6       NA
Code:
library(ggplot2)
# alpha is a fixed value, so it is set on the layer rather than mapped inside aes()
p1 <- ggplot(data = rain, aes(x = day, y = Rain.year, size = Rainfall, colour = Rainfall)) +
  geom_point(alpha = 0.5) + xlab("") + ylab("")
p1 + scale_size(range = c(1, 17)) +
  # vertical line at the cutoff day, horizontal guide lines for the four rain years
  geom_vline(xintercept = 235, colour = "dodgerblue4", linetype = "solid", alpha = 0.2) +
  geom_hline(yintercept = 1:4, colour = "gray46", linetype = "solid", alpha = 0.6) +
  theme(legend.position = "none") +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
  theme(panel.background = element_rect(fill = "grey95")) +
  theme(plot.background = element_rect(fill = "grey95"))
I created a second data frame for the second plot (called period), mainly because my coding is so poor that I couldn't get started on automating it from the main data set.


Code:
head(rain4)

   rainyear period label rainfall mean      prop
1 2011-2012 before 30 mm       30 34.2  87.71930
2 2012-2013 before 40 mm       40 34.2  116.95906
3 2013-2014 before 10 mm       10 34.2  29.23977
4 2014-2015 before 12 mm       12 34.2  35.08772
5 2011-2012  after 41 mm       41 34.2 119.88304
6 2012-2013  after 15 mm       15 34.2  43.85965
Code:
p2 <- ggplot(rain4, aes(x = rainyear)) +
  geom_bar(data = subset(rain4, period == "after"),
           aes(y = rainfall, fill = rainfall), stat = "identity", alpha = 0.6) +
  # rain years are on x before the flip, so the per-year guide lines use xintercept
  geom_vline(xintercept = 1:4, colour = "gray46", linetype = "solid", alpha = 0.6)

p3 <- p2 + geom_bar(data = subset(rain4, period == "before"),
                    aes(y = -rainfall, fill = -rainfall), stat = "identity", alpha = 0.5) +
  xlab("") + ylab("") +
  coord_flip() +
  # the zero line is on y (rainfall); it is drawn vertically after coord_flip()
  geom_hline(yintercept = 0, colour = "dodgerblue4", linetype = "solid", alpha = 0.6) +
  theme(legend.position = "none") +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
  theme(axis.ticks = element_blank(), axis.text.y = element_blank()) +
  theme(panel.background = element_rect(fill = "grey95")) +
  theme(plot.background = element_rect(fill = "grey95"))
p3
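On automating the second data frame from the daily data: a minimal sketch, assuming the daily data frame has the columns used in the first ggplot call (Rain.year, day, Rainfall) and that day 235 is the split. The toy values are invented so that the totals match the first rows shown above. Taking mean over all the totals is a guess at how that column was built; prop as 100 * rainfall / mean does reproduce the printed values.

```r
# Toy stand-in for the daily data (invented values, real column names)
rain <- data.frame(
  Rain.year = rep(c("2011-2012", "2012-2013"), each = 4),
  day       = rep(c(100, 200, 250, 300), times = 2),
  Rainfall  = c(10, 20, NA, 41, 25, 15, 5, 10)
)
cutoff <- 235  # the day marked by geom_vline in the first plot

rain$period <- ifelse(rain$day < cutoff, "before", "after")

# Sum rainfall per rain year and period; the formula interface drops NA rows by default
rain4 <- aggregate(Rainfall ~ Rain.year + period, data = rain, FUN = sum)
names(rain4) <- c("rainyear", "period", "rainfall")

rain4$label <- paste(rain4$rainfall, "mm")
rain4$mean  <- mean(rain4$rainfall)              # assumption: mean of all totals
rain4$prop  <- 100 * rain4$rainfall / rain4$mean
rain4
```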



The second plot looks OK, but there are some problems with it and I would really appreciate a nudge in the right direction here:



1) I can't figure out how to add text (as in "before" on the left-hand side and "after" on the right-hand side). I have tried annotate, but I don't think it allows the use of italics.



2) The colour gradient is wrong because the scale on the x-axis is wrong. Each horizontal line is a rain year (which matches the above plot); the bars on the left are total rainfall before a particular date and the bars on the right are total rainfall after that date. The problem is that the scale runs from -40 to +40 (left to right), whereas I need positive values on both sides.

The only way around this I have found is to use:

scale_y_reverse() and make two separate plots, but the issue is that if I do that, I can't figure out how to join them so there is no gap.


And finally, I would like to add the values to each bar.
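For the bar values and the italic text, something along these lines may work. It is only a sketch: the numbers are copied from the head(rain4) output above, the signed column and the p_lab name are made up for the example, and the label positions (x = 2.5, y = -30 and 30) are arbitrary.

```r
library(ggplot2)

# Values copied from the head(rain4) output; two rain years only
rain4 <- data.frame(
  rainyear = rep(c("2011-2012", "2012-2013"), times = 2),
  period   = rep(c("before", "after"), each = 2),
  rainfall = c(30, 40, 41, 15)
)
rain4$label  <- paste(rain4$rainfall, "mm")
# Hypothetical helper column: negative for "before" so those bars point left
rain4$signed <- ifelse(rain4$period == "before", -rain4$rainfall, rain4$rainfall)

p_lab <- ggplot(rain4, aes(x = rainyear, y = signed)) +
  geom_bar(stat = "identity", alpha = 0.6) +
  # One label per bar, pushed just past the bar's outer end
  geom_text(aes(label = label, hjust = ifelse(signed < 0, 1.1, -0.1)), size = 3) +
  # annotate() passes fontface on to the text layer, so italics are available
  annotate("text", x = 2.5, y = -30, label = "before", fontface = "italic") +
  annotate("text", x = 2.5, y = 30, label = "after", fontface = "italic") +
  # Leave room beyond the longest bar for the value labels
  scale_y_continuous(limits = c(-55, 55)) +
  coord_flip()
p_lab
```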


Any pointers would be great.
 

bugman

Super Moderator
#6
Well, I mostly solved the text problem, though I still can't seem to get text into the margin as opposed to the plotting region.

I joined the plots using grid.draw(cbind(ggplotGrob(plot1), ggplotGrob(plot2), size="first")), but the horizontal lines are not aligned. And I still haven't worked out how to fix that x-axis scale issue...
 

bugman

Super Moderator
#7
Edit: just fixed the horizontal lines issue by replacing "first" with "max" in the size argument of the cbind call inside grid.draw.

Should have done it with base.
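For anyone following along, the join looks roughly like this. The two plots here are throwaway stand-ins built from mtcars, not the rainfall plots.

```r
library(ggplot2)
library(grid)

# Stand-in plots; any two ggplots with the same layout structure will do
plot1 <- ggplot(mtcars, aes(wt, factor(cyl))) + geom_point()
plot2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar() + coord_flip()

g1 <- ggplotGrob(plot1)
g2 <- ggplotGrob(plot2)

# size = "max" uses the larger of the two heights for each row of the layout,
# so the panels of both plots end up the same height
joined <- cbind(g1, g2, size = "max")

grid.newpage()
grid.draw(joined)
```

With size = "first" the row heights come from the first plot only, which may be why the horizontal lines drifted apart.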
 

trinker

ggplot2orBust
#8
Looks terrific. For the upper-left label, if you use annotate you should be able to go outside the plot region. Fun to watch you get this one.
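A sketch of what I mean, assuming a recent ggplot2 (3.0.0 or later) where the coord functions take a clip argument. By default the panel clips anything drawn outside it, so clipping has to be switched off and the margin widened. The data and positions are placeholders.

```r
library(ggplot2)

p_out <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # y = 37 lies above the fixed upper limit, so the label lands in the top margin
  annotate("text", x = 1.5, y = 37, label = "before", fontface = "italic") +
  # Fix the limits and turn clipping off so drawing outside the panel is allowed
  coord_cartesian(ylim = c(10, 35), clip = "off") +
  # Extra top margin to make room for the label
  theme(plot.margin = unit(c(2, 1, 1, 1), "lines"))
p_out
```

For the flipped bar plot the same argument should be available as coord_flip(clip = "off").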
 

bugman

Super Moderator
#11
Thanks for keeping an eye on this trinker.

Hopefully this will clarify what I mean.

The two sides of the plot should both be positive, and therefore the colour gradient should run from 0 to 40 instead of from -40 to +40, which is its current form. Not sure how to do this while keeping the back-to-back style of the plot.
 

trinker

ggplot2orBust
#12
I think you want something like:

Code:
scale_y_continuous(limits=c(-40, 40), breaks=seq(-40, 40, 20), labels=abs(seq(-40, 40, 20)))
Here's an example where folks overrode the default labels:

Code:
library(ggplot2)
library(reshape2)
# sample data
set.seed(33)
df <- data.frame(ag = 1:18,
                 males_year1 = sample(100:200, 18), females_year1 = sample(100:200, 18),
                 males_year2 = sample(100:200, 18), females_year2 = sample(100:200, 18))
# melt the data set to long format (columns: ag, variable, value)
df <- melt(df, id = "ag")
head(df)
# here is the plot; recent ggplot2 versions dropped the subset = .() layer argument,
# so each layer is given its own subset through data =
p <- ggplot(df, aes(x = ag)) +
  geom_bar(data = subset(df, variable == "males_year1"), aes(y = value), stat = "identity", fill = "#6666FF") +
  geom_bar(data = subset(df, variable == "females_year1"), aes(y = -value), stat = "identity", fill = "#FF9333") +
  geom_point(data = subset(df, variable == "males_year2"), aes(y = value), size = 3, colour = "#330099") +
  geom_point(data = subset(df, variable == "females_year2"), aes(y = -value), size = 3, colour = "#CC3300") +
  coord_flip() +
  theme_bw() +
  # negative breaks are relabelled with their absolute values
  scale_y_continuous(limits = c(-200, 200), breaks = seq(-200, 200, 50), labels = abs(seq(-200, 200, 50))) +
  # upper limit widened to 20 so the group labels at x = 19.2 are not dropped
  scale_x_continuous(limits = c(0, 20), breaks = 1:18) +
  xlab("age group") + ylab("population") +
  annotate("text", x = 19.2, y = -100, label = "Females") +
  annotate("text", x = 19.2, y = 100, label = "Males")

p
PS it's scale_y rather than scale_x because you used coord_flip.
 

bugman

Super Moderator
#14
Got the colour gradient issue fixed using the scale_fill_gradientn function.

The text outside the plotting area is still a pain, but I think this is near enough. This is actual rainfall data.
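For anyone landing here later, one way to get a 0 to 40 style gradient on both sides is sketched below. It is not necessarily the exact code used for the final plot: the colours, the limits of 0 to 45 and the signed column are placeholders, and the values come from the head(rain4) output earlier in the thread. The idea is to let the sign decide the bar direction while fill always gets the positive amount.

```r
library(ggplot2)

rain4 <- data.frame(
  rainyear = rep(c("2011-2012", "2012-2013"), times = 2),
  period   = rep(c("before", "after"), each = 2),
  rainfall = c(30, 40, 41, 15)
)
# Hypothetical helper column: direction from the sign, fill from the raw amount
rain4$signed <- ifelse(rain4$period == "before", -rain4$rainfall, rain4$rainfall)

p_fill <- ggplot(rain4, aes(x = rainyear, y = signed, fill = rainfall)) +
  geom_bar(stat = "identity", alpha = 0.6) +
  # One gradient over positive rainfall, shared by both sides of the zero line
  scale_fill_gradientn(colours = c("lightblue", "dodgerblue4"), limits = c(0, 45)) +
  # Relabel the negative side with absolute values
  scale_y_continuous(breaks = seq(-40, 40, 20), labels = abs(seq(-40, 40, 20))) +
  coord_flip() +
  theme(legend.position = "none")
p_fill
```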

Cheers trinker!