How to plot a scatterplot with three different y axes?

gianmarco

TS Contributor
#1
Hello,

I have a dataset like the following:
Code:
mydata <- structure(list(historical_sources = c(7.5, 6, 10, 2.5, 4.5, 2, 
4.5, 2.5, 1, 7, 4.5, 9, 2.5, 2.5, 1.5, 3, 4, 3.5, 6.5, 7, 12.5, 
15.5), May = c(5.1, 14.3, 16.9, 2.5, 30.8, 2.4, 7.7, 1.7, 1.7, 
5.3, 2.5, 7.8, 4.9, 3.1, 2.9, 1.9, 1.9, 6.9, 4.7, 11.5, 8.5, 
10.8), July = c(4.3, 22.6, 22.9, 5.6, 22.3, 4.4, 10.5, 1.9, 1.9, 
4.8, 3.4, 8, 8, 3.9, 8.6, 1.4, 1.4, 3.2, 8.2, 14.6, 9.5, 16.1
), October = c(8.7, 20.4, 15.9, 4.6, 8.9, 6.4, 10, 4.5, 4.5, 
9.1, 7.4, 14.8, 2.1, 6.5, 1.9, 1.8, 1.8, 4.1, 8.7, 12.3, 4.9, 
4.6), May_diff = c(-2.4, 8.3, 6.9, 0, 26.3, 0.4, 3.2, -0.8, 0.7, 
-1.7, -2, -1.2, 2.4, 0.6, 1.4, -1.1, -2.1, 3.4, -1.8, 4.5, -4, 
-4.7), July_diff = c(-3.2, 16.6, 12.9, 3.1, 17.8, 2.4, 6, -0.6, 
0.9, -2.2, -1.1, -1, 5.5, 1.4, 7.1, -1.6, -2.6, -0.3, 1.7, 7.6, 
-3, 0.6), October_diff = c(1.2, 14.4, 5.9, 2.1, 4.4, 4.4, 5.5, 
2, 3.5, 2.1, 2.9, 5.8, -0.4, 4, 0.4, -1.2, -2.2, 0.6, 2.2, 5.3, 
-7.6, -10.9)), .Names = c("historical_sources", "May", "July", 
"October", "May_diff", "July_diff", "October_diff"), class = "data.frame", row.names = c(NA, 
-22L))
I would like to plot a scatterplot in which x is the "historical_sources" variable, while three variables (i.e., May_diff, July_diff, and October_diff) should be displayed on three different y axes.

I cannot wrap my head around how to get that (either in base R or ggplot2).

Any elucidation is (as usual) highly appreciated.

Best
Gm
 

bryangoodrich

Probably A Mammal
#5
I don't think 3D plots are useful in 99% of cases, and this is one of those cases. Here is what I would do:

Code:
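library(ggplot2)
# Reshape wide to long: one row per (historical_sources, variable, value).
x <- reshape2::melt(mydata[c("historical_sources", "May_diff", "July_diff", "October_diff")],
                    id.vars = "historical_sources")
# Draw one scatterplot panel per *_diff variable.
ggplot(x) + aes(historical_sources, value) + geom_point() + facet_wrap(~variable)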
Basically, reshape your data from wide to long format, so that each row is a (historical_sources, variable, value) tuple in which variable takes one of the *_diff labels. Then we simply plot the (historical_sources, value) pairs and facet them into 3 separate panels according to the variable label. You can keep the panels on a shared value axis or let each one scale freely (facet_wrap argument scales = "free_y").
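
For illustration, here is a minimal sketch of the free y-scale variant mentioned above. The facet_wrap argument is spelled scales. The names demo, long and p_free are made up for this example, and demo holds only the first six rows of the relevant columns from the first post, so the block runs without mydata.

```r
library(ggplot2)

# Illustration only: first six rows of the columns used in the plot.
demo <- data.frame(
  historical_sources = c(7.5, 6, 10, 2.5, 4.5, 2),
  May_diff           = c(-2.4, 8.3, 6.9, 0, 26.3, 0.4),
  July_diff          = c(-3.2, 16.6, 12.9, 3.1, 17.8, 2.4),
  October_diff       = c(1.2, 14.4, 5.9, 2.1, 4.4, 4.4)
)

# Wide to long: every non-id column becomes a (variable, value) pair.
long <- reshape2::melt(demo, id.vars = "historical_sources")

# One panel per *_diff variable, each with its own y-axis range.
p_free <- ggplot(long, aes(historical_sources, value)) +
  geom_point() +
  facet_wrap(~variable, scales = "free_y")
print(p_free)
```

With free scales each panel gets its own y range, which is probably the closest ggplot2 equivalent to "three different y axes", at the cost of making the panels harder to compare by eye.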

Alternatively, you can remove the faceting and color the points according to variable in aes (color = variable).
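
A minimal sketch of that single-panel version, again on a small excerpt of the data rather than the full mydata (the names demo, long and p_color are only illustrative):

```r
library(ggplot2)

# Illustration only: first six rows of the columns used in the plot.
demo <- data.frame(
  historical_sources = c(7.5, 6, 10, 2.5, 4.5, 2),
  May_diff           = c(-2.4, 8.3, 6.9, 0, 26.3, 0.4),
  July_diff          = c(-3.2, 16.6, 12.9, 3.1, 17.8, 2.4),
  October_diff       = c(1.2, 14.4, 5.9, 2.1, 4.4, 4.4)
)

# Same wide-to-long reshape as in the faceted version.
long <- reshape2::melt(demo, id.vars = "historical_sources")

# All three *_diff series share one panel and one y axis; color tells them apart.
p_color <- ggplot(long, aes(historical_sources, value, color = variable)) +
  geom_point()
print(p_color)
```

Here all three series share one y axis, so differences in magnitude between months are directly visible, but overlapping points can hide each other.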

It really depends on what relationship you're trying to capture in this visualization. The per historical source deviation? The distribution (maybe boxplots)? Are the historical sources supposed to be categorical? (There are a lot of single instances per source here).
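
If the distribution is what matters, a boxplot per variable could look roughly like the sketch below. This is only one possible reading of the question; demo, long and p_box are illustrative names, and the data are again just the first six rows from the first post.

```r
library(ggplot2)

# Illustration only: first six rows of the three *_diff columns.
demo <- data.frame(
  historical_sources = c(7.5, 6, 10, 2.5, 4.5, 2),
  May_diff           = c(-2.4, 8.3, 6.9, 0, 26.3, 0.4),
  July_diff          = c(-3.2, 16.6, 12.9, 3.1, 17.8, 2.4),
  October_diff       = c(1.2, 14.4, 5.9, 2.1, 4.4, 4.4)
)

long <- reshape2::melt(demo, id.vars = "historical_sources")

# One box per *_diff variable; historical_sources is ignored in this view.
p_box <- ggplot(long, aes(variable, value)) +
  geom_boxplot()
print(p_box)
```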

Visualizing isn't hard. It's building the right visual for the intended purpose/application that is hard. You should clarify that purpose a bit more!