An R learning project (feel free to learn with me)

bryangoodrich

Probably A Mammal
Check this out

Code:
library(sp)
library(maps)
df <- map_data('county', 'new york')  # Grab the New York region county data
centroids <- by(df, df$region, function(county) Polygon(county[c('long', 'lat')])@labpt)  # Returns a county-named list of label points
centroids <- do.call("rbind", centroids)  # Convert into Data Frame
names(centroids) <- c('long', 'lat')
Now, what I did was supply the Polygon function the coordinates that define a polygon (notice that the first and last row match--it's closed). It creates an S4 Polygon class object that has a slot for "label point" (labpt). I only tested out a much more complicated process (like, 3 more nested steps) on the first county, but I got the centroid to be the same as the label point. Thus, the provided label points seem to be the centroids, from what I can guess. I provide this as a way for you to check what your averaging the polygon corners produces to what the actual centroids are (the actual computation is a little more involved). I tried to use tapply instead of by, but this is one of those times when they don't match up. Using tapply returns an error "arguments must have the same length." To me, by always seems to work as if I did s/lapply(split(df, df$factor), someFun). So, screw tapply! I also tried to supply Polygon to an aggregate call (so it works on each county), but that didn't work out at all.

Now, what did I have to do to get the centroids? Well, it's as easy as just saying coordinates and supplying it a SpatialPolygons object. Easy, right? No! What is a SpatialPolygons object? It's a list of Polygons. What is a Polygons object? It's a list of Polygon! So I had to take Polygon(df[df$subregion == "albany", c(1,2)]) and make it a list. But wait, there's more! Polygons apparently needs labels, so I put that list of one Polygon into 'y' and ran Polygons(y, "shutup!") and got my Polygons object (a list of S4 Polygon with a labeled name slot). Now, we can't just toss this new Polygons object into SpatialPolygons. It requires an integer vector to specify the number of Polygons in the list or some ****, as well as a coordinate reference system. Thus, we have to do

Code:
z <- ... Polygons go here ...
SpatialPolygons(z, as.integer(1), CRS("+proj=longlat"))  # Yes, "1" is not an integer!! ugh
So like I said, it may only be a few more steps, but it is a pain! I hope it's just due to my ignorance of sp and it's spatial S4 objects, but the return result of running coordinates on the above SpatialPolygons object was the same coordinates as Polygon(df[df$subregion=="albany", c(1:2)])@labpt. I was not going to bother with seeing if it held consistent with the other centroids calculated.
 

trinker

ggplot2orBust
Curses I didn't see you responded until I came back here to update. I made hand manipulations to labels that looked wrong (see here).

Anyway I tried the code and get the following error:
Code:
> library(maps)
> df <- map_data('county', 'new york')  # Grab the New York region county data
> centroids <- by(df, df$region, function(county) Polygon(county[c('long', 'lat')])@labpt)  # Returns a county-named list of label points
[COLOR="red"]Error in Polygon(county[c("long", "lat")]) : ring not closed[/COLOR]
> centroids <- do.call("rbind", centroids)  # Convert into Data Frame
[COLOR="red"]Error in do.call("rbind", centroids) : object 'centroids' not found[/COLOR]
Obviously I get the second error after the first error because of the first error. What am I messing up?
 

bryangoodrich

Probably A Mammal
Didn't realize map_data was a ggplot2 function. Anyway, one of the polygons or something doesn't end where it starts is my guess. I don't recall having that problem last night, but I didn't go that route I simplified earlier. I messed up. The data frame should be split on subregion. I think I renamed it or something. Region is specifying "new york" instead of each county! I updated the code below.

Oh, and I updated rbind to be rbind.data.frame

Code:
library(ggplot2)  # I'll change to getting the data from maps only: map_data is just a wrapper
library(sp)
library(maps)
df <- map_data('county', 'new york')  # Grab the New York region county data
centroids <- by(df, df$subregion, function(county) Polygon(county[c('long', 'lat')])@labpt)  # Returns a county-named list of label points
centroids <- do.call("rbind.data.frame", centroids)  # Convert into Data Frame
names(centroids) <- c('long', 'lat')

map('county', 'new york')
text(centroids$long, centroids$lat, rownames(centroids), offset=0, cex=0.4)
Took me 0.6 seconds to get the New York county label points (hopefully centroids). The fact still remains, even in a GIS you'd have to go in and adjust the labels for cases that don't work (too small to hold label, e.g.). Usually this means going from mere system-defined labels (attached to the entities--counties--themselves) to annotation. In other words, work on another layer of information on the map itself. This is not usually designed well by automation or code! It's visual, which is why this sort of "post-processing" is done after the map is designed or, in the case of ArcGIS, in its relatively okay editor.
 
Last edited:

trinker

ggplot2orBust
With the help of the people over at SO I was able to devise a locator for ggplot maps. It still isn't fully satisfying as you must supply :

Code:
+ scale_x_continuous(expand=c(0,0)) +  scale_y_continuous(expand=c(0,0))
To the end of the object. It works though and should aid in label placement. I also inquired at the ggplot help list for a better way and will update if there is.

Code:
require(maps); require(ggplot2); require(grid)

require(maps); library(ggplot2); require(grid)
county_df <- map_data('county')  #mappings of counties by state
ny <- subset(county_df, region=="new york")   #subset just for NYS
ny$county <- ny$subregion


NY1 <- ggplot(ny, aes(long, lat)) +  
          geom_polygon(aes(group=group), colour='black', fill=NA) +
          coord_map() + geom_point(aes(c(-78, -73), c(41, 40.855), 
          colour=c("blue", "red"))) + opts(legend.position = "none") 

NY <- NY1 + scale_x_continuous(expand=c(0,0)) + 
          scale_y_continuous(expand=c(0,0))
          #the scale x and y have to be added to the plot

NY 

ggmap.loc <- function(object){  #the locator function
    x <- grid.ls()[[1]][grep("panel-", grid.ls()[[1]])] #locate the panel
    seekViewport(x)
    y <-  as.numeric(grid.locator("npc"))
    locatedX <- min(object$data$long) + y[1]*diff(range(object$data$long))
    locatedy <- min(object$data$lat) + y[2]*diff(range(object$data$lat))
    return(c(locatedX, locatedy))
} #end of function

ggmap.loc(NY)
 

bryangoodrich

Probably A Mammal
From SO
ny$county <- ny$subregion
ny <- map_data('county', 'new york')
Shouldn't that be reversed? You haven't defined ny yet! and why are you just duplicating a field with a different name? Just use the provided name or change it's name!

Edit: I still don't see what you're trying to do with this lol You going to point-and-click where you want to alter labels? Better, I'd think, to find a way to get the dimensions of the polygon or something such that you can tell if a given label applied to it will fit. If not, turn it sideways. Does it fit along those dimensions. If it doesn't, don't print it at all (not printing is common in maps when there's overlap or not enough room; though, you should try to supply the label post-process by some further means as I've mentioned before).
 

trinker

ggplot2orBust
There is no need for that. I got rid of that but left it in the code. originally it was in there because I thought I had to match another dataframe. I am always paranoid of messing with data and so I created another column (not at all necessary). I edited it out.


bryangoodrich said:
You going to point-and-click where you want to alter labels? Better, I'd think, to find a way to get the dimensions of the polygon or something such that you can tell if a given label applied to it will fit. If not, turn it sideways. Does it fit along those dimensions. If it doesn't, don't print it at all (not printing is common in maps when there's overlap or not enough room; though, you should try to supply the label post-process by some further means as I've mentioned before).
What you've given me is already pretty darn good. Only a few need to be tweaked and I'm way to under skilled/motivated to easily do this so I think for me the approach I'm taking (use your centering method with a few post shifts and twists) makes the most sense.
 

bryangoodrich

Probably A Mammal
Don't worry. I'm not even skilled enough (yet) to do it! I'd just avoid the auto-labeling all together or think of them as another layer and make a list of those problem cases to remove from being printed. Or just make a factor that specifies which ones to keep, which ones to rotate, and which to remove. Then apply the layer twice (for each of the two printing categories).
 

trinker

ggplot2orBust
UPDATE It's mid semester and so this project has been stalled for a bit, but I have been dabbling here and there on it. I've been playing a bit with the sp package after bryan discussed it and I say it in a Paul Murrell book. It has some nice capabilities beyond the map package. Also released are the new versions of ggplot and ggmap that have some nice functions to make mapping(spatial visualization) easier.

The geocoding function in the ggmap package makes interfacing with the Google very nice (particularly in conjunction with plyr as seen below). I haven't tested this on a larger data set such as the one for this project but will eventually. I fear that in trying to grab too many geocodes there won't be enough wait time in between and so some sort of call to system sleep may be needed. Also nice is the addition of the gglocator function as a parallel to the locator function in base that works for grid.

Code:
library(ggmap)

geocode("University at Buffalo")
geocode('1600 Pennsylvania Avenue, Washington DC')

x <- list('1600 Pennsylvania Avenue, Washington DC',
    '338 Millard Fillmore Academic Complex  Buffalo, NY 14261')

library(plyr)
ldply(x, geocode)
The ggplot2 package .9 release also for flexibility with legends as never seen before.

i haven't gotten a chance to run through bryangoodrich's code and see what's happening. I'm excited to try but that will have to wait until spring break.
 

trinker

ggplot2orBust
Alright I'm feeling revived to work on this project a bit more. I've learned a few things about mapping since I last posted and have seen some pretty cool potential things I'd like to try. I'm at the point where I faceted the data by subject ELA/Math looking at passing rates on NYS exams (a Choloropleth). I'm at a sticking point and asking for some assistance getting back on track.

First lets look at what I've got so far:


I'm to the part where I'm adding the county names to the map and have went back to this stack overflow post I did where Bryan gave a nice method for centering but I can't seem to get them to plot because of R claims it can't find a group. Bryan's centroids data is in blue, the text plotting (which throws up the error) is in red.

The load function will load in an .RData file from my drop box that contains the 2 data frames I'm plotting.

Code:
load(url("http://dl.dropbox.com/u/61803503/Names/cholo.RData"))

library(ggplot2)  # For map_data. It's just a wrapper; should just use maps.
library(sp)
library(maps)
library(RColorBrewer)
library(scales)
library(mapproj)
[COLOR="blue"][B]
getLabelPoint <- # Returns a county-named list of label points
function(county) {Polygon(county[c('long', 'lat')])@labpt}

df <- map_data('county', 'new york')                 # NY region county data
centroids <- by(df, df$subregion, getLabelPoint)     # Returns list
centroids <- do.call("rbind.data.frame", centroids)  # Convert to Data Frame
names(centroids) <- c('long', 'lat')                 # Appropriate Header
centroids$subregion <- rownames(centroids)
centroids$angle <- rep(0, nrow(centroids))[/B][/COLOR]


ggplot(map.data2, aes(long, lat, group=group)) +  #plot pass rates math
   geom_polygon(aes(fill=level), colour=alpha('white', 1/2), size=0.2) +
   geom_polygon(data=ny, colour='black', fill=NA) + 
   [B][COLOR="red"]geom_text(data=centroids, aes(x=long, y=lat, 
       labels=rownames(centroids), angle=angle), size=3) +[/COLOR][/B]
   scale_fill_brewer(palette='RdYlBu', guide = guide_legend(title = "Percent Passing"))+
   facet_grid(.~Subject)+
   opts(title = "
       New York State Counties Passing Rate \non Elementary ELA Assessments") +
   opts(axis.text.x = theme_blank(), axis.text.y = theme_blank(), 
       axis.ticks = theme_blank())+
   opts(legend.background = theme_rect()) +
   scale_x_continuous('') + scale_y_continuous('') + 
   labs(title = "legend title") + theme_bw()+
   opts(axis.line=theme_blank(),axis.text.x=theme_blank(),
        axis.text.y=theme_blank(),axis.ticks=theme_blank(),
        axis.title.x=theme_blank(), legend.position="bottom",
        axis.title.y=theme_blank(),
        panel.background=theme_blank(),panel.grid.major=theme_blank(),
        panel.grid.minor=theme_blank(),plot.background=theme_blank())
It throws up this error:
Code:
Error in eval(expr, envir, enclos) : object 'group' not found
EDIT: Haven't got a hit here so I'm going to cross post this at SO and link back (LINK).
 

trinker

ggplot2orBust
Alright thanks to a poster at SO I found out I had to set group in geom_text to NULL. Not sure why but it works. This is what it looks like now (pdf is better quality so names show up better):



This is where the svg stuff that's interactive would be nice, where you can hover over a county and see it's name; I'd also include the percent, total students, poverty level, % minority, % title 1 schools etc. If I did this for a publication and wanted names I wouldn't facet it but would opt for 2 larger images.

Next is playing with the new geom_map in ggplot2. It's pretty cool because you don't have to bin the continuous data to categories (ranges), the scale is a gradient. Very nice. Plus you don't have to stack the data for faceting to long format.
 

bryangoodrich

Probably A Mammal
The continuous scale is known as an unclassed (i.e., without classes) choropleth. It has the advantage of using a continuous gradient of values but the downside that users can still only distinguish so many. Using classes has that advantage that the user can more simply interpret the map, but then you lose that extra precision of a continuous scale. Tradeoffs!

The choice of including labels really depends on its reproduction. If you're serving something up to a user to interact with, then a GIS, web app, or SVG has that benefit, but if you're displaying a PDF or static map of some sort, you have to make those choices of what scale to show, etc. An alternative in this case is to do an image book where you break the map down into tiles that are each at larger scale ('closer'). It makes it easier to see each tile with more clarity, but then you have to look at each tile separately. In this case, it would be apt to include a full map without the detailed labels and such to give the overview, then each tile is a closer window at those details.
 

trinker

ggplot2orBust
Feels good to get back to this :) Anyway I tested out Hadley's new geom_map. It treats the data as continuous rather than binned data and and as Bryan has already mentioned has advantages and drawbacks. I'm all about being armed with which ever tool fits the job. It's really not that much different than binning except one elss step, you still have to stack the data, I was mistaken. Hadely uses reshape2 so I did the same but you could use base reshape instead or what ever. Anyway I thought I'd present the map and the code (PS found out you can use load with drop box but not readRDS to my knowledage).



Here's a link to a nicer pdf version (LINK)

Code:
save(centroids, ny, map.data2, file=paste0(db, "Names/cholo.RData"))
load(url("http://dl.dropbox.com/u/61803503/Names/cholo.RData"))

library(ggplot2)  
library(reshape2)
library(RColorBrewer)
library(scales)

colrs <- c("#330000", "#FFFF00") [COLOR="gray"]#give a low and high color to scale_fill_continuous[/COLOR]

map.data3 <- map.data2[, -c(8:14, 16, 20:21) ]
map.data3.melt <- melt(map.data3, id = c(1:7, 10:11))

levels(map.data3.melt$variable) <- c("Math", "ELA")
head(map.data3.melt)

ggplot(map.data3.melt, aes(map_id = subregion)) + 
    geom_map(aes(fill = value), map = ny) + 
    expand_limits(x = ny$long, y = ny$lat) + 
    facet_wrap( ~ variable) +
    geom_polygon(data=ny, aes(x=long, y=lat), colour='black', fill=NA) +
    scale_fill_continuous(low=colrs[1], high=colrs[2], 
        guide = guide_colorbar(title = "Percent Passing", values=colrs,
        barwidth = 10, barheight = .5)) +
    opts(axis.text.x = theme_blank(), axis.text.y = theme_blank(), 
        axis.ticks = theme_blank())+
    opts(legend.background = theme_rect()) +
    scale_x_continuous('') + scale_y_continuous('') + 
    labs(title = "legend title") + 
    theme_bw()+
    opts(axis.line=theme_blank(),axis.text.x=theme_blank(),
        axis.text.y=theme_blank(),axis.ticks=theme_blank(),
        axis.title.x=theme_blank(), legend.position="bottom",
        axis.title.y=theme_blank(),
        panel.background=theme_blank(),panel.grid.major=theme_blank(),
        panel.grid.minor=theme_blank(),plot.background=theme_blank(), 
        panel.margin = unit(0, "lines"), strip.text.x = theme_text(size=12, face="bold")) +
    opts(title = "New York State Counties Passing Rate on Grades 3-8 Assessments") +
    geom_text(data=centroids, aes(x=long, y=lat, label=subregion, angle=angle, group=NULL), size=3.25)
 

trinker

ggplot2orBust
NOTE to future users: We discussed the USC WebGIS Services on here for geocoding but I recieved this email today:

We are pleased to announce that the USC WebGIS Services have been successfully migrated to their new home at Texas A&M University.

The new URL for accessing the services is http://geoservices.tamu.edu.

All user accounts have been migrated as have the transaction balances associated with all user accounts.

All API users should update their API calls to use the new URL. All query parameters remain exactly the same; the only change that needs to be made is to switch http://webgis.usc.edu to http://geoservices.tamu.edu.

Within the next few days, the IP associated with the USC domain http://webgis.usc.edu will automatically begin forwarding to http://geoservices.tamu.edu. At that time, all queries to webgis.usc.edu will automatically be forwarded to geoservices.tamu.edu.

As always, thank you for your interest in our services and let us know if you have any problems or need any help.

Regards,
The Texas A&M Geo Team
 

bryangoodrich

Probably A Mammal
Hey Trinker. I thought I'd revise this dead thread. Maybe we should start a new one? Maybe refine the project to something specific? I wonder where TheEcologist is at because he does all his GIS work in R, if I recall correctly. I'd love to learn some stuff from him. I want to become fluent at doing spatial analyses in R.
 

trinker

ggplot2orBust
Yeah I've come across many ways to stream line this process. I'd like to turn some of the stuff into a package though not for cran more for person use that I can do every year.