I realized today that I seem to learn the most when I have some kind of project that stretches my abilities beyond where they currently are. Like many of you, I'm an avid R user, and like you, I have several things in R I've been meaning to learn but have put off. Well, today I came across a Wickham plot (LINK) that inspired me to undertake a project that will require me to stretch and grow.
A few reasons I'm posting the project here:
- I'm going to need your help.
- I think this could be a great learning experience for everyone.
- It keeps me accountable to actually finish.
- It'll keep everything organized and tidy.
The goal is to learn to pull data from several websites and use R with both Google Maps and the maps package (I wanted to learn both approaches, and the Google Maps output looked pretty professional) to plot information from the data set. Specifically, I've decided to use schools data from New York State (my home state), plot the locations of the schools, and perhaps use different plotting symbols to represent differences in some demographic information.
Here's my plan of attack:
- Write a function to scrape (I think this is called scraping, but I may be wrong) the school names and their addresses into a nice data frame (LINK). This will require looping through each county and extracting the information. Bryangoodrich shared a similar data retrieval loop a few months back (LINK).
- Convert the addresses to latitude and longitude. I'm thinking of using the following approach:
Code:
coord <- function(address) {
    require(XML)
    # URLencode() handles the spaces in a street address, which would
    # otherwise break the request URL
    url  <- paste('http://maps.google.com/maps/api/geocode/xml?address=',
                  URLencode(address), '&sensor=false', sep = '')
    doc  <- xmlTreeParse(url)
    root <- xmlRoot(doc)
    lat  <- xmlValue(root[['result']][['geometry']][['location']][['lat']])
    long <- xmlValue(root[['result']][['geometry']][['location']][['lng']])
    return(c(lat, long))
}

addr3 <- c('1600 Pennsylvania Avenue, Washington, DC',
           'W Ball Rd & S Disneyland Dr, Anaheim, CA')
sapply(addr3, coord)
- Read in some other interesting data, such as test info or demographics, from the net. (The .csv should be straightforward, but one file is in Microsoft Access format, so I'd have to figure out how that's done; I'll probably stick with the .csv file.) Data found here (LINK) and here (LINK).
- Use ggplot2 to plot the data (addresses turned into lat and long) on a simple New York State map from the maps package, as shown here (LINK). This will require projecting the long and lat onto a grid using mapproj.
- Repeat step 4, but interfacing with Google Maps as seen here (LINK).
- On the maps from steps 4 and 5, combine the address plots with some interesting demographic data, just to add some flair to the map.
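For step 1, here's a minimal sketch of the county loop, assuming the listing pages expose a plain HTML table; the URL pattern and table position are placeholders I made up, not the real site's layout:

```r
require(XML)  # for readHTMLTable()

# Hypothetical URL pattern -- the real listing site will differ
buildCountyURL <- function(county) {
    paste('http://example.com/schools?county=', URLencode(county), sep = '')
}

getCountySchools <- function(county) {
    # readHTMLTable() returns every <table> on the page as a list of data frames
    tables <- readHTMLTable(buildCountyURL(county), stringsAsFactors = FALSE)
    tables[[1]]  # assume the first table holds school name and address
}

# Loop over the counties and stack the results into one data frame
# (commented out since it needs a live site):
# counties <- c('Albany', 'Erie', 'Monroe')
# schools  <- do.call(rbind, lapply(counties, getCountySchools))
```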
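For steps 3 and 6, base R's read.csv() and merge() should cover joining the scraped, geocoded schools to the demographic file. This sketch uses made-up columns and values, since I haven't looked at the real file layouts yet:

```r
# Placeholder data frames -- the real columns come from the scraped data
# (steps 1-2) and the downloaded .csv (step 3, via read.csv())
schools <- data.frame(name = c('School A', 'School B'),
                      long = c(-73.76, -78.88),
                      lat  = c( 42.65,  42.89))
demo    <- data.frame(name       = c('School A', 'School B'),
                      enrollment = c(450, 900))

# merge() joins the two frames on the shared key column
merged <- merge(schools, demo, by = 'name')

# Later, a demographic column can drive a point aesthetic, e.g.
#   geom_point(data = merged, aes(x = long, y = lat, size = enrollment))
```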
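And for step 4, a sketch of how maps, ggplot2, and mapproj might fit together; the school points are just example coordinates, not real geocodes:

```r
# Example geocoded points -- in practice these come from coord() in step 2
schools <- data.frame(long = c(-73.76, -78.88, -77.61),
                      lat  = c( 42.65,  42.89,  43.16))

if (require(ggplot2) && require(maps) && require(mapproj)) {
    # New York outline from the maps package, as a data frame ggplot2 understands
    ny <- map_data('state', region = 'new york')

    p <- ggplot(ny, aes(x = long, y = lat)) +
        geom_polygon(aes(group = group), fill = 'grey90', colour = 'black') +
        geom_point(data = schools, colour = 'red', size = 3) +
        coord_map()  # project the degrees properly rather than plotting them raw
    print(p)
}
```

For step 5, the ggmap package (get_map()/ggmap()) or RgoogleMaps can substitute Google tiles for the plain outline; I haven't settled on which one yet.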
If you have any ideas or suggestions, now or along the way, please share. This is just a general plan of attack; as I work through the problem, I'll post updates and ask for help when I need it.