+ Reply to Thread
Page 9 of 10 FirstFirst 1 2 3 4 5 6 7 8 9 10 LastLast
Results 121 to 135 of 138

Thread: An R learning project (feel free to learn with me)

  1. #121
    Probably A Mammal
    Points: 18,660, Level: 86
    Level completed: 62%, Points required for next Level: 190
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,194
    Thanks
    291
    Thanked 491 Times in 446 Posts

    Re: An R learning project (feel free to learn with me)




    Check this out

    Code: 
    library(sp)
    library(maps)
    df <- map_data('county', 'new york')  # Grab the New York region county data
    centroids <- by(df, df$region, function(county) Polygon(county[c('long', 'lat')])@labpt)  # Returns a county-named list of label points
    centroids <- do.call("rbind", centroids)  # Convert into Data Frame
    names(centroids) <- c('long', 'lat')
    Now, what I did was supply the Polygon function the coordinates that define a polygon (notice that the first and last row match--it's closed). It creates an S4 Polygon class object that has a slot for "label point" (labpt). I only tested out a much more complicated process (like, 3 more nested steps) on the first county, but I got the centroid to be the same as the label point. Thus, the provided label points seem to be the centroids, from what I can guess. I provide this as a way for you to check what your averaging the polygon corners produces to what the actual centroids are (the actual computation is a little more involved). I tried to use tapply instead of by, but this is one of those times when they don't match up. Using tapply returns an error "arguments must have the same length." To me, by always seems to work as if I did s/lapply(split(df, df$factor), someFun). So, screw tapply! I also tried to supply Polygon to an aggregate call (so it works on each county), but that didn't work out at all.

    Now, what did I have to do to get the centroids? Well, it's as easy as just saying coordinates and supplying it a SpatialPolygons object. Easy, right? No! What is a SpatialPolygons object? It's a list of Polygons. What is a Polygons object? It's a list of Polygon! So I had to take Polygon(df[df$subregion == "albany", c(1,2)]) and make it a list. But wait, there's more! Polygons apparently needs labels, so I put that list of one Polygon into 'y' and ran Polygons(y, "shutup!") and got my Polygons object (a list of S4 Polygon with a labeled name slot). Now, we can't just toss this new Polygons object into SpatialPolygons. It requires an integer vector to specify the number of Polygons in the list or some ****, as well as a coordinate reference system. Thus, we have to do

    Code: 
    z <- ... Polygons go here ...
    SpatialPolygons(z, as.integer(1), CRS("+proj=longlat"))  # Yes, "1" is not an integer!! ugh
    So like I said, it may only be a few more steps, but it is a pain! I hope it's just due to my ignorance of sp and it's spatial S4 objects, but the return result of running coordinates on the above SpatialPolygons object was the same coordinates as Polygon(df[df$subregion=="albany", c(1:2)])@labpt. I was not going to bother with seeing if it held consistent with the other centroids calculated.

  2. #122
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)

    Curses I didn't see you responded until I came back here to update. I made hand manipulations to labels that looked wrong (see here).

    Anyway I tried the code and get the following error:
    Code: 
    > library(maps)
    > df <- map_data('county', 'new york')  # Grab the New York region county data
    > centroids <- by(df, df$region, function(county) Polygon(county[c('long', 'lat')])@labpt)  # Returns a county-named list of label points
    Error in Polygon(county[c("long", "lat")]) : ring not closed
    > centroids <- do.call("rbind", centroids)  # Convert into Data Frame
    Error in do.call("rbind", centroids) : object 'centroids' not found
    Obviously I get the second error after the first error because of the first error. What am I messing up?
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  3. #123
    Probably A Mammal
    Points: 18,660, Level: 86
    Level completed: 62%, Points required for next Level: 190
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,194
    Thanks
    291
    Thanked 491 Times in 446 Posts

    Re: An R learning project (feel free to learn with me)

    Didn't realize map_data was a ggplot2 function. Anyway, one of the polygons or something doesn't end where it starts is my guess. I don't recall having that problem last night, but I didn't go that route I simplified earlier. I messed up. The data frame should be split on subregion. I think I renamed it or something. Region is specifying "new york" instead of each county! I updated the code below.

    Oh, and I updated rbind to be rbind.data.frame

    Code: 
    library(ggplot2)  # I'll change to getting the data from maps only: map_data is just a wrapper
    library(sp)
    library(maps)
    df <- map_data('county', 'new york')  # Grab the New York region county data
    centroids <- by(df, df$subregion, function(county) Polygon(county[c('long', 'lat')])@labpt)  # Returns a county-named list of label points
    centroids <- do.call("rbind.data.frame", centroids)  # Convert into Data Frame
    names(centroids) <- c('long', 'lat')
    
    map('county', 'new york')
    text(centroids$long, centroids$lat, rownames(centroids), offset=0, cex=0.4)
    Took me 0.6 seconds to get the New York county label points (hopefully centroids). The fact still remains, even in a GIS you'd have to go in and adjust the labels for cases that don't work (too small to hold label, e.g.). Usually this means going from mere system-defined labels (attached to the entities--counties--themselves) to annotation. In other words, work on another layer of information on the map itself. This is not usually designed well by automation or code! It's visual, which is why this sort of "post-processing" is done after the map is designed or, in the case of ArcGIS, in its relatively okay editor.
    Last edited by bryangoodrich; 02-25-2012 at 09:58 PM. Reason: added a basic mapping operation

  4. #124
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)

    With the help of the people over at SO I was able to devise a locator for ggplot maps. It still isn't fully satisfying as you must supply :

    Code: 
    + scale_x_continuous(expand=c(0,0)) +  scale_y_continuous(expand=c(0,0))
    To the end of the object. It works though and should aid in label placement. I also inquired at the ggplot help list for a better way and will update if there is.

    Code: 
    require(maps); require(ggplot2); require(grid)
    
    require(maps); library(ggplot2); require(grid)
    county_df <- map_data('county')  #mappings of counties by state
    ny <- subset(county_df, region=="new york")   #subset just for NYS
    ny$county <- ny$subregion
    
    
    NY1 <- ggplot(ny, aes(long, lat)) +  
              geom_polygon(aes(group=group), colour='black', fill=NA) +
              coord_map() + geom_point(aes(c(-78, -73), c(41, 40.855), 
              colour=c("blue", "red"))) + opts(legend.position = "none") 
    
    NY <- NY1 + scale_x_continuous(expand=c(0,0)) + 
              scale_y_continuous(expand=c(0,0))
              #the scale x and y have to be added to the plot
    
    NY 
    
    ggmap.loc <- function(object){  #the locator function
        x <- grid.ls()[[1]][grep("panel-", grid.ls()[[1]])] #locate the panel
        seekViewport(x)
        y <-  as.numeric(grid.locator("npc"))
        locatedX <- min(object$data$long) + y[1]*diff(range(object$data$long))
        locatedy <- min(object$data$lat) + y[2]*diff(range(object$data$lat))
        return(c(locatedX, locatedy))
    } #end of function
    
    ggmap.loc(NY)
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  5. #125
    Probably A Mammal
    Points: 18,660, Level: 86
    Level completed: 62%, Points required for next Level: 190
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,194
    Thanks
    291
    Thanked 491 Times in 446 Posts

    Re: An R learning project (feel free to learn with me)

    Why are you still subsetting your map data?! I already showed you: map_data('county', 'new york') does it implicitly!

  6. The Following User Says Thank You to bryangoodrich For This Useful Post:

    trinker (02-27-2012)

  7. #126
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)

    I know I know haven't actually changed it in the script yet. I fixed it now.

    See I did it here: http://stackoverflow.com/questions/9...460981#9460981
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  8. #127
    Probably A Mammal
    Points: 18,660, Level: 86
    Level completed: 62%, Points required for next Level: 190
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,194
    Thanks
    291
    Thanked 491 Times in 446 Posts

    Re: An R learning project (feel free to learn with me)

    Quote Originally Posted by trinker View Post
    From SO
    ny$county <- ny$subregion
    ny <- map_data('county', 'new york')
    Shouldn't that be reversed? You haven't defined ny yet! and why are you just duplicating a field with a different name? Just use the provided name or change it's name!

    Edit: I still don't see what you're trying to do with this lol You going to point-and-click where you want to alter labels? Better, I'd think, to find a way to get the dimensions of the polygon or something such that you can tell if a given label applied to it will fit. If not, turn it sideways. Does it fit along those dimensions. If it doesn't, don't print it at all (not printing is common in maps when there's overlap or not enough room; though, you should try to supply the label post-process by some further means as I've mentioned before).

  9. #128
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)

    There is no need for that. I got rid of that but left it in the code. originally it was in there because I thought I had to match another dataframe. I am always paranoid of messing with data and so I created another column (not at all necessary). I edited it out.


    Quote Originally Posted by bryangoodrich
    You going to point-and-click where you want to alter labels? Better, I'd think, to find a way to get the dimensions of the polygon or something such that you can tell if a given label applied to it will fit. If not, turn it sideways. Does it fit along those dimensions. If it doesn't, don't print it at all (not printing is common in maps when there's overlap or not enough room; though, you should try to supply the label post-process by some further means as I've mentioned before).
    What you've given me is already pretty darn good. Only a few need to be tweaked and I'm way to under skilled/motivated to easily do this so I think for me the approach I'm taking (use your centering method with a few post shifts and twists) makes the most sense.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  10. #129
    Probably A Mammal
    Points: 18,660, Level: 86
    Level completed: 62%, Points required for next Level: 190
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,194
    Thanks
    291
    Thanked 491 Times in 446 Posts

    Re: An R learning project (feel free to learn with me)

    Don't worry. I'm not even skilled enough (yet) to do it! I'd just avoid the auto-labeling all together or think of them as another layer and make a list of those problem cases to remove from being printed. Or just make a factor that specifies which ones to keep, which ones to rotate, and which to remove. Then apply the layer twice (for each of the two printing categories).

  11. #130
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)

    UPDATE It's mid semester and so this project has been stalled for a bit, but I have been dabbling here and there on it. I've been playing a bit with the sp package after bryan discussed it and I say it in a Paul Murrell book. It has some nice capabilities beyond the map package. Also released are the new versions of ggplot and ggmap that have some nice functions to make mapping(spatial visualization) easier.

    The geocoding function in the ggmap package makes interfacing with the Google very nice (particularly in conjunction with plyr as seen below). I haven't tested this on a larger data set such as the one for this project but will eventually. I fear that in trying to grab too many geocodes there won't be enough wait time in between and so some sort of call to system sleep may be needed. Also nice is the addition of the gglocator function as a parallel to the locator function in base that works for grid.

    Code: 
    library(ggmap)
    
    geocode("University at Buffalo")
    geocode('1600 Pennsylvania Avenue, Washington DC')
    
    x <- list('1600 Pennsylvania Avenue, Washington DC',
        '338 Millard Fillmore Academic Complex  Buffalo, NY 14261')
    
    library(plyr)
    ldply(x, geocode)
    The ggplot2 package .9 release also for flexibility with legends as never seen before.

    i haven't gotten a chance to run through bryangoodrich's code and see what's happening. I'm excited to try but that will have to wait until spring break.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  12. #131
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)

    Alright I'm feeling revived to work on this project a bit more. I've learned a few things about mapping since I last posted and have seen some pretty cool potential things I'd like to try. I'm at the point where I faceted the data by subject ELA/Math looking at passing rates on NYS exams (a Choloropleth). I'm at a sticking point and asking for some assistance getting back on track.

    First lets look at what I've got so far:


    I'm to the part where I'm adding the county names to the map and have went back to this stack overflow post I did where Bryan gave a nice method for centering but I can't seem to get them to plot because of R claims it can't find a group. Bryan's centroids data is in blue, the text plotting (which throws up the error) is in red.

    The load function will load in an .RData file from my drop box that contains the 2 data frames I'm plotting.

    Code: 
    load(url("http://dl.dropbox.com/u/61803503/Names/cholo.RData"))
    
    library(ggplot2)  # For map_data. It's just a wrapper; should just use maps.
    library(sp)
    library(maps)
    library(RColorBrewer)
    library(scales)
    library(mapproj)
    
    getLabelPoint <- # Returns a county-named list of label points
    function(county) {Polygon(county[c('long', 'lat')])@labpt}
    
    df <- map_data('county', 'new york')                 # NY region county data
    centroids <- by(df, df$subregion, getLabelPoint)     # Returns list
    centroids <- do.call("rbind.data.frame", centroids)  # Convert to Data Frame
    names(centroids) <- c('long', 'lat')                 # Appropriate Header
    centroids$subregion <- rownames(centroids)
    centroids$angle <- rep(0, nrow(centroids))
    
    
    ggplot(map.data2, aes(long, lat, group=group)) +  #plot pass rates math
       geom_polygon(aes(fill=level), colour=alpha('white', 1/2), size=0.2) +
       geom_polygon(data=ny, colour='black', fill=NA) + 
       geom_text(data=centroids, aes(x=long, y=lat, 
           labels=rownames(centroids), angle=angle), size=3) +
       scale_fill_brewer(palette='RdYlBu', guide = guide_legend(title = "Percent Passing"))+
       facet_grid(.~Subject)+
       opts(title = "
           New York State Counties Passing Rate \non Elementary ELA Assessments") +
       opts(axis.text.x = theme_blank(), axis.text.y = theme_blank(), 
           axis.ticks = theme_blank())+
       opts(legend.background = theme_rect()) +
       scale_x_continuous('') + scale_y_continuous('') + 
       labs(title = "legend title") + theme_bw()+
       opts(axis.line=theme_blank(),axis.text.x=theme_blank(),
            axis.text.y=theme_blank(),axis.ticks=theme_blank(),
            axis.title.x=theme_blank(), legend.position="bottom",
            axis.title.y=theme_blank(),
            panel.background=theme_blank(),panel.grid.major=theme_blank(),
            panel.grid.minor=theme_blank(),plot.background=theme_blank())
    It throws up this error:
    Code: 
    Error in eval(expr, envir, enclos) : object 'group' not found
    EDIT: Haven't got a hit here so I'm going to cross post this at SO and link back (LINK).
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  13. #132
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)

    Alright thanks to a poster at SO I found out I had to set group in geom_text to NULL. Not sure why but it works. This is what it looks like now (pdf is better quality so names show up better):



    This is where the svg stuff that's interactive would be nice, where you can hover over a county and see it's name; I'd also include the percent, total students, poverty level, % minority, % title 1 schools etc. If I did this for a publication and wanted names I wouldn't facet it but would opt for 2 larger images.

    Next is playing with the new geom_map in ggplot2. It's pretty cool because you don't have to bin the continuous data to categories (ranges), the scale is a gradient. Very nice. Plus you don't have to stack the data for faceting to long format.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  14. #133
    Probably A Mammal
    Points: 18,660, Level: 86
    Level completed: 62%, Points required for next Level: 190
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,194
    Thanks
    291
    Thanked 491 Times in 446 Posts

    Re: An R learning project (feel free to learn with me)

    The continuous scale is known as an unclassed (i.e., without classes) choropleth. It has the advantage of using a continuous gradient of values but the downside that users can still only distinguish so many. Using classes has that advantage that the user can more simply interpret the map, but then you lose that extra precision of a continuous scale. Tradeoffs!

    The choice of including labels really depends on its reproduction. If you're serving something up to a user to interact with, then a GIS, web app, or SVG has that benefit, but if you're displaying a PDF or static map of some sort, you have to make those choices of what scale to show, etc. An alternative in this case is to do an image book where you break the map down into tiles that are each at larger scale ('closer'). It makes it easier to see each tile with more clarity, but then you have to look at each tile separately. In this case, it would be apt to include a full map without the detailed labels and such to give the overview, then each tile is a closer window at those details.

  15. #134
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)

    Feels good to get back to this Anyway I tested out Hadley's new geom_map. It treats the data as continuous rather than binned data and and as Bryan has already mentioned has advantages and drawbacks. I'm all about being armed with which ever tool fits the job. It's really not that much different than binning except one elss step, you still have to stack the data, I was mistaken. Hadely uses reshape2 so I did the same but you could use base reshape instead or what ever. Anyway I thought I'd present the map and the code (PS found out you can use load with drop box but not readRDS to my knowledage).



    Here's a link to a nicer pdf version (LINK)

    Code: 
    save(centroids, ny, map.data2, file=paste0(db, "Names/cholo.RData"))
    load(url("http://dl.dropbox.com/u/61803503/Names/cholo.RData"))
    
    library(ggplot2)  
    library(reshape2)
    library(RColorBrewer)
    library(scales)
    
    colrs <- c("#330000", "#FFFF00") #give a low and high color to scale_fill_continuous
    
    map.data3 <- map.data2[, -c(8:14, 16, 20:21) ]
    map.data3.melt <- melt(map.data3, id = c(1:7, 10:11))
    
    levels(map.data3.melt$variable) <- c("Math", "ELA")
    head(map.data3.melt)
    
    ggplot(map.data3.melt, aes(map_id = subregion)) + 
        geom_map(aes(fill = value), map = ny) + 
        expand_limits(x = ny$long, y = ny$lat) + 
        facet_wrap( ~ variable) +
        geom_polygon(data=ny, aes(x=long, y=lat), colour='black', fill=NA) +
        scale_fill_continuous(low=colrs[1], high=colrs[2], 
            guide = guide_colorbar(title = "Percent Passing", values=colrs,
            barwidth = 10, barheight = .5)) +
        opts(axis.text.x = theme_blank(), axis.text.y = theme_blank(), 
            axis.ticks = theme_blank())+
        opts(legend.background = theme_rect()) +
        scale_x_continuous('') + scale_y_continuous('') + 
        labs(title = "legend title") + 
        theme_bw()+
        opts(axis.line=theme_blank(),axis.text.x=theme_blank(),
            axis.text.y=theme_blank(),axis.ticks=theme_blank(),
            axis.title.x=theme_blank(), legend.position="bottom",
            axis.title.y=theme_blank(),
            panel.background=theme_blank(),panel.grid.major=theme_blank(),
            panel.grid.minor=theme_blank(),plot.background=theme_blank(), 
            panel.margin = unit(0, "lines"), strip.text.x = theme_text(size=12, face="bold")) +
        opts(title = "New York State Counties Passing Rate on Grades 3-8 Assessments") +
        geom_text(data=centroids, aes(x=long, y=lat, label=subregion, angle=angle, group=NULL), size=3.25)
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  16. #135
    ggplot2orBust
    Points: 34,836, Level: 100
    Level completed: 0%, Points required for next Level: 0
    Awards:
    User with most referrers
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,934
    Thanks
    1,348
    Thanked 745 Times in 668 Posts

    Re: An R learning project (feel free to learn with me)


    NOTE to future users: We discussed the USC WebGIS Services on here for geocoding but I recieved this email today:

    We are pleased to announce that the USC WebGIS Services have been successfully migrated to their new home at Texas A&M University.

    The new URL for accessing the services is http://geoservices.tamu.edu.

    All user accounts have been migrated as have the transaction balances associated with all user accounts.

    All API users should update their API calls to use the new URL. All query parameters remain exactly the same; the only change that needs to be made is to switch http://webgis.usc.edu to http://geoservices.tamu.edu.

    Within the next few days, the IP associated with the USC domain http://webgis.usc.edu will automatically begin forwarding to http://geoservices.tamu.edu. At that time, all queries to webgis.usc.edu will automatically be forwarded to geoservices.tamu.edu.

    As always, thank you for your interest in our services and let us know if you have any problems or need any help.

    Regards,
    The Texas A&M Geo Team
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  17. The Following User Says Thank You to trinker For This Useful Post:

    bryangoodrich (10-19-2012)

+ Reply to Thread
Page 9 of 10 FirstFirst 1 2 3 4 5 6 7 8 9 10 LastLast

           




Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats