+ Reply to Thread
Results 1 to 9 of 9

Thread: Producing time series of averages

  1. #1
    Points: 40, Level: 1
    Level completed: 80%, Points required for next Level: 10

    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Producing time series of averages



    Hello, I am new to R so probably a basic question...

    I am working with climate data that spans 1000 years in length in monthly increments. The data is a variable, say temperature, on a spatial grid with 144 longitudes and 90 latitudes (in other words, it spans the globe with 2.5 x 2 degree resolution). The dimensions are thus 144 x 90 x 12000, and for example, saying temp[10,19,345] would output the temperature at the 10th longitude grid box, 19th latitude, and 345th month.

    Right now I am working on a program that produces a time-series of average temperatures over a prescribed region. That is, I want to define a "box" (say from the 10th to the 30th longitude, and from the 6th to the 8th latitude) and produce a vector with one column (of the average temperature) and 12,000 rows.

    Right now, I have done the spatial averaging shown below (please assume I have already loaded in the data and declared names to the temperature and area variables used):

    Also note that "area" variable is needed because all the grid boxes do not have the same surface area, and so I have to weight each box accordingly. The surface area for a given box changes with latitude but is independent of its longitude. The area of a given box is given as area[longitude,latitude].

    ######inputs######
    # declare time range of interest
    timecall1= 1
    timecall2=12000

    lat1=6 # latitude range to do average
    lat2=7
    lon1=5
    lon2=6 # longitude range to do average
    ##################

    k =lat1; sum=0;
    while (k != lat2+1) {
    sum=sum+sum(area[lon1,k]*temp[lon1:lon2,k,timecall1])
    k= k+1;
    }

    avg_value=sum/(sum(area[lon1:lon2,lat1:lat2]))
    Basically what I'm doing is averaging all the longitude values of temperature for a particular latitude, then moving up to the next latitude and repeating the process, etc.

    I've checked the spatial averaging and it seems to be fine, though I come from a science and not a programming background, so if there is a more efficient way to do the above please let me know. I can imagine that with a data set this big such a calculation might get computationally expensive.

    The last step is to make it a time series. The above program only works for a single month. It seems trivial but I can't get it. I've tried changing the value in the parentheses to "timecall1:timecall2" or adding in another loop, but it just ends up adding all the values in a way that I don't want.

    Any input appreciated. Thanks!
    Last edited by cc31621; 07-29-2012 at 04:50 PM.

  2. #2
    Probably A Mammal
    Points: 14,462, Level: 78
    Level completed: 3%, Points required for next Level: 388
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    1,951
    Thanks
    221
    Thanked 419 Times in 387 Posts

    Re: Producing time series of averages

    My initial thought is to store this in R as a 3D array. It's like a matrix object, but it allows higher dimensions. In this case, you'll have spatial panels with another dimension of time. Each panel is a month. For prototyping, consider this

    Code: 
    a <- array(dim = c(10, 2, 3))  # 10 rows by 2 cols by 3 panels
    a[, , 1] <- matrix(sample(seq(100), 20))  # 1st panel
    a[, , 2] <- matrix(sample(seq(100), 20))  # 2st panel
    a[, , 3] <- matrix(sample(seq(100), 20))  # 3st panel
    You'll get different results since I didn't set the see (set.seed(...)), but here's what I get

    Code: 
    , , 1
    
          [,1] [,2]
     [1,]   52   19
     [2,]   75   50
     [3,]   47   66
     [4,]   62    9
     [5,]   74   32
     [6,]   16   71
     [7,]   76   97
     [8,]   36   31
     [9,]   57   94
    [10,]   41   82
    
    , , 2
    
          [,1] [,2]
     [1,]   61   32
     [2,]   56   53
     [3,]   76   25
     [4,]   58   29
     [5,]   71   95
     [6,]   91   99
     [7,]   35   62
     [8,]   28   20
     [9,]   57   47
    [10,]    8   98
    
    , , 3
    
          [,1] [,2]
     [1,]   48   78
     [2,]   41    2
     [3,]   27   17
     [4,]   34   91
     [5,]   90   16
     [6,]   96   58
     [7,]   59   39
     [8,]   26   23
     [9,]   93   31
    [10,]   97   20
    So what you're asking for is to take a subset of a panel and calculate its average for all 12,000 panels. The result can be stored simply as a 12,000 point vector in R, each position corresponding to its respective panel month. How would we simulate that in our prototype here?

    Well, if I want to only look at rows 4 through 6, it's as easy as

    Code: 
    a[4:6, , ]
    I can of course include such subsets for the columns and the panel dimension. Both the syntax and concept here correspond to what you wanted. Of course, if we think of the rows in my array here as latitude and the columns as longitude, I don't have much to demonstrate! Still, suppose this was your data set and you wanted to look at the box you described. It would be as simple as

    Code: 
    a[10:30, 6:8, ]
    This would give us the box on each month. The question now is how do I take the average on each of the panels? Using my example above, we simply use apply on it with the MARGIN parameter on the panels

    Code: 
    apply(a[4:6, , ], 3, mean)
    This gives us

    Code: 
    [1] 44.00000 73.83333 64.16667
    If you took the 6 values on each of the panels of a[4:6, , ], you'll find that it matches those 3 vector point values. If you did this with the subset indexes for the 2 dimensions, you should end up with a vector 12,000 points long of the means you wanted.

    The question you need to ask yourself now is how do you populate the array with the data you have. That, however, requires a bit more knowledge about how your data is stored and pulled into R. This is some pretty large data, but not the sort that require particularly special care. To save time, however, you will want to use R effectively. So, if you give us more details about your data files, we could prescribe a way to scan them into R to store them in a 3D array.

  3. #3
    Points: 40, Level: 1
    Level completed: 80%, Points required for next Level: 10

    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Producing time series of averages

    Thanks bryan,

    I can test this and provide further details tomorrow when I am at work.

    For now, I can elaborate by saying I am using netCDF files and using "ncdf" in order to manipulate the data within the R environment.

    For example, I say temp = get.var.ncdf(nc,"tsurf") for the "tsurf" variable contained within nc. nc=open.ncdf(file name.nc). I'm not sure if you are familiar with ncdf or are looking for something more fundamental, but I was not involved with the construction of the netCDF files. I'm only bringing them into R to do some data analysis, but I think you can treat them as any arbitrary 144 x 90 x 12000 matrix....

  4. #4
    Probably A Mammal
    Points: 14,462, Level: 78
    Level completed: 3%, Points required for next Level: 388
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    1,951
    Thanks
    221
    Thanked 419 Times in 387 Posts

    Re: Producing time series of averages

    I used to work next to an atmospheric scientist, so I know what netCDF is, but I don't know how that library pulls it into R. If you can take a snippet of the data and print the output from dput on that data object, we can dget to load it up into R ourselves. You can also show the print out of str(...) that should detail the structure of the object.

  5. #5
    Points: 40, Level: 1
    Level completed: 80%, Points required for next Level: 10

    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Producing time series of averages

    Here is some output. Not sure if this is what you are looking for (note this is 1200 not 12,000 months in length, since I have 10 files split up into centuries)...if not, let me know what you need me to do

  6. #6
    Points: 40, Level: 1
    Level completed: 80%, Points required for next Level: 10

    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Producing time series of averages

    dput(nc)
    structure(list(id = 4L, ndims = 3L, natts = 5L, unlimdimid = 3,
    filename = "tsurf_MON_1050-1149.E4rhLMgTck.nc", varid2Rindex = c(0,
    0, 1, 2), writable = FALSE, dim = structure(list(lon = structure(list(
    name = "lon", len = 144L, unlim = FALSE, id = 1L, dimvarid = 1,
    units = "degrees_east", vals = structure(c(-178.75, -176.25,
    -173.75, -171.25, -168.75, -166.25, -163.75, -161.25,
    -158.75, -156.25, -153.75, -151.25, -148.75, -146.25,
    -143.75, -141.25, -138.75, -136.25, -133.75, -131.25,
    -128.75, -126.25, -123.75, -121.25, -118.75, -116.25,
    -113.75, -111.25, -108.75, -106.25, -103.75, -101.25,
    -98.75, -96.25, -93.75, -91.25, -88.75, -86.25, -83.75,
    -81.25, -78.75, -76.25, -73.75, -71.25, -68.75, -66.25,
    -63.75, -61.25, -58.75, -56.25, -53.75, -51.25, -48.75,
    -46.25, -43.75, -41.25, -38.75, -36.25, -33.75, -31.25,
    -28.75, -26.25, -23.75, -21.25, -18.75, -16.25, -13.75,
    -11.25, -8.75, -6.25, -3.75, -1.25, 1.25, 3.75, 6.25,
    8.75, 11.25, 13.75, 16.25, 18.75, 21.25, 23.75, 26.25,
    28.75, 31.25, 33.75, 36.25, 38.75, 41.25, 43.75, 46.25,
    48.75, 51.25, 53.75, 56.25, 58.75, 61.25, 63.75, 66.25,
    68.75, 71.25, 73.75, 76.25, 78.75, 81.25, 83.75, 86.25,
    88.75, 91.25, 93.75, 96.25, 98.75, 101.25, 103.75, 106.25,
    108.75, 111.25, 113.75, 116.25, 118.75, 121.25, 123.75,
    126.25, 128.75, 131.25, 133.75, 136.25, 138.75, 141.25,
    143.75, 146.25, 148.75, 151.25, 153.75, 156.25, 158.75,
    161.25, 163.75, 166.25, 168.75, 171.25, 173.75, 176.25,
    178.75), .Dim = 144L), create_dimvar = TRUE), .Names = c("name",
    "len", "unlim", "id", "dimvarid", "units", "vals", "create_dimvar"
    ), class = "dim.ncdf"), lat = structure(list(name = "lat",
    len = 90L, unlim = FALSE, id = 2L, dimvarid = 2, units = "degrees_north",
    vals = structure(c(-90, -87, -85, -83, -81, -79, -77,
    -75, -73, -71, -69, -67, -65, -63, -61, -59, -57, -55,
    -53, -51, -49, -47, -45, -43, -41, -39, -37, -35, -33,
    -31, -29, -27, -25, -23, -21, -19, -17, -15, -13, -11,
    -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19,
    21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47,
    49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75,
    77, 79, 81, 83, 85, 87, 90), .Dim = 90L), create_dimvar = TRUE), .Names = c("name",
    "len", "unlim", "id", "dimvarid", "units", "vals", "create_dimvar"
    ), class = "dim.ncdf"), time = structure(list(name = "time",
    len = 1200L, unlim = TRUE, id = 3L, dimvarid = -1L, vals = 1:1200,
    units = "", create_dimvar = FALSE), .Names = c("name",
    "len", "unlim", "id", "dimvarid", "vals", "units", "create_dimvar"
    ), class = "dim.ncdf")), .Names = c("lon", "lat", "time")),
    nvars = 2, var = structure(list(axyp = structure(list(id = 3L,
    name = "axyp", ndims = 2L, natts = 2L, size = c(144L,
    90L), prec = "float", dimids = c(1, 2), units = "m^2",
    longname = "gridcell area", dims = list(), dim = list(
    structure(list(name = "lon", len = 144L, unlim = FALSE,
    id = 1L, dimvarid = 1, units = "degrees_east",
    vals = structure(c(-178.75, -176.25, -173.75,
    -171.25, -168.75, -166.25, -163.75, -161.25,
    -158.75, -156.25, -153.75, -151.25, -148.75,
    -146.25, -143.75, -141.25, -138.75, -136.25,
    -133.75, -131.25, -128.75, -126.25, -123.75,
    -121.25, -118.75, -116.25, -113.75, -111.25,
    -108.75, -106.25, -103.75, -101.25, -98.75, -96.25,
    -93.75, -91.25, -88.75, -86.25, -83.75, -81.25,
    -78.75, -76.25, -73.75, -71.25, -68.75, -66.25,
    -63.75, -61.25, -58.75, -56.25, -53.75, -51.25,
    -48.75, -46.25, -43.75, -41.25, -38.75, -36.25,
    -33.75, -31.25, -28.75, -26.25, -23.75, -21.25,
    -18.75, -16.25, -13.75, -11.25, -8.75, -6.25,
    -3.75, -1.25, 1.25, 3.75, 6.25, 8.75, 11.25,
    13.75, 16.25, 18.75, 21.25, 23.75, 26.25, 28.75,
    31.25, 33.75, 36.25, 38.75, 41.25, 43.75, 46.25,
    48.75, 51.25, 53.75, 56.25, 58.75, 61.25, 63.75,
    66.25, 68.75, 71.25, 73.75, 76.25, 78.75, 81.25,
    83.75, 86.25, 88.75, 91.25, 93.75, 96.25, 98.75,
    101.25, 103.75, 106.25, 108.75, 111.25, 113.75,
    116.25, 118.75, 121.25, 123.75, 126.25, 128.75,
    131.25, 133.75, 136.25, 138.75, 141.25, 143.75,
    146.25, 148.75, 151.25, 153.75, 156.25, 158.75,
    161.25, 163.75, 166.25, 168.75, 171.25, 173.75,
    176.25, 178.75), .Dim = 144L), create_dimvar = TRUE), .Names = c("name",
    "len", "unlim", "id", "dimvarid", "units", "vals",
    "create_dimvar"), class = "dim.ncdf"), structure(list(
    name = "lat", len = 90L, unlim = FALSE, id = 2L,
    dimvarid = 2, units = "degrees_north", vals = structure(c(-90,
    -87, -85, -83, -81, -79, -77, -75, -73, -71,
    -69, -67, -65, -63, -61, -59, -57, -55, -53,
    -51, -49, -47, -45, -43, -41, -39, -37, -35,
    -33, -31, -29, -27, -25, -23, -21, -19, -17,
    -15, -13, -11, -9, -7, -5, -3, -1, 1, 3, 5, 7,
    9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31,
    33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55,
    57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79,
    81, 83, 85, 87, 90), .Dim = 90L), create_dimvar = TRUE), .Names = c("name",
    "len", "unlim", "id", "dimvarid", "units", "vals",
    "create_dimvar"), class = "dim.ncdf")), varsize = c(144L,
    90L), unlim = FALSE, missval = 1e+30, hasAddOffset = FALSE,
    hasScaleFact = FALSE), .Names = c("id", "name", "ndims",
    "natts", "size", "prec", "dimids", "units", "longname", "dims",
    "dim", "varsize", "unlim", "missval", "hasAddOffset", "hasScaleFact"
    ), class = "var.ncdf"), tsurf = structure(list(id = 4L, name = "tsurf",
    ndims = 3L, natts = 2L, size = c(144L, 90L, 1200L), prec = "float",
    dimids = c(1, 2, 3), units = "C", longname = "SURFACE AIR TEMPERATURE",
    dims = list(), dim = list(structure(list(name = "lon",
    len = 144L, unlim = FALSE, id = 1L, dimvarid = 1,
    units = "degrees_east", vals = structure(c(-178.75,
    -176.25, -173.75, -171.25, -168.75, -166.25, -163.75,
    -161.25, -158.75, -156.25, -153.75, -151.25, -148.75,
    -146.25, -143.75, -141.25, -138.75, -136.25, -133.75,
    -131.25, -128.75, -126.25, -123.75, -121.25, -118.75,
    -116.25, -113.75, -111.25, -108.75, -106.25, -103.75,
    -101.25, -98.75, -96.25, -93.75, -91.25, -88.75,
    -86.25, -83.75, -81.25, -78.75, -76.25, -73.75, -71.25,
    -68.75, -66.25, -63.75, -61.25, -58.75, -56.25, -53.75,
    -51.25, -48.75, -46.25, -43.75, -41.25, -38.75, -36.25,
    -33.75, -31.25, -28.75, -26.25, -23.75, -21.25, -18.75,
    -16.25, -13.75, -11.25, -8.75, -6.25, -3.75, -1.25,
    1.25, 3.75, 6.25, 8.75, 11.25, 13.75, 16.25, 18.75,
    21.25, 23.75, 26.25, 28.75, 31.25, 33.75, 36.25,
    38.75, 41.25, 43.75, 46.25, 48.75, 51.25, 53.75,
    56.25, 58.75, 61.25, 63.75, 66.25, 68.75, 71.25,
    73.75, 76.25, 78.75, 81.25, 83.75, 86.25, 88.75,
    91.25, 93.75, 96.25, 98.75, 101.25, 103.75, 106.25,
    108.75, 111.25, 113.75, 116.25, 118.75, 121.25, 123.75,
    126.25, 128.75, 131.25, 133.75, 136.25, 138.75, 141.25,
    143.75, 146.25, 148.75, 151.25, 153.75, 156.25, 158.75,
    161.25, 163.75, 166.25, 168.75, 171.25, 173.75, 176.25,
    178.75), .Dim = 144L), create_dimvar = TRUE), .Names = c("name",
    "len", "unlim", "id", "dimvarid", "units", "vals", "create_dimvar"
    ), class = "dim.ncdf"), structure(list(name = "lat",
    len = 90L, unlim = FALSE, id = 2L, dimvarid = 2,
    units = "degrees_north", vals = structure(c(-90,
    -87, -85, -83, -81, -79, -77, -75, -73, -71, -69,
    -67, -65, -63, -61, -59, -57, -55, -53, -51, -49,
    -47, -45, -43, -41, -39, -37, -35, -33, -31, -29,
    -27, -25, -23, -21, -19, -17, -15, -13, -11, -9,
    -7, -5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19,
    21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45,
    47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71,
    73, 75, 77, 79, 81, 83, 85, 87, 90), .Dim = 90L),
    create_dimvar = TRUE), .Names = c("name", "len",
    "unlim", "id", "dimvarid", "units", "vals", "create_dimvar"
    ), class = "dim.ncdf"), structure(list(name = "time",
    len = 1200L, unlim = TRUE, id = 3L, dimvarid = -1L,
    vals = 1:1200, units = "", create_dimvar = FALSE), .Names = c("name",
    "len", "unlim", "id", "dimvarid", "vals", "units", "create_dimvar"
    ), class = "dim.ncdf")), varsize = c(144L, 90L, 1200L
    ), unlim = TRUE, missval = 1e+30, hasAddOffset = FALSE,
    hasScaleFact = FALSE), .Names = c("id", "name", "ndims",
    "natts", "size", "prec", "dimids", "units", "longname", "dims",
    "dim", "varsize", "unlim", "missval", "hasAddOffset", "hasScaleFact"
    ), class = "var.ncdf")), .Names = c("axyp", "tsurf"))), .Names = c("id",
    "ndims", "natts", "unlimdimid", "filename", "varid2Rindex", "writable",
    "dim", "nvars", "var"), class = "ncdf")

  7. #7
    Points: 40, Level: 1
    Level completed: 80%, Points required for next Level: 10

    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Producing time series of averages

    And I'd still like to follow up on the original method I had started off with, even if it's not suitable for a large dataset...just for my own education.

  8. #8
    Probably A Mammal
    Points: 14,462, Level: 78
    Level completed: 3%, Points required for next Level: 388
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    1,951
    Thanks
    221
    Thanked 419 Times in 387 Posts

    Re: Producing time series of averages

    Okay. Don't ever do your original method. Done. You don't iterate over vectors in R. It's slow as hell. You use vectorized operations. Just compare the speed of these two commands

    Code: 
    x <- rnorm(10000000)
    sum(x)  # Instant! Vectorized functiion
    
    y <- 0
    for(i in 1:length(x)) y <- y + x[i]  # SLOOOOW! Don't use loops

  9. #9
    Probably A Mammal
    Points: 14,462, Level: 78
    Level completed: 3%, Points required for next Level: 388
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    1,951
    Thanks
    221
    Thanked 419 Times in 387 Posts

    Re: Producing time series of averages


    That's an ugly data set, but the values are accessible. It looks to me like you'll want to first extract the values and form your matrix. I'd focus on one decade file at a time. I'll have to look at this data more deeply to figure out exactly how it links together. You might consider looking at some of the spatial packages (e.g., sp), as they may have methods for handling this sort of data better.

+ Reply to Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts








Advertise on Talk Stats