# Thread: Producing time series of averages

1. ## Producing time series of averages

Hello, I am new to R so probably a basic question...

I am working with climate data spanning 1000 years in monthly increments. The data is a variable, say temperature, on a spatial grid with 144 longitudes and 90 latitudes (in other words, it spans the globe at 2.5 x 2 degree resolution). The dimensions are thus 144 x 90 x 12000, so, for example, temp[10,19,345] gives the temperature at the 10th longitude grid box, 19th latitude, and 345th month.

Right now I am working on a program that produces a time series of average temperatures over a prescribed region. That is, I want to define a "box" (say from the 10th to the 30th longitude, and from the 6th to the 8th latitude) and produce a vector of 12,000 values, one area-average temperature per month.

Right now, I have done the spatial averaging shown below (please assume I have already loaded in the data and declared names to the temperature and area variables used):

Also note that the "area" variable is needed because the grid boxes do not all have the same surface area, so I have to weight each box accordingly. The surface area of a given box changes with latitude but is independent of longitude. The area of a given box is given as area[longitude,latitude].

Code:
``````###### inputs ######
# declare time range of interest
timecall1 = 1
timecall2 = 12000

lat1 = 6  # latitude range to average over
lat2 = 7
lon1 = 5
lon2 = 6  # longitude range to average over
####################

# accumulate the area-weighted sum latitude by latitude
# (named 'total' to avoid masking the built-in sum())
k = lat1; total = 0
while (k != lat2 + 1) {
  # area is independent of longitude, so area[lon1, k] applies to the whole row
  total = total + sum(area[lon1, k] * temp[lon1:lon2, k, timecall1])
  k = k + 1
}

avg_value = total / sum(area[lon1:lon2, lat1:lat2])``````
Basically what I'm doing is averaging all the longitude values of temperature for a particular latitude, then moving up to the next latitude and repeating the process, etc.

I've checked the spatial averaging and it seems to be fine, though I come from a science and not a programming background, so if there is a more efficient way to do the above please let me know. I can imagine that with a data set this big such a calculation might get computationally expensive.

The last step is to make it a time series. The above program only works for a single month. It seems trivial, but I can't get it. I've tried changing the time index to "timecall1:timecall2" and adding another loop, but it just ends up summing all the values in a way I don't want.

Any input appreciated. Thanks!

2. ## Re: Producing time series of averages

My initial thought is to store this in R as a 3D array. It's like a matrix object, but it allows higher dimensions. In this case, you'll have spatial panels with another dimension of time. Each panel is a month. For prototyping, consider this

Code:
``````a <- array(dim = c(10, 2, 3))  # 10 rows by 2 cols by 3 panels
a[, , 1] <- matrix(sample(seq(100), 20))  # 1st panel
a[, , 2] <- matrix(sample(seq(100), 20))  # 2nd panel
a[, , 3] <- matrix(sample(seq(100), 20))  # 3rd panel``````
You'll get different results since I didn't set the seed (set.seed(...)), but here's what I get

Code:
``````, , 1

[,1] [,2]
[1,]   52   19
[2,]   75   50
[3,]   47   66
[4,]   62    9
[5,]   74   32
[6,]   16   71
[7,]   76   97
[8,]   36   31
[9,]   57   94
[10,]   41   82

, , 2

[,1] [,2]
[1,]   61   32
[2,]   56   53
[3,]   76   25
[4,]   58   29
[5,]   71   95
[6,]   91   99
[7,]   35   62
[8,]   28   20
[9,]   57   47
[10,]    8   98

, , 3

[,1] [,2]
[1,]   48   78
[2,]   41    2
[3,]   27   17
[4,]   34   91
[5,]   90   16
[6,]   96   58
[7,]   59   39
[8,]   26   23
[9,]   93   31
[10,]   97   20``````
So what you're asking for is to take a subset of each panel and compute its average, for all 12,000 panels. The result can be stored simply as a 12,000-element vector in R, each position corresponding to its panel's month. How would we simulate that in our prototype here?

Well, if I want to only look at rows 4 through 6, it's as easy as

Code:
``a[4:6, , ]``
I can of course subset the columns and the panel dimension in the same way. Both the syntax and the concept here correspond to what you wanted. Of course, with only 10 rows and 2 columns, if we think of the rows in my array as longitude and the columns as latitude, I don't have much room to demonstrate! Still, suppose this were your data set and you wanted to look at the box you described. It would be as simple as

Code:
``a[10:30, 6:8, ]``
This would give us the box on each month. The question now is how to take the average on each of the panels. Using my example above, we simply use apply with the MARGIN argument set to the panel dimension

Code:
``apply(a[4:6, , ], 3, mean)``
This gives us

Code:
``[1] 44.00000 73.83333 64.16667``
If you average the 6 values on each panel of a[4:6, , ] by hand, you'll find they match those three values. If you do the same with your subset indices for the first two dimensions, you should end up with a vector of 12,000 means, as you wanted.
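To fold in the area weighting from the original post, the same apply idiom works with a weighted mean. Here's a sketch under the assumptions stated there (temp indexed as [lon, lat, month], area as [lon, lat]); I've simulated small stand-in arrays since I don't have your data:

```r
# Simulated stand-ins for the real data (24 months for illustration)
set.seed(1)
temp <- array(rnorm(144 * 90 * 24), dim = c(144, 90, 24))
area <- matrix(runif(144 * 90, 1, 2), nrow = 144)  # latitude-dependent in reality

lon <- 10:30; lat <- 6:8
w <- area[lon, lat]  # weights for the box, constant in time

# One area-weighted mean per month: apply over the time margin,
# where each call receives the box as a matrix
series <- apply(temp[lon, lat, ], 3, function(m) sum(m * w) / sum(w))
length(series)  # one value per month
```

With your full data set, the same line with your indices should give the 12,000-point weighted series directly, no explicit loop needed.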

The question you need to ask yourself now is how to populate the array with the data you have. That, however, requires a bit more knowledge about how your data is stored and pulled into R. This is pretty large data, but not the sort that requires particularly special care. To save time, though, you will want to use R effectively. So, if you give us more details about your data files, we can prescribe a way to read them into R and store them in a 3D array.

3. ## Re: Producing time series of averages

Thanks bryan,

I can test this and provide further details tomorrow when I am at work.

For now, I can elaborate by saying I am using netCDF files and using "ncdf" in order to manipulate the data within the R environment.

For example, I say temp = get.var.ncdf(nc, "tsurf") for the "tsurf" variable contained within nc, where nc = open.ncdf("filename.nc"). I'm not sure if you are familiar with ncdf or are looking for something more fundamental, but I was not involved in the construction of the netCDF files. I'm only bringing them into R to do some data analysis, and I think you can treat them as any arbitrary 144 x 90 x 12000 array....
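To spell out what I'm doing, here is roughly how I pull one century file in with the ncdf package (file name below is just a placeholder for one of my actual files):

```r
library(ncdf)  # the older ncdf package, not ncdf4

nc   <- open.ncdf("tsurf_century.nc")  # placeholder file name
temp <- get.var.ncdf(nc, "tsurf")      # 144 x 90 x 1200 temperature array
area <- get.var.ncdf(nc, "axyp")       # 144 x 90 grid-cell areas (m^2)
close.ncdf(nc)

dim(temp)  # confirms the dimensions
```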

4. ## Re: Producing time series of averages

I used to work next to an atmospheric scientist, so I know what netCDF is, but I don't know how that library pulls it into R. If you can take a snippet of the data and print the output from dput on that data object, we can use dget to load it into R ourselves. You can also show the printout of str(...), which details the structure of the object.

5. ## Re: Producing time series of averages

Here is some output. I'm not sure if this is what you are looking for (note this is 1,200, not 12,000, months in length, since I have 10 files split up into centuries)...if not, let me know what you need me to do

6. ## Re: Producing time series of averages

dput(nc)
structure(list(id = 4L, ndims = 3L, natts = 5L, unlimdimid = 3,
filename = "tsurf_MON_1050-1149.E4rhLMgTck.nc", varid2Rindex = c(0,
0, 1, 2), writable = FALSE, dim = structure(list(lon = structure(list(
name = "lon", len = 144L, unlim = FALSE, id = 1L, dimvarid = 1,
units = "degrees_east", vals = structure(c(-178.75, -176.25,
-173.75, -171.25, -168.75, -166.25, -163.75, -161.25,
-158.75, -156.25, -153.75, -151.25, -148.75, -146.25,
-143.75, -141.25, -138.75, -136.25, -133.75, -131.25,
-128.75, -126.25, -123.75, -121.25, -118.75, -116.25,
-113.75, -111.25, -108.75, -106.25, -103.75, -101.25,
-98.75, -96.25, -93.75, -91.25, -88.75, -86.25, -83.75,
-81.25, -78.75, -76.25, -73.75, -71.25, -68.75, -66.25,
-63.75, -61.25, -58.75, -56.25, -53.75, -51.25, -48.75,
-46.25, -43.75, -41.25, -38.75, -36.25, -33.75, -31.25,
-28.75, -26.25, -23.75, -21.25, -18.75, -16.25, -13.75,
-11.25, -8.75, -6.25, -3.75, -1.25, 1.25, 3.75, 6.25,
8.75, 11.25, 13.75, 16.25, 18.75, 21.25, 23.75, 26.25,
28.75, 31.25, 33.75, 36.25, 38.75, 41.25, 43.75, 46.25,
48.75, 51.25, 53.75, 56.25, 58.75, 61.25, 63.75, 66.25,
68.75, 71.25, 73.75, 76.25, 78.75, 81.25, 83.75, 86.25,
88.75, 91.25, 93.75, 96.25, 98.75, 101.25, 103.75, 106.25,
108.75, 111.25, 113.75, 116.25, 118.75, 121.25, 123.75,
126.25, 128.75, 131.25, 133.75, 136.25, 138.75, 141.25,
143.75, 146.25, 148.75, 151.25, 153.75, 156.25, 158.75,
161.25, 163.75, 166.25, 168.75, 171.25, 173.75, 176.25,
178.75), .Dim = 144L), create_dimvar = TRUE), .Names = c("name",
"len", "unlim", "id", "dimvarid", "units", "vals", "create_dimvar"
), class = "dim.ncdf"), lat = structure(list(name = "lat",
len = 90L, unlim = FALSE, id = 2L, dimvarid = 2, units = "degrees_north",
vals = structure(c(-90, -87, -85, -83, -81, -79, -77,
-75, -73, -71, -69, -67, -65, -63, -61, -59, -57, -55,
-53, -51, -49, -47, -45, -43, -41, -39, -37, -35, -33,
-31, -29, -27, -25, -23, -21, -19, -17, -15, -13, -11,
-9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19,
21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47,
49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75,
77, 79, 81, 83, 85, 87, 90), .Dim = 90L), create_dimvar = TRUE), .Names = c("name",
"len", "unlim", "id", "dimvarid", "units", "vals", "create_dimvar"
), class = "dim.ncdf"), time = structure(list(name = "time",
len = 1200L, unlim = TRUE, id = 3L, dimvarid = -1L, vals = 1:1200,
units = "", create_dimvar = FALSE), .Names = c("name",
"len", "unlim", "id", "dimvarid", "vals", "units", "create_dimvar"
), class = "dim.ncdf")), .Names = c("lon", "lat", "time")),
nvars = 2, var = structure(list(axyp = structure(list(id = 3L,
name = "axyp", ndims = 2L, natts = 2L, size = c(144L,
90L), prec = "float", dimids = c(1, 2), units = "m^2",
longname = "gridcell area", dims = list(), dim = list(
structure(list(name = "lon", len = 144L, unlim = FALSE,
id = 1L, dimvarid = 1, units = "degrees_east",
vals = structure(c(-178.75, -176.25, -173.75,
-171.25, -168.75, -166.25, -163.75, -161.25,
-158.75, -156.25, -153.75, -151.25, -148.75,
-146.25, -143.75, -141.25, -138.75, -136.25,
-133.75, -131.25, -128.75, -126.25, -123.75,
-121.25, -118.75, -116.25, -113.75, -111.25,
-108.75, -106.25, -103.75, -101.25, -98.75, -96.25,
-93.75, -91.25, -88.75, -86.25, -83.75, -81.25,
-78.75, -76.25, -73.75, -71.25, -68.75, -66.25,
-63.75, -61.25, -58.75, -56.25, -53.75, -51.25,
-48.75, -46.25, -43.75, -41.25, -38.75, -36.25,
-33.75, -31.25, -28.75, -26.25, -23.75, -21.25,
-18.75, -16.25, -13.75, -11.25, -8.75, -6.25,
-3.75, -1.25, 1.25, 3.75, 6.25, 8.75, 11.25,
13.75, 16.25, 18.75, 21.25, 23.75, 26.25, 28.75,
31.25, 33.75, 36.25, 38.75, 41.25, 43.75, 46.25,
48.75, 51.25, 53.75, 56.25, 58.75, 61.25, 63.75,
66.25, 68.75, 71.25, 73.75, 76.25, 78.75, 81.25,
83.75, 86.25, 88.75, 91.25, 93.75, 96.25, 98.75,
101.25, 103.75, 106.25, 108.75, 111.25, 113.75,
116.25, 118.75, 121.25, 123.75, 126.25, 128.75,
131.25, 133.75, 136.25, 138.75, 141.25, 143.75,
146.25, 148.75, 151.25, 153.75, 156.25, 158.75,
161.25, 163.75, 166.25, 168.75, 171.25, 173.75,
176.25, 178.75), .Dim = 144L), create_dimvar = TRUE), .Names = c("name",
"len", "unlim", "id", "dimvarid", "units", "vals",
"create_dimvar"), class = "dim.ncdf"), structure(list(
name = "lat", len = 90L, unlim = FALSE, id = 2L,
dimvarid = 2, units = "degrees_north", vals = structure(c(-90,
-87, -85, -83, -81, -79, -77, -75, -73, -71,
-69, -67, -65, -63, -61, -59, -57, -55, -53,
-51, -49, -47, -45, -43, -41, -39, -37, -35,
-33, -31, -29, -27, -25, -23, -21, -19, -17,
-15, -13, -11, -9, -7, -5, -3, -1, 1, 3, 5, 7,
9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31,
33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55,
57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79,
81, 83, 85, 87, 90), .Dim = 90L), create_dimvar = TRUE), .Names = c("name",
"len", "unlim", "id", "dimvarid", "units", "vals",
"create_dimvar"), class = "dim.ncdf")), varsize = c(144L,
90L), unlim = FALSE, missval = 1e+30, hasAddOffset = FALSE,
hasScaleFact = FALSE), .Names = c("id", "name", "ndims",
"natts", "size", "prec", "dimids", "units", "longname", "dims",
"dim", "varsize", "unlim", "missval", "hasAddOffset", "hasScaleFact"
), class = "var.ncdf"), tsurf = structure(list(id = 4L, name = "tsurf",
ndims = 3L, natts = 2L, size = c(144L, 90L, 1200L), prec = "float",
dimids = c(1, 2, 3), units = "C", longname = "SURFACE AIR TEMPERATURE",
dims = list(), dim = list(structure(list(name = "lon",
len = 144L, unlim = FALSE, id = 1L, dimvarid = 1,
units = "degrees_east", vals = structure(c(-178.75,
-176.25, -173.75, -171.25, -168.75, -166.25, -163.75,
-161.25, -158.75, -156.25, -153.75, -151.25, -148.75,
-146.25, -143.75, -141.25, -138.75, -136.25, -133.75,
-131.25, -128.75, -126.25, -123.75, -121.25, -118.75,
-116.25, -113.75, -111.25, -108.75, -106.25, -103.75,
-101.25, -98.75, -96.25, -93.75, -91.25, -88.75,
-86.25, -83.75, -81.25, -78.75, -76.25, -73.75, -71.25,
-68.75, -66.25, -63.75, -61.25, -58.75, -56.25, -53.75,
-51.25, -48.75, -46.25, -43.75, -41.25, -38.75, -36.25,
-33.75, -31.25, -28.75, -26.25, -23.75, -21.25, -18.75,
-16.25, -13.75, -11.25, -8.75, -6.25, -3.75, -1.25,
1.25, 3.75, 6.25, 8.75, 11.25, 13.75, 16.25, 18.75,
21.25, 23.75, 26.25, 28.75, 31.25, 33.75, 36.25,
38.75, 41.25, 43.75, 46.25, 48.75, 51.25, 53.75,
56.25, 58.75, 61.25, 63.75, 66.25, 68.75, 71.25,
73.75, 76.25, 78.75, 81.25, 83.75, 86.25, 88.75,
91.25, 93.75, 96.25, 98.75, 101.25, 103.75, 106.25,
108.75, 111.25, 113.75, 116.25, 118.75, 121.25, 123.75,
126.25, 128.75, 131.25, 133.75, 136.25, 138.75, 141.25,
143.75, 146.25, 148.75, 151.25, 153.75, 156.25, 158.75,
161.25, 163.75, 166.25, 168.75, 171.25, 173.75, 176.25,
178.75), .Dim = 144L), create_dimvar = TRUE), .Names = c("name",
"len", "unlim", "id", "dimvarid", "units", "vals", "create_dimvar"
), class = "dim.ncdf"), structure(list(name = "lat",
len = 90L, unlim = FALSE, id = 2L, dimvarid = 2,
units = "degrees_north", vals = structure(c(-90,
-87, -85, -83, -81, -79, -77, -75, -73, -71, -69,
-67, -65, -63, -61, -59, -57, -55, -53, -51, -49,
-47, -45, -43, -41, -39, -37, -35, -33, -31, -29,
-27, -25, -23, -21, -19, -17, -15, -13, -11, -9,
-7, -5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19,
21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45,
47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71,
73, 75, 77, 79, 81, 83, 85, 87, 90), .Dim = 90L),
create_dimvar = TRUE), .Names = c("name", "len",
"unlim", "id", "dimvarid", "units", "vals", "create_dimvar"
), class = "dim.ncdf"), structure(list(name = "time",
len = 1200L, unlim = TRUE, id = 3L, dimvarid = -1L,
vals = 1:1200, units = "", create_dimvar = FALSE), .Names = c("name",
"len", "unlim", "id", "dimvarid", "vals", "units", "create_dimvar"
), class = "dim.ncdf")), varsize = c(144L, 90L, 1200L
), unlim = TRUE, missval = 1e+30, hasAddOffset = FALSE,
hasScaleFact = FALSE), .Names = c("id", "name", "ndims",
"natts", "size", "prec", "dimids", "units", "longname", "dims",
"dim", "varsize", "unlim", "missval", "hasAddOffset", "hasScaleFact"
), class = "var.ncdf")), .Names = c("axyp", "tsurf"))), .Names = c("id",
"ndims", "natts", "unlimdimid", "filename", "varid2Rindex", "writable",
"dim", "nvars", "var"), class = "ncdf")

7. ## Re: Producing time series of averages

And I'd still like to follow up on the original method I had started off with, even if it's not suitable for a large dataset...just for my own education.

8. ## Re: Producing time series of averages

Okay. Don't ever use your original method. Done. You don't iterate over vectors element by element in R; it's slow as hell. You use vectorized operations. Just compare the speed of these two commands

Code:
``````x <- rnorm(10000000)
sum(x)  # Instant! Vectorized function

y <- 0
for(i in 1:length(x)) y <- y + x[i]  # SLOOOOW! Don't use explicit loops``````
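To put numbers on it, you can time both versions with system.time (a sketch; the exact timings will vary by machine, but the gap is typically orders of magnitude):

```r
x <- rnorm(1e7)

# Vectorized sum: a single call into compiled code
t_vec <- system.time(s1 <- sum(x))

# Explicit R-level loop: interpreted iteration over 10 million elements
t_loop <- system.time({
  s2 <- 0
  for (v in x) s2 <- s2 + v
})

print(t_vec)
print(t_loop)
all.equal(s1, s2)  # same answer, very different cost
```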

9. ## Re: Producing time series of averages

That's an ugly data set, but the values are accessible. It looks to me like you'll want to first extract the values and form your array. I'd focus on one century file at a time. I'll have to look at this data more deeply to figure out exactly how it links together. You might also consider some of the spatial packages (e.g., sp), as they may have better methods for handling this sort of data.
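One possible way to stitch the ten century files into a single 144 x 90 x 12000 array, assuming the ncdf calls described earlier and the abind package (file names below are placeholders for the actual ones):

```r
library(ncdf)   # open.ncdf / get.var.ncdf, as used earlier in the thread
library(abind)  # abind() concatenates arrays along a chosen dimension

# Placeholder names -- substitute the actual ten century files
files <- c("tsurf_century_01.nc", "tsurf_century_02.nc")

chunks <- lapply(files, function(f) {
  nc <- open.ncdf(f)
  x  <- get.var.ncdf(nc, "tsurf")  # 144 x 90 x 1200 per file
  close.ncdf(nc)
  x
})

# Stack along the third (time) dimension: 144 x 90 x 12000 overall
temp <- abind(chunks, along = 3)
```

That said, if memory becomes a problem, processing one century file at a time and concatenating the ten 1,200-point average series at the end is equivalent and much lighter.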
