I have satellite imagery for a specific 10 km × 20 km area for the same time of year over an 11-year period (2000 to 2010), so 11 images in total. Each image is made up of pixels 250 m² in size, each with a value for "greenness" (NDVI).

I want to compare greenness in the most recent year with the preceding 10 years.

It's been suggested that I use a z-test to compare the mean greenness for 2010 with the aggregate mean for 2000 to 2009.

It's also been suggested that I could implement a regression with year as a factor.

I've looked all over the place but can't work out whether I should be doing either of these or something else entirely. Any advice would be gratefully received.

I've done a bit more reading and it appears that a "repeated measures ANOVA" might be more appropriate. Rather than comparing the aggregate mean for the 10 years with 2010, I would look for differences in means over time in a population that is non-independent (i.e. the same location each year). Does that make sense?

To me, a linear regression model seems more natural.

Testing one single value, the last year, against the mean of the rest does not seem as natural to me. (A z-test would not be appropriate unless you know the variances and so on.)

Regression seems OK. I guess you expect a trend in the data. (Is it because of the greenhouse effect?)

I don’t think you need to start by thinking of this in terms of “repeated measures ANOVA”. (That is about other aspects.) Of course you have repeated measures, but what you want is simply the time trend, if I have interpreted you correctly. Don’t make it unnecessarily complicated at the start. But you must check for autocorrelation in the residuals.
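
A minimal sketch of "fit the time trend, then check the residuals", assuming each image has already been reduced to one yearly value. The `ndvi` numbers below are made up purely so the block runs; replace them with your own yearly summaries. The Durbin–Watson statistic is used here as one common check for lag-1 autocorrelation in the residuals; the function names are hypothetical helpers, not from any library.

```python
from statistics import fmean


def linear_trend(years, values):
    """Ordinary least-squares fit of values ≈ a + b * year; returns (a, b)."""
    x_bar, y_bar = fmean(years), fmean(values)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def durbin_watson(residuals):
    """Durbin-Watson statistic (between 0 and 4): near 2 suggests no lag-1
    autocorrelation, values well below 2 suggest positive autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den


# Hypothetical yearly values, one per image (2000-2010); NOT real data.
years = list(range(2000, 2011))
ndvi = [0.41, 0.44, 0.39, 0.42, 0.45, 0.40, 0.43, 0.46, 0.41, 0.44, 0.42]

intercept, slope = linear_trend(years, ndvi)
residuals = [y - (intercept + slope * x) for x, y in zip(years, ndvi)]
print(f"slope per year: {slope:.4f}")
print(f"Durbin-Watson: {durbin_watson(residuals):.2f}")
```

With only eleven points, plotting the residuals against year is at least as informative as the statistic itself.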

Of course, the trend could have many sorts of non-linearity. But the natural first step is simply to plot the data, the eleven yearly points, look at them, and choose a model from that.

You might also need to transform the “y-variable”, the greenness variable. The usual assumption is that the dependent variable is normally distributed given the “x-variable” (here the time variable), but it could also follow other distributions (for example, the gamma distribution in a generalized linear model).

A bigger difficulty may be how to aggregate the many pixels in one image into one value for that year. You have about 800,000 pixels (10,000 m × 20,000 m / 250 m²), don’t you? Without going into multi-level models, you could perhaps calculate a trimmed mean by throwing away the 10% highest values and the 10% lowest. That would make the estimate more robust to small fractions of outliers.
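
The trimmed mean described above can be sketched in a few lines. This assumes one image's pixel values are available as a flat list; `trimmed_mean` is a hypothetical helper (SciPy offers `scipy.stats.trim_mean(a, proportiontocut)` if you already use it), and the pixel values are invented to show the effect of two outliers.

```python
def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of the values from EACH tail.

    The number of values cut from each end is rounded down."""
    if not 0.0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(proportion * len(data))  # how many to drop at each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical pixel NDVI values for one image, with one low and one high outlier.
pixels = [0.30, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.95]
print("plain mean:  ", sum(pixels) / len(pixels))
print("trimmed mean:", trimmed_mean(pixels))
```

Applying the same function to every image gives the one-value-per-year series that the trend fit and the tests below work with.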

Hopefully someone else will comment on this. Unfortunately, on this site it seems that once one person has responded, few others, if any, respond. There are so many people on this site who know so much more than I do. We need to listen and learn from each other.
Hi Greta,

Firstly, thank you very much for your help.

I don't actually expect a trend in the data (the time frame is too short), and I'm not actually looking for a trend.

An event happened in 2010 that might be tied to the greenness of the vegetation that year. I want to be able to say whether the greenness of the vegetation did in fact differ significantly in 2010.

Eyeballing the data (see the attached JPEG boxplot), the vegetation does not look different, but I'm not sure how to show this statistically.

I've actually reduced the pixels to a manageable number and extracted the values, so I have a series of vectors to work with. In that case I do know the variances, so would a z-test be applicable?
There are two sources of variation: the variation within each image (that’s what we can see in each box) and the variation between years. It is the latter you need to know, or rather estimate.
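
To make the two sources of variation concrete, here is a small sketch that computes both from per-image pixel vectors. The layout (a dict mapping year to a list of pixel values) and the `variance_components` name are assumptions for illustration, and the three tiny "images" are invented numbers.

```python
from statistics import fmean, variance


def variance_components(images):
    """images: hypothetical dict mapping year -> list of pixel NDVI values.

    Returns (within, between):
      within  = average of the per-image pixel variances (the spread inside each box)
      between = sample variance of the yearly means (the spread of the box centres)
    """
    yearly_means = [fmean(px) for px in images.values()]
    within = fmean([variance(px) for px in images.values()])
    between = variance(yearly_means)
    return within, between


# Invented example: three small "images".
images = {
    2000: [0.40, 0.42, 0.44],
    2001: [0.50, 0.52, 0.54],
    2002: [0.45, 0.47, 0.49],
}
within, between = variance_components(images)
print(f"within-image variance: {within:.5f}")
print(f"between-year variance: {between:.5f}")
```

A z-test built on the pixel variances would use something like the square root of `within / n_pixels` as the uncertainty of a yearly mean; that shrinks as more pixels are added, whereas the between-year variance, which is the part that matters here, does not.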

For a z-test you will have just one observation (the 2010 value) and only ten other values.

You could do a t-test with n1 = 1 and n2 = 10. That is not very much data, so I expect the power will be very low, but of course you can do it.
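
A sketch of that t-test, assuming each image has already been reduced to one yearly value. With n1 = 1 the pooled variance is just the sample variance s² of the ten earlier values, so t = (x₂₀₁₀ − x̄) / (s·√(1 + 1/10)) on n1 + n2 − 2 = 9 degrees of freedom. The p-value is obtained by numerically integrating the t density with Simpson's rule only to keep the block dependency-free; `scipy.stats.t.sf` would do the same job. The function names and the yearly values are hypothetical.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: integrate the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area)


def single_value_t_test(new_value, earlier_values):
    """Pooled two-sample t-test with n1 = 1 and n2 = len(earlier_values)."""
    n2 = len(earlier_values)
    s = stdev(earlier_values)  # pooled SD reduces to the earlier values' SD
    t = (new_value - fmean(earlier_values)) / (s * math.sqrt(1 + 1 / n2))
    df = n2 - 1  # n1 + n2 - 2 with n1 = 1
    return t, df, two_sided_p(t, df)


# Hypothetical yearly summaries for 2000-2009 and 2010; NOT real data.
earlier = [0.41, 0.44, 0.39, 0.42, 0.45, 0.40, 0.43, 0.46, 0.41, 0.44]
t, df, p = single_value_t_test(0.42, earlier)
print(f"t = {t:.3f} on {df} df, two-sided p = {p:.3f}")
```

This is the same calculation as asking whether the 2010 value falls outside a prediction interval built from the ten earlier years.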

Given the scenario you describe, I agree with Greta that a t-test is a possibility. If the images were taken of the same locations using the same methods, another option could be the (Wilcoxon) signed-rank test. To do this you could take the prior ten years' averages, as a group, minus the 11th year's average and then test whether the difference is "0". I'm not completely sure what your dataset looks like, so the t-test seems to be the option (I was just trying to account for the possible similarities between the datasets, since these are not two independent samples).
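
One possible reading of that suggestion, sketched below: form ten differences (each earlier year's average minus the 2010 average) and run an exact two-sided Wilcoxon signed-rank test of whether they are centred on 0. Reading "as a group" this way is an assumption, as are the helper names and the yearly values. Zero differences are dropped, tied absolute differences share their average rank, and the null distribution is built by enumerating all 2ⁿ sign patterns, which is cheap for n = 10 (`scipy.stats.wilcoxon` is the library route).

```python
from itertools import product


def average_ranks(values):
    """Ranks 1..n of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i..j
        i = j + 1
    return ranks


def signed_rank_test(differences):
    """Exact two-sided Wilcoxon signed-rank test; returns (W+, p-value)."""
    d = [x for x in differences if x != 0]
    ranks = average_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    centre = sum(ranks) / 2  # expected W+ under H0
    observed = abs(w_plus - centre)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical yearly averages; NOT real data.
yearly = dict(zip(range(2000, 2011),
                  [0.41, 0.44, 0.39, 0.42, 0.45, 0.40, 0.43, 0.46, 0.41, 0.44, 0.425]))
diffs = [yearly[y] - yearly[2010] for y in range(2000, 2010)]
w_plus, p = signed_rank_test(diffs)
print(f"W+ = {w_plus}, two-sided exact p = {p:.3f}")
```

With ten non-zero differences, the smallest achievable two-sided p-value is 2/1024 (about 0.002), reached only when all ten differences have the same sign.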