Thread: Correlation to determine the best reference series for homogenization

  1. #1

    Correlation to determine the best reference series for homogenization

    Before asking this, I read similar questions, but none of them led to a satisfying answer for my specific case.

    I want to homogenize a 64-year (1940-2003) precipitation time series from the Dominican Republic. For that, it is really important to select a reference series from a group of candidates.

    Let's say "sjo" is the base series, for which I want to find a good reference series; "bani", "plc" and "ra" are the reference candidates, because they are close to "sjo". In the attached JPG map, the red point is the base station and the green ones are the reference candidates.

    I ran three correlation analyses (in R, with the function cor()) on these monthly variables: raw precipitation, normalized difference, and Box-Cox transformed values. These variables correspond, respectively, to the fields that begin with "p", "dian" and "pnorm".

    The normalized difference comes from the first difference method (FDM) proposed by Peterson, and is computed as:
    [Pm(t) - Pm(t-1)] / [Pm(t) + Pm(t-1)],
    where Pm(t) is the precipitation for month m in year t, and Pm(t-1) is the precipitation for the same month one year earlier. I followed the remark in Peterson et al. (1998) that FDM applied to precipitation might work better with the normalized difference.
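
    As a rough sketch, the formula above could be computed in R as follows. It assumes the monthly totals are stored in a plain numeric vector in time order; the function name norm_diff and the toy values are made up for illustration. Months where both totals are zero give 0/0 and come back as NaN.

```r
# Normalized first difference for a monthly series:
# [P(t) - P(t-12)] / [P(t) + P(t-12)], i.e. same month, one year earlier.
# `p` is assumed to be a numeric vector of monthly totals in time order.
norm_diff <- function(p, lag = 12) {
  n <- length(p)
  stopifnot(n > lag)
  current  <- p[(lag + 1):n]
  previous <- p[1:(n - lag)]
  d <- (current - previous) / (current + previous)
  # The first `lag` months have no previous-year value
  c(rep(NA_real_, lag), d)
}

# Toy data: two years of made-up monthly precipitation (mm)
p_toy <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120,
           30, 20, 10, 40, 25, 60, 35, 80, 45, 100, 55, 0)
dian_toy <- norm_diff(p_toy)
print(round(dian_toy, 3))
```

    The result is bounded between -1 and 1, so a very wet month compared against a dry one cannot dominate the series the way a raw difference would.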

    As can be seen on page 1 of the attached PDF, the correlation was calculated for the whole time series (1940-2003). For raw precipitation and Box-Cox transformed values, "bani" is the best correlated with "sjo" (cells with a yellow background show the maximum correlation coefficient). Notice that for raw precipitation, "bani" is markedly more correlated than the others. For the normalized difference, "ra" is only slightly more correlated than the rest. However, every candidate station has a statistically significant correlation with "sjo" at the 0.05 significance level, suggesting that ANY of them could be used as a reference series.
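
    For reference, a full-period comparison of this kind could be sketched as below. The data frame is simulated stand-in data, not the real station records; the column names sjo, bani, plc and ra only mirror the station labels used in this post, and the noise levels are arbitrary. cor() gives the coefficients and cor.test() the p-value for each candidate.

```r
set.seed(1)
n <- 64 * 12                                  # 64 years of monthly values
common <- rgamma(n, shape = 2, scale = 40)    # shared regional signal (made up)
dat <- data.frame(
  sjo  = common + rnorm(n, sd = 20),
  bani = common + rnorm(n, sd = 25),
  plc  = common + rnorm(n, sd = 50),
  ra   = common + rnorm(n, sd = 60)
)
candidates <- c("bani", "plc", "ra")

# Pearson correlation of each candidate with the base series
r <- sapply(candidates, function(s) cor(dat$sjo, dat[[s]], use = "complete.obs"))
# p-value of the test that the true correlation is zero
p <- sapply(candidates, function(s) cor.test(dat$sjo, dat[[s]])$p.value)

res <- data.frame(station = candidates, r = round(r, 3), p.value = signif(p, 3))
print(res)
best <- candidates[which.max(r)]
print(best)
```

    With several hundred monthly values, even a weak correlation passes the 0.05 threshold, so the size of r is usually more informative than the p-value when ranking candidates.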

    This is a bit confusing, so I was not satisfied and decided to make a more detailed analysis, splitting the series into 5-year periods and evaluating the correlation between series for the same 3 variables: raw precipitation, normalized difference and Box-Cox transformed values.

    The tables on pages 2 to 8 of the attached PDF show the results of these per-period correlations; the last page summarizes how many times each station had the maximum correlation value for each variable. As can be seen, "bani" most frequently has the maximum correlation for the 3 variables analyzed (in all cases, more than 7 of the twelve 5-year periods analyzed).
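
    The per-period tally described above could be sketched roughly as follows. Again the data are simulated placeholders and the column names are assumptions; only the mechanics (cut the years into 5-year blocks, correlate within each block, count the winners) are the point. Note that with these breaks the last block covers only 2000-2003.

```r
set.seed(2)
years  <- rep(1940:2003, each = 12)
n      <- length(years)
common <- rgamma(n, shape = 2, scale = 40)    # made-up regional signal
dat <- data.frame(
  year = years,
  sjo  = common + rnorm(n, sd = 20),
  bani = common + rnorm(n, sd = 25),
  plc  = common + rnorm(n, sd = 50),
  ra   = common + rnorm(n, sd = 60)
)
candidates <- c("bani", "plc", "ra")

# Assign each year to a 5-year block: [1940,1945), [1945,1950), ...
dat$period <- cut(dat$year, breaks = seq(1940, 2005, by = 5),
                  right = FALSE, dig.lab = 4)

# Correlation of each candidate with sjo inside every block
by_period <- t(sapply(split(dat, dat$period), function(d) {
  sapply(candidates, function(s) cor(d$sjo, d[[s]], use = "complete.obs"))
}))

# How often each candidate has the highest correlation
winner <- candidates[apply(by_period, 1, which.max)]
wins   <- table(factor(winner, levels = candidates))
print(round(by_period, 2))
print(wins)
```

    Each block holds only 60 monthly values (fewer in the last one), so the per-block coefficients are noisier than the full-period ones and small differences between stations should be read with care.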

    With these results, I think that "bani" is the best candidate as a reference series for "sjo", but I'm not sure about it. Is the five-year period analysis OK? Should I perform some other analysis?


    Attached image: mapa_estaciones_climaticas_ocoa_y_entorno_2.jpg
    Last edited by noviolencia; 03-02-2014 at 03:09 AM.

  2. #2

    Re: Correlation to determine the best reference series for homogenization

    Hi noviolencia,

    Personally, I think working with correlations is one of the right ways to find a reference for Jose de Ocoa. Yet you seem to struggle to justify your decision to make "bani" your reference station.

    Let me try to give another possible way of basing your decision on facts.

    I would suggest fitting a linear regression (if your data are normally distributed, or nearly so). Let the rainfall at Jose de Ocoa be the dependent variable; the independent variables are then the values of the other three stations. Check the p-value for each station and also look at the standardized regression coefficients. I think the standardized regression coefficient should be the highest for bani, which would lead you to the conclusion that it might be the best reference. As you are using R, I will post an example with random values.

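    # Generate purely random rain values. Let's pretend they are liters per year
    year <- seq(1940, 2003, 1)  # years -> use your own data
    sjo.rain   <- as.numeric(arima.sim(n = 64, list(ar = c(0.999999))))  # randomized -> use your own data
    pbani.rain <- as.numeric(arima.sim(n = 64, list(ar = c(0.999997))))  # randomized -> use your own data
    plc.rain   <- as.numeric(arima.sim(n = 64, list(ar = c(0.999991))))  # randomized -> use your own data
    pra.rain   <- as.numeric(arima.sim(n = 64, list(ar = c(0.999992))))  # randomized -> use your own data
    # Build data.frame
    rain <- data.frame(year, sjo.rain, pbani.rain, plc.rain, pra.rain)
    # Correlation matrix
    cor(rain[, -1])
    par(mfrow = c(2, 2))  # visualize the correlations
    plot(sjo.rain ~ pbani.rain, data = rain)
    plot(sjo.rain ~ plc.rain, data = rain)
    plot(sjo.rain ~ pra.rain, data = rain)
    # Linear regression on all "reference points" - normal distribution is not always given, but this is just for showcasing!
    fit <- lm(sjo.rain ~ pbani.rain + plc.rain + pra.rain, data = rain)
    summary(fit)  # p-values per station
    # Calculation of the standardized regression coefficients
    library(QuantPsyc)  # install first with install.packages("QuantPsyc")
    lm.beta(fit)  # get standardized regression coefficients
    # The station with the largest absolute standardized coefficient has the highest "influence" on the
    # dependent variable sjo.rain and would be used as the reference. With random values the winner changes from run to run.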
    You may of course also use a simple t-test to look for significant differences between the station means. Here I think you should look for the lowest t-value (hence the highest p-value), i.e. a non-significant difference between two stations, since a low t and a high p may be an indicator that their means are similar.
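
    A minimal sketch of such a comparison, assuming one value per year for each station. The numbers are simulated and the offsets between stations are arbitrary; a paired test is used here because the same years are observed at both stations, which is a choice made for this illustration.

```r
set.seed(3)
n <- 64
sjo  <- rnorm(n, mean = 1000, sd = 150)       # made-up annual totals (mm)
bani <- sjo + rnorm(n, mean = 0,   sd = 80)
plc  <- sjo + rnorm(n, mean = 150, sd = 80)
ra   <- sjo + rnorm(n, mean = -50, sd = 80)
cand <- list(bani = bani, plc = plc, ra = ra)

# Paired t-test per candidate: same years measured at two stations
tt <- lapply(cand, function(x) t.test(sjo, x, paired = TRUE))
res <- data.frame(
  station = names(cand),
  t       = sapply(tt, function(z) unname(z$statistic)),
  p.value = sapply(tt, function(z) z$p.value)
)
print(res)
```

    A high p-value here only says that no difference in means was detected; it does not show that the two series rise and fall together, so it complements the correlation analysis rather than replacing it.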

    With best regards


  3. #3

    Re: Correlation to determine the best reference series for homogenization

    Thanks, Sebastian, for this detailed answer. I'll certainly try these steps and let you know.

    Also, I invite you to see a discussion on this same topic in another forum...


    ...where you'll see that I applied a t-test to assess the differences between the correlation values of pairs of variables. But your suggestion of doing a t-test between the same variable at two stations is a good idea, and I will try it.


