Hello, I have a question about cross-correlation.
I have two data sets of 9000 values each; one is binned (0/1) and the other is not. Something like this:
a = [23 45 88 9 0 29 30 60 0 2 10 100 0 60 90]
b = [0 0 0 1 1 0 0 0 0 0 0 1 0 0 1]
"b" is actually a set of events (yes/no) that occur during a continuous change in the intensity of a "signal" (the signal takes values in [0-100]). I ran the cross-correlation because I want to find out whether the occurrence of the events is related to the changes in the signal.
So I ran the cross-correlation in Matlab like this:
maxlags = 300;
[c, lags] = xcorr(a-mean(a), b-mean(b), maxlags, 'coeff')
(I subtracted the mean and used the 'coeff' option to get values between -1 and 1)
I also built a null hypothesis by shuffling the data 10 times, cross-correlated the shuffled data in the same way, and calculated t-score values.
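A minimal sketch of one way such a shuffled null could be built is given here for concreteness. It assumes that the event train b is the series being permuted (with randperm) and that the score is the distance of the real curve from the shuffled mean in units of the shuffled standard deviation. The placeholder data, the variable names (nShuffles, cNull, score) and the exact score formula are all assumptions for illustration, not necessarily the computation used for the plots.

```matlab
% Sketch only: placeholder data stand in for the real 9000-sample series.
rng(0);                                   % reproducible example
N = 9000;
a = 100 * rand(1, N);                     % hypothetical signal in [0, 100]
b = double(rand(1, N) < 0.05);            % hypothetical 0/1 event train

maxlags   = 300;
nShuffles = 10;

% Real cross-correlation: means removed, 'coeff' normalisation
[c, lags] = xcorr(a - mean(a), b - mean(b), maxlags, 'coeff');
c = c(:).';                               % force row orientation

% Null curves: permuting b destroys the timing of the events
cNull = zeros(nShuffles, numel(lags));
for k = 1:nShuffles
    bShuf = b(randperm(N));
    cNull(k, :) = xcorr(a - mean(a), bShuf - mean(bShuf), maxlags, 'coeff');
end

% Assumed score: real curve vs. shuffled mean, in shuffled std units, per lag
score = (c - mean(cNull, 1)) ./ std(cNull, 0, 1);
```

With only 10 shuffles the per-lag standard deviation is itself quite noisy, so a larger nShuffles is probably worth trying if the run time allows it.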
In the end I got this:
http://postimg.org/image/m5rddi5vh/
At first I tried to correlate the data without subtracting the mean, but in that case I get a very strange null hypothesis (it is in red, the real correlation is in blue; the effect is more evident with bigger maxlags):
http://postimg.org/image/jiopjheap/
It's clearly a triangle. I did some research on the internet and even found some similar questions, but the answers were that this is caused by the "zero-padding" and that this is why we should subtract the mean before the correlation.
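Written out, the zero-padding argument from those answers would go roughly as follows. This is a sketch in notation that is not from the answers themselves: each series is split into its mean plus a fluctuation, and only lags m >= 0 are shown (negative lags have the same number of overlapping terms, N - |m|).

```latex
% Assumed notation: a_n = \mu_a + \tilde a_n,\; b_n = \mu_b + \tilde b_n,\; n = 0,\dots,N-1
\begin{align*}
R_{ab}(m) &= \sum_{n=0}^{N-m-1} a_{n+m}\, b_n
           = \sum_{n=0}^{N-m-1} (\mu_a + \tilde a_{n+m})(\mu_b + \tilde b_n) \\
          &= (N-m)\,\mu_a \mu_b
             + \mu_a \sum_{n=0}^{N-m-1} \tilde b_n
             + \mu_b \sum_{n=0}^{N-m-1} \tilde a_{n+m}
             + \sum_{n=0}^{N-m-1} \tilde a_{n+m}\, \tilde b_n \\
          &\approx (N-|m|)\,\mu_a \mu_b
             \quad \text{(shuffled data: the last three sums fluctuate around zero)}
\end{align*}
```

The first term is a triangle in m with its peak at lag 0, and it vanishes when both means are zero. The 'coeff' option only divides the whole curve by a constant, so it would not change this shape.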
Basically, the cross-correlation with the mean subtracted already corresponds to Matlab's xcov function, i.e. the cross-covariance... (A colleague also suggested that I do the cross-correlation first and then subtract the mean of the result, like this:
[c, lags]= xcorr(a,b)
x_crr=c-mean(c)
- but I don't really see why I should do that and, besides, it still gives me the same strange triangle-shaped null hypothesis.)
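The three variants mentioned so far can be compared side by side on a shuffled series. The following is a sketch on placeholder data, not the real recordings; the names cRaw, cShift, cCov, cXcov and tri are made up for illustration, and maxlags is set larger than the 300 used above only so that the trend is easier to see. The line tri is the offset one would expect from the overlap length alone, (N - |lag|) * mean(a) * mean(b).

```matlab
% Sketch only: placeholder data, hypothetical variable names.
rng(1);                                   % reproducible example
N = 9000;
maxlags = 3000;                           % larger than 300 to expose the trend
a = 100 * rand(1, N);                     % hypothetical signal in [0, 100]
b = double(rand(1, N) < 0.05);            % hypothetical 0/1 event train
bShuf = b(randperm(N));                   % shuffled events (null case)

% 1) Raw cross-correlation, no mean removal
[cRaw, lags] = xcorr(a, bShuf, maxlags);

% 2) Subtract the mean of the result afterwards: a constant vertical shift
cShift = cRaw - mean(cRaw);

% 3) Remove the means first, which is what xcov does internally
cCov  = xcorr(a - mean(a), bShuf - mean(bShuf), maxlags);
cXcov = xcov(a, bShuf, maxlags);

% Offset expected from the shrinking overlap of the two series
tri = (N - abs(lags)) * mean(a) * mean(bShuf);

% Largest difference between variant 3 and xcov (should be numerical noise)
maxDiff = max(abs(cCov(:) - cXcov(:)));
```

Plotting cRaw, cShift, cCov and tri against lags should show whether variants 1 and 2 follow the triangle while variant 3 stays flat around zero. Since cShift differs from cRaw only by a single constant, it cannot remove a lag-dependent trend.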
Here is my question: should I really subtract the mean before the correlation in this case, or is that wrong and I should look for another option?
I've already read many sources and explanations, but I'm still not sure about this particular case (most of them use sets of values with similar scales as examples, not a binned one vs. a usual one).
Thank you in advance for your answer.
Best regards.