Small sample size, correlation

TS Contributor
OK, I am helping someone run SPSS on his data. He just wants the results, not advice, but my gut tells me his method isn't right.
He has a sample of 3 companies, and each company has 3 (yearly) values, so 9 values in total, and he wants to find the correlation with 9 corresponding values. I am not at liberty to discuss his data, so let's just say he has 9 x and 9 y values (and this is for a dissertation).
His 3 hypotheses show that there is no correlation, but r shows a high correlation (due to outliers and the small sample size, I guess).
I told him politely that his data can't support the results he has; I don't feel we can draw conclusions about a real-life phenomenon from just 9 values.
He wants me to use a 5% level of significance and Pearson's correlation.

So my question is: is the method he is following correct? What is the statistical take on this, so I can talk him into getting more data?

It is not really my problem, but I feel like I should tell him beforehand.
Thanks

CowboyBear

Super Moderator
Am I interpreting this correctly in thinking that he wants to show that there is no correlation between the variables? I think he really needs to know that not finding a statistically significant correlation is NOT evidence that no correlation exists. It merely indicates that one has not managed to find evidence to reject a null hypothesis of no correlation (and we are unlikely to find such evidence with such a small sample).

Trying to make a claim that no correlation exists because a correlation coefficient calculated with a sample size of 9 is not statistically significant is nonsensical; I agree that this is inappropriate. More formally, the problem is inadequate statistical power. Using this calculator, we can see that even if the true population correlation between the two variables is 0.5 (a quite strong relationship in the social sciences), the probability of observing a statistically significant relationship at the .05 alpha level with a sample size of 9 is just 29.5%!! He needs to perform statistical power analysis, decide on a more appropriate sample size, and get more data.
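
For anyone who wants to check the power figure without an online calculator, here is a rough sketch using the Fisher z approximation. Note the assumptions: this is an approximation rather than the exact calculation the calculator presumably uses, so it will not reproduce 29.5% to the decimal; the function names `approx_power` and `approx_n` are made up for illustration; and only the Python standard library is used.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def approx_power(rho, n, alpha=0.05, two_sided=True):
    """Approximate power of the test of H0: rho = 0 via the Fisher z transform.

    Assumption: atanh(r) is treated as normal with standard error 1/sqrt(n - 3).
    """
    z = NormalDist()
    shift = atanh(rho) * sqrt(n - 3)  # expected test statistic under the alternative
    if two_sided:
        crit = z.inv_cdf(1 - alpha / 2)
        # Probability of landing in either rejection tail
        return z.cdf(shift - crit) + z.cdf(-shift - crit)
    crit = z.inv_cdf(1 - alpha)
    return z.cdf(shift - crit)


def approx_n(rho, power=0.80, alpha=0.05):
    """Approximate n needed for a two-sided test to reach the target power.

    Ignores the negligible contribution of the opposite rejection tail.
    """
    z = NormalDist()
    needed = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / atanh(rho)
    return ceil(needed ** 2 + 3)


if __name__ == "__main__":
    print(f"two-sided power, rho=0.5, n=9: {approx_power(0.5, 9):.3f}")
    print(f"one-sided power, rho=0.5, n=9: {approx_power(0.5, 9, two_sided=False):.3f}")
    print(f"approximate n for 80% power:  {approx_n(0.5)}")
```

With a true correlation of 0.5 and n = 9, the approximation puts the two-sided power somewhere between a quarter and a third, in the same neighbourhood as the calculator's figure, and it suggests a sample several times larger than 9 would be needed for 80% power.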

Another issue is that his datapoints are unlikely to be independent, given that 3 of each come from the same company. Independence problems and other possible issues are quite hard to judge given that we don't know much about the project/data, though.

TS Contributor
His null is that there is no correlation between x and y, and the alternative is that there is a positive correlation.
I told him that he needs a rather large data set to deduce any results, but he says his professor told him that, since he does not have much time, he should go with the data he already has. :|

CowboyBear

Super Moderator
Gosh. I think his professor should rethink that opinion, but I guess there's not much you can do if that's what they're set on. :shakehead

TS Contributor
Yeah, guess so.
But what if he listens to me and rethinks? How would you suggest he go about it?
Another thing: it is not ordinal data, it's just a variable measured on an interval/ratio scale.

CowboyBear

Super Moderator
Good that it's interval data - Pearson's correlation assumes interval data. Really, he needs more data, simple as that - he should use power analysis to decide how much. Determining for sure whether Pearson's correlation is the best analysis method is a bit difficult without knowing more about the data and the substantive research question - he will need to check that its assumptions are met. There's no obvious reason to suggest a different method though. He just needs more data.

Dason

Has a scatterplot been produced? Does a linear fit even seem plausible? Would something like Spearman's make more sense?

Also I'd like to second CowboyBear's worry that the data might not be independent.
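
To illustrate why the scatterplot and the Pearson/Spearman question matter with so few points, here is a small sketch. The 9 (x, y) pairs are entirely made up for illustration (they are not the data from this thread), and the helper names `pearson`, `ranks` and `spearman` are hypothetical, written by hand so that only the standard library is needed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: eight weakly related points plus one extreme point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 50]
y = [5, 3, 6, 2, 7, 1, 4, 8, 60]

print("Pearson, all 9 points:      ", round(pearson(x, y), 2))
print("Spearman, all 9 points:     ", round(spearman(x, y), 2))
print("Pearson, extreme point gone:", round(pearson(x[:-1], y[:-1]), 2))
```

In this toy example a single extreme point pushes Pearson's r close to 1, while the rank-based coefficient, and Pearson's r with that point left out, stay much lower. Note also that Spearman's is computed here on plain interval-scale numbers; it only needs values that can be ranked.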

TS Contributor
Welcome back, Dason.
Yes, I produced scatter plots and they do not show much correlation; in fact there are huge outliers.
The data is independent, but it is not nominal data, so I can't use Spearman's.
Now he wants me to give him a reference explaining why he can't deduce the results because of the outliers in the scatter plot.
Got any?

TS Contributor
I can't make any sense of the scatter plot or decide anything from it, since it has only 9 values and most of them are rather extreme.

Dason

If there are only 9 data points (or even in general) how can most of the points be extreme?

TS Contributor
Let's say that in some data set (not the one I am talking about)
I have 2, 300, 335, 336, 7000, 11000.
I think I have 3 extreme values, no?
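
One way to make "extreme" concrete for that example is Tukey's 1.5 × IQR fences. This is only one convention among several, quartiles can be computed in more than one way, and with 6 values any such rule is shaky, so treat this as a sketch. The function name `tukey_outliers` is made up; the quartiles come from `statistics.quantiles` with its default method.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # default "exclusive" quartile method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [2, 300, 335, 336, 7000, 11000]
print(tukey_outliers(data))
```

Under this rule none of the six values is flagged. When half of the values are far from the rest, the spread itself (the IQR) becomes large, so "most of the points are extreme" mostly says that the data are widely spread or skewed rather than that they contain outliers.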