Non-parametric correlation help!

#1
I need to look for a correlation between a continuous variable (volume) and a dichotomous variable (metastasis). Can anybody suggest the correct test?
 

Dragan

Super Moderator
#2
You could use a point-biserial correlation.

It is computed as follows:

r = ((YBar1 - YBar0)/Sigma)*Sqrt[p*q]

where

YBar1 is the mean of the Y scores associated with 1's

YBar0 is the mean of the Y scores associated with 0's

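Sigma is the standard deviation of all the Y scores, computed with N (not N-1) in the denominator: Sqrt[ (SumY^2 - ((SumY)^2)/N)/N ], where SumY^2 is the sum of the squared scores and (SumY)^2 is the square of their sum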

p is the proportion of scores associated with the 1's

q is the proportion of scores associated with the 0's
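
For anyone who wants to check the formula by hand, here is a minimal Python sketch. The volume and metastasis values are made up purely for illustration, and the function names are my own. It computes r from the formula above and compares it with the ordinary Pearson correlation computed on the same data with the groups coded 0/1.

```python
import math

def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(y, group):
    """r = ((YBar1 - YBar0) / Sigma) * Sqrt[p*q], with Sigma using N in the denominator."""
    n = len(y)
    y1 = [v for v, g in zip(y, group) if g == 1]
    y0 = [v for v, g in zip(y, group) if g == 0]
    p = len(y1) / n
    q = len(y0) / n
    mean_all = sum(y) / n
    sigma = math.sqrt(sum((v - mean_all) ** 2 for v in y) / n)
    ybar1 = sum(y1) / len(y1)
    ybar0 = sum(y0) / len(y0)
    return (ybar1 - ybar0) / sigma * math.sqrt(p * q)

# Hypothetical data, invented for this example only.
volume = [2.1, 3.4, 1.8, 5.6, 4.9, 2.7, 6.3, 3.0]
metastasis = [0, 0, 0, 1, 1, 0, 1, 0]

r_formula = point_biserial(volume, metastasis)
r_pearson = pearson(metastasis, volume)
print(r_formula, r_pearson)
```

The two printed values should agree up to floating-point rounding.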
 

Dragan

Super Moderator
#4
Is the point-biserial still valid if the continuous data is not normally distributed (ie Shapiro-Wilk is significant)?

In a descriptive sense - yes. I forgot to mention that the point-biserial correlation is just a special case of the more general Pearson product-moment correlation coefficient. That is, if you apply the usual Pearson formula to the data, you will get the same answer as with the formula I provided above.

Now, in terms of hypothesis testing, the normality assumption can become an issue if your sample sizes are small. A nice way to look at this is that the hypothesis being tested is the same as in a two independent-samples t-test, i.e. Mu1=Mu0, and you'll get the exact same t statistic that you would obtain using t=(YBar1 - YBar0)/StdError (with the pooled-variance standard error). Or, you would also get the same t statistic if you regressed Y on the dichotomous variable: t=b/StdError, where "b" is the regression weight, which will be the difference between the two means (YBar1 - YBar0).
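
As a rough numerical check of that equivalence, the Python sketch below uses made-up volume/metastasis values (not real data) and helper names of my own choosing. It computes the pooled-variance two-sample t, the t obtained from the correlation via t = r*Sqrt[N-2]/Sqrt[1-r^2], and the regression slope of Y on the 0/1 variable.

```python
import math

def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(y1, y0):
    """Two independent-samples t statistic with pooled variance."""
    n1, n0 = len(y1), len(y0)
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    ss1 = sum((v - m1) ** 2 for v in y1)
    ss0 = sum((v - m0) ** 2 for v in y0)
    pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return (m1 - m0) / std_error

# Hypothetical data, invented for this example only.
volume = [2.1, 3.4, 1.8, 5.6, 4.9, 2.7, 6.3, 3.0]
metastasis = [0, 0, 0, 1, 1, 0, 1, 0]
n = len(volume)

y1 = [v for v, g in zip(volume, metastasis) if g == 1]
y0 = [v for v, g in zip(volume, metastasis) if g == 0]

# t from the two-sample test and t from the point-biserial correlation.
t_two_sample = pooled_t(y1, y0)
r = pearson(metastasis, volume)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Least-squares slope of volume regressed on the 0/1 indicator.
mx = sum(metastasis) / n
my = sum(volume) / n
slope = (sum((g - mx) * (v - my) for g, v in zip(metastasis, volume))
         / sum((g - mx) ** 2 for g in metastasis))
mean_diff = sum(y1) / len(y1) - sum(y0) / len(y0)

print(t_two_sample, t_from_r)
print(slope, mean_diff)
```

Each printed pair should match up to rounding. This only illustrates the algebraic identity; it says nothing about whether the t distribution is a good reference distribution for a small, non-normal sample.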


Does this help?
 
#5
Hmm. Thanks for the reply. However, I'm still a bit unsure. Unfortunately my data set is small. I was also under the impression that the Pearson was for parametric data. That leads me to two further questions:

1) Is it valid to use the Spearman correlation test then convert to a point biserial in the usual fashion, thus overcoming problems with non-parametric data?

2) To take the independent t-test analogy further, would it be valid to use a Mann-Whitney Test to look for differences in the values of the continuous volume data in each of the dichotomous groups?
 

Dragan

Super Moderator
#6
(2) Yes, I believe the Mann-Whitney test is a good choice in view of your concerns.


Note: The Spearman rank correlation, like the point-biserial correlation, is a special case of the Pearson correlation. More specifically, suppose we have two variables (X and Y) and we take the ranks of each (denoted as RX and RY). The Pearson correlation computed on RX and RY is the Spearman correlation; in the absence of tied scores it also equals the familiar shortcut formula 1 - 6*Sum[d^2]/(N*(N^2 - 1)), where d = RX - RY. The difference is in the interpretation. That is, the Pearson correlation is an index of the linear association between X and Y, whereas the interpretation of the Spearman correlation is "weaker": an index of the monotonic relationship (not necessarily linear).

These two correlations will often be close to each other. Specifically, if we have large samples where X and Y are jointly (bivariate) normally distributed, then the relationship is approximately as follows:

rs = (6/Pi)*ArcSin[rp/2],

where rs and rp are the Spearman and Pearson correlations, respectively.
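
To make the rank idea concrete, here is a small Python sketch with invented X and Y values that contain no ties. The function names are mine. It computes Spearman's coefficient as the Pearson correlation of the ranks, checks it against the no-ties shortcut 1 - 6*Sum[d^2]/(N*(N^2 - 1)), and wraps the ArcSin relationship above in a helper.

```python
import math

def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks_no_ties(values):
    """Ranks 1..N; this simple version assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_expected_under_normality(rp):
    """rs = (6/Pi) * ArcSin[rp/2], the large-sample bivariate-normal relationship."""
    return (6 / math.pi) * math.asin(rp / 2)

# Hypothetical data without ties, invented for this example only.
x = [1.2, 2.5, 0.7, 3.9, 3.1, 4.4]
y = [2.0, 2.9, 1.1, 5.5, 3.0, 4.8]
n = len(x)

rx, ry = ranks_no_ties(x), ranks_no_ties(y)
rs_from_ranks = pearson(rx, ry)
rs_shortcut = 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n * n - 1))

print(rs_from_ranks, rs_shortcut)
print(spearman_expected_under_normality(0.5))
```

With tied scores the shortcut no longer matches exactly, and the ranks would need to be average (mid) ranks.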
 

mcw

New Member
#7
non parametric correlations

I too am attempting to correlate a dichotomous variable (self harm yes/no) with several continuous quantitative variables (e.g. depression, anxiety). The data is not normally distributed (significant Kolmogorov-Smirnov and Shapiro-Wilk tests), and hence I have employed non-parametric tests elsewhere. I am using SPSS (15).

1) I wondered if I could use Pearson's to obtain point biserial coefficients if the data is non parametric?

2) Failing the above, how do I (quickly) obtain rank point biserial correlations using SPSS?

One paper suggests using Mann-Whitney tests to get the mean rank difference scores and the significance levels. The rank-biserial correlation coefficients are then computed by multiplying the mean rank difference by two and dividing by the total sample size (Chmura Kraemer, 2006; Nichols, 2000). This seems quite time consuming given the number of variables I have. Any thoughts or advice appreciated. I have 3 different samples (N=79, N=238, N=317).
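
For reference, the hand calculation described there (two times the mean rank difference, divided by the total N) can be sketched in a few lines of Python. This is not SPSS syntax, the scores are invented, and the function names are hypothetical. Tied scores get average ranks, which I am assuming matches the mean ranks a Mann-Whitney procedure reports.

```python
def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def rank_biserial(scores, group):
    """2 * (mean rank of group 1 - mean rank of group 0) / N."""
    r = midranks(scores)
    r1 = [a for a, g in zip(r, group) if g == 1]
    r0 = [a for a, g in zip(r, group) if g == 0]
    return 2 * (sum(r1) / len(r1) - sum(r0) / len(r0)) / len(scores)

# Hypothetical data, invented for this example only.
depression = [12, 30, 18, 25, 9, 30, 14, 22]
self_harm = [0, 1, 0, 1, 0, 1, 1, 0]

r_rb = rank_biserial(depression, self_harm)
print(r_rb)
```

Looping such a function over many variables removes most of the manual work; the significance levels would still come from the Mann-Whitney test itself.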
 