Spearman's correlation: a continuous with a binary variable

#1
Hi everyone. I'm pretty new to statistics, so sorry for the basic question.

I need some nonparametric statistics to correlate a binary variable with a continuous one. The binary variable is yes/no.
For instance:
- one variable is a blood test score (ranging from 0 to 100);
- the other variable says whether a patient suffers from a specific disease, so it's a yes/no variable.
I want to find out whether high test scores correlate with having a certain disease.

I have to use nonparametric statistics.

Is there any way I can use Spearman's correlation to do this?
Is it OK to code 1 = yes and 0 = no, so as to have a sort of rank?
Yes/no looks nominal, and I know variables should be at least ordinal for Spearman's correlation.
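To see what actually happens if you code yes = 1 and no = 0, here is a minimal Python sketch, assuming NumPy and SciPy are installed; the data are made-up toy values, not real patients. It checks that Spearman's rho on the 0/1 coding is simply Pearson's r computed on ranks, and that, when the scores have no ties, it equals the rank-biserial correlation (2 times the difference in mean ranks, divided by N) multiplied by a factor below 1, so rho cannot reach 1 even if the two groups are perfectly separated.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Hypothetical toy data (made up for illustration only):
# disease: 1 = yes, 0 = no; score: blood test score on a 0-100 scale (no ties).
disease = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
score = np.array([12, 25, 31, 40, 44, 52, 63, 70, 81, 90])

# Spearman's rho is Pearson's r computed on ranks. The 0/1 column gets
# tied average ranks, which is only a linear rescaling of 0/1.
rho, p_rho = spearmanr(disease, score)
r_on_ranks, _ = pearsonr(rankdata(disease), rankdata(score))

# Rank-biserial correlation: 2 * (mean rank of the 1s - mean rank of the 0s) / N.
n = len(score)
ranks = rankdata(score)
r_rb = 2 * (ranks[disease == 1].mean() - ranks[disease == 0].mean()) / n

# With no ties in score, rho = r_rb * max_rho, where max_rho < 1 depends only
# on N and the proportion of 1s: the largest |rho| attainable with this design.
prop_yes = disease.mean()
max_rho = np.sqrt(3 * n**2 * prop_yes * (1 - prop_yes) / (n**2 - 1))
print(f"rho = {rho:.4f} (p = {p_rho:.4f}), r_rb = {r_rb:.4f}, max |rho| = {max_rho:.4f}")
```

So the 0/1 coding runs without complaint, but the resulting rho is capped below 1 (about 0.87 here with N = 10 and half the cases coded 1), which is one reason rank-based measures built for two groups are often preferred.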

Thanks a lot for your help!

:)
 

Karabiner

TS Contributor
#2
> I need some nonparametric statistics to correlate a binary variable with a continuous one.
What is your exact research question? Is it really necessary
to calculate a correlation coefficient?

> I have to use nonparametric statistics.
What is the reason for that? And do you have to perform
a test of significance, not just calculation of a correlation
coefficient?
> Is there any way I can use Spearman's correlation to do this?
IIRC Somers' D can be used for correlating a binary with
a rank-ordered variable.
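For the binary-by-continuous case, Somers' D can be computed directly by counting pairs. A minimal sketch, assuming NumPy; `somers_d_binary` is a hypothetical helper name and the data are made-up toy values. It treats the disease group as the independent variable and, over all (sick, healthy) pairs, subtracts the number of pairs in which the sick patient has the lower score from the number in which they have the higher score, so D = P(sick score higher) - P(sick score lower) for a randomly drawn pair.

```python
import numpy as np

def somers_d_binary(group, y):
    """Hypothetical helper: Somers' D of y given a binary group coded 1/0.

    D = (concordant - discordant) / (n1 * n0) over all (group 1, group 0)
    pairs. Pairs tied on y count as neither concordant nor discordant.
    """
    group = np.asarray(group)
    y = np.asarray(y, dtype=float)
    y1, y0 = y[group == 1], y[group == 0]
    diff = y1[:, None] - y0[None, :]      # one entry per (group-1, group-0) pair
    concordant = np.count_nonzero(diff > 0)
    discordant = np.count_nonzero(diff < 0)
    return (concordant - discordant) / (len(y1) * len(y0))

# Hypothetical toy data: 1 = disease, 0 = no disease; 0-100 test scores.
disease = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
score = np.array([12, 25, 31, 40, 44, 52, 63, 70, 81, 90])
d = somers_d_binary(disease, score)
print(f"Somers' D = {d:.2f}")  # 24 concordant, 1 discordant out of 25 pairs -> 0.92
```

Recent SciPy versions (1.7 and later) also provide `scipy.stats.somersd`, which returns a p-value as well; check its documentation for which argument it treats as the independent variable.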

With kind regards

K.
 
#3
Thank you Karabiner :)

> What is your exact research question? Is it really necessary
> to calculate a correlation coefficient?
Well, I guess so. I need to know whether the test score is high for those who are sick.
I use it together with chi-square, Goodman and Kruskal's gamma and the Mann-Whitney U test. All the tests confirm the association.

> What is the reason for that? And do you have to perform
> a test of significance, not just calculation of a correlation
> coefficient?
My data are not normally distributed, and the relationship is probably not linear.
Yes, I also need p-values for each test (I use SPSS or R to calculate them).

> IIRC Somers' D can be used for correlating a binary with
> a rank-ordered variable.
One variable is binary and the other is continuous.
Is Somers' D a good substitute for Spearman's rho? How do I interpret it?
What's the difference (not in the calculation, but in the meaning of the statistic)?

Thanks again!!
 

Dragan

Super Moderator
#4
The correlation between a continuous and a binary variable is referred to as the Point-Biserial Correlation. (It is a special case of the Pearson product-moment correlation formula, as is the Spearman rank correlation, assuming there are no tied scores.) Testing it for significance is logically equivalent to a pooled-variance independent-samples t-test, or a one-way ANOVA with two groups, which tests the null hypothesis that the two population means are equal. This correlation is computed as follows:

r_pb = [(M_1 - M_0) / sqrt(SS_y / N)] * sqrt(p * q)

where r_pb is the Point-Biserial Correlation, M_1 is the mean of the continuous variable in the group coded 1, M_0 is its mean in the group coded 0, SS_y is the total sum of squares of the continuous variable (so sqrt(SS_y / N) is its standard deviation with divisor N), N is the total sample size, p is the proportion of cases coded 1, and q = 1 - p is the proportion of cases coded 0. With M_1 - M_0 in the numerator, r_pb has the same sign as Pearson's r computed on the 0/1 coding.
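The formula can be checked numerically. A minimal sketch, assuming NumPy and SciPy, with made-up toy data and a hypothetical helper `point_biserial`: it evaluates the formula, compares it with Pearson's r on the 0/1 coding, and confirms that the correlation's p-value matches a pooled-variance two-sample t-test.

```python
import numpy as np
from scipy.stats import pearsonr, ttest_ind

def point_biserial(group, y):
    """Hypothetical helper: r = (M_1 - M_0) / sqrt(SS_y / N) * sqrt(p * q)."""
    group = np.asarray(group)
    y = np.asarray(y, dtype=float)
    m1, m0 = y[group == 1].mean(), y[group == 0].mean()
    n = len(y)
    ss_y = np.sum((y - y.mean()) ** 2)   # total sum of squares of y
    p = np.mean(group == 1)              # proportion coded 1
    q = 1 - p                            # proportion coded 0
    return (m1 - m0) / np.sqrt(ss_y / n) * np.sqrt(p * q)

# Hypothetical toy data: 1 = disease, 0 = no disease; 0-100 test scores.
disease = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
score = np.array([12, 25, 31, 40, 44, 52, 63, 70, 81, 90])

r_pb = point_biserial(disease, score)
r, p_r = pearsonr(disease, score)        # Pearson r on the 0/1 coding
# ttest_ind pools the variances by default (equal_var=True).
t, p_t = ttest_ind(score[disease == 1], score[disease == 0])
print(f"r_pb = {r_pb:.4f}, Pearson r = {r:.4f}, p (r) = {p_r:.4g}, p (t-test) = {p_t:.4g}")
```

SciPy also has `scipy.stats.pointbiserialr`, which returns the same coefficient and p-value as `pearsonr` on 0/1 data.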
 
#5
Thank you Dragan.

However, I see the Point-Biserial Correlation is a parametric statistic, and I need nonparametric statistics. Is there an alternative for my case, to use together with the Mann-Whitney U test, chi-square and Goodman and Kruskal's gamma?
 

Dragan

Super Moderator
#6
> Thank you Dragan.
>
> However, I see the Point-Biserial Correlation is a parametric statistic, and I need nonparametric statistics. Is there an alternative for my case, to use together with the Mann-Whitney U test, chi-square and Goodman and Kruskal's gamma?
Well, yes, use the Rank Biserial Correlation. It is computed as follows:

r = (2/N) * (Ybar_1 - Ybar_0)

where N is the total number of observations, Ybar_1 is the mean rank of the scores coded 1, and Ybar_0 is the mean rank of the scores coded 0, with all N scores ranked together.
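Cureton's formula translates directly into code. A minimal sketch, assuming NumPy and SciPy (`scipy.stats.rankdata` gives tied scores their average rank); `rank_biserial` is a hypothetical helper and the data are made-up. It also checks the identity r = 2*U_1/(n_1*n_0) - 1, where n_1 and n_0 are the group sizes and U_1 counts the (group 1, group 0) pairs in which the group-1 score is higher, with ties counted as 1/2.

```python
import numpy as np
from scipy.stats import rankdata

def rank_biserial(group, y):
    """Hypothetical helper: r = (2 / N) * (mean rank of 1s - mean rank of 0s)."""
    group = np.asarray(group)
    ranks = rankdata(y)              # all N scores ranked together; ties get average ranks
    n = len(ranks)
    return (2.0 / n) * (ranks[group == 1].mean() - ranks[group == 0].mean())

# Hypothetical toy data with some tied scores: 1 = disease, 0 = no disease.
disease = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
score = np.array([10, 20, 30, 30, 45, 45, 50, 60, 70, 70])

r_rb = rank_biserial(disease, score)

# Same quantity via pair counting: U_1 = (# pairs where the group-1 score is higher)
# + 0.5 * (# tied pairs), then r = 2 * U_1 / (n1 * n0) - 1.
y1, y0 = score[disease == 1], score[disease == 0]
diff = y1[:, None] - y0[None, :]
u1 = np.count_nonzero(diff > 0) + 0.5 * np.count_nonzero(diff == 0)
r_from_u = 2 * u1 / (len(y1) * len(y0)) - 1
print(f"rank-biserial r = {r_rb:.2f}, from U: {r_from_u:.2f}")  # both 0.68
```

With no ties on the scores, this is also the same number as Somers' D with the binary variable as the independent variable.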

For a reference, see: E. E. Cureton (1956) "Rank Biserial Correlation", Psychometrika, 21, pp. 287-290.
 
#7
Great! Would you be so kind as to also tell me how to calculate it in R or SPSS? Calculating the statistic is easy, but what about the p-value?
I saw this answer of yours but didn't get the point:
http://www.talkstats.com/showthread...int-biserial-correlation-coefficients-in-SPSS

So, in your opinion, if I use the Mann-Whitney U test + chi-square + Goodman and Kruskal's gamma + rank-biserial correlation and all are significant, can I say that a higher test score is in some way (not necessarily linearly) associated with being sick?

One last, worrying doubt: is it OK to use the Mann-Whitney U test and Goodman and Kruskal's gamma when I have a binary nominal variable (like being sick or not)?
 

Dragan

Super Moderator
#8
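As I stated in the link you provided, the Rank-Biserial Correlation is a linear function of the U statistic: r = 2*U_1/(n_1*n_0) - 1, where U_1 is the Mann-Whitney U for the group coded 1 and n_1, n_0 are the two group sizes. So you can use the distribution of U to obtain critical values; in other words, the p-value of the Mann-Whitney U test is also the p-value for testing whether the Rank-Biserial Correlation is zero.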
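In practice, this means the p-value can be taken straight from the Mann-Whitney U test. A minimal Python sketch, assuming SciPy 1.7 or later (where `scipy.stats.mannwhitneyu` reports U for the first sample passed in) and made-up toy data; in R, `wilcox.test()` runs the same rank-sum test. It converts U_1 into the rank-biserial correlation with r = 2*U_1/(n_1*n_0) - 1.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical toy data: 1 = disease, 0 = no disease; 0-100 test scores.
disease = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
score = np.array([12, 25, 31, 40, 44, 52, 63, 70, 81, 90])
y1, y0 = score[disease == 1], score[disease == 0]

# Two-sided Mann-Whitney U test; with SciPy >= 1.7 the statistic is U for y1.
res = mannwhitneyu(y1, y0, alternative="two-sided")
u1 = res.statistic

# The rank-biserial correlation is a linear function of U, so testing
# "r_rb = 0" is the Mann-Whitney test itself and shares its p-value.
r_rb = 2 * u1 / (len(y1) * len(y0)) - 1
print(f"U1 = {u1}, rank-biserial r = {r_rb:.2f}, p = {res.pvalue:.4f}")
```

Older SciPy releases defined the returned statistic differently, so check the documentation of your installed version before relying on which U is reported.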

To do this, see the following article: Willson, V. L. (1976). Critical Values for the Rank-Biserial Correlation Coefficient. Educational and Psychological Measurement, 36, pp. 297-300.

For the case where you have tied scores, see the following article: Cureton, E. E. (1968). Rank-Biserial Correlation when Ties are Present. Educational and Psychological Measurement, 28, pp. 77-79.

As you can see, both of these articles are short and directly address the concerns you are raising.
 
#9
Thanks! I'm hoping to find a free version of the articles you suggested and will read them.

Now, since I have a continuous (or ordinal) variable and a binary nominal variable, I guess Goodman and Kruskal's gamma and the Mann-Whitney U test cannot be used either. Am I right?

I was planning to replace Goodman and Kruskal's gamma with Goodman and Kruskal's lambda, which should be OK when comparing binary nominal and ordinal data. What do you think?

Is there any alternative to the Mann-Whitney U test for comparing a binary nominal and an ordinal/continuous variable?
 

Karabiner

TS Contributor
#10
> Now, since I have a continuous (or ordinal) variable and a binary nominal variable, I guess Goodman and Kruskal's gamma and the Mann-Whitney U test cannot be used either. Am I right?
The Mann-Whitney U test can be used when two groups (here: disease yes/no)
are compared with regard to an interval-scaled or an ordinal dependent
variable.

With kind regards

K.