Association/correlation between binary and continuous data with a non-normal distribution

I want to perform an association analysis between a binary (dichotomous) variable and a continuous variable. The binary variable records whether something is present or not ("Yes"/"No", "1"/"0", etc.). The continuous variable takes values between ~150 and ~170. It is not normally distributed: there are more low than high values. My question is whether high or low values of the continuous variable are associated with the 1 or 0 of the binary variable. For example, do low values go with "1"? My sample size is ~150.

I have tried a point-biserial correlation test and Spearman's rho so far, but I'm not sure whether either of them is the right one. Can someone give me some advice on this?

Many Thanks!


TS Contributor
Compare the means of your continuous variable between the "yes" and "no" groups (t-test, or rather Welch's test).
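
A minimal sketch of that comparison in Python, assuming SciPy is available. The arrays `days_yes` and `days_no` are made-up placeholder data, not the poster's values; `equal_var=False` is what switches `scipy.stats.ttest_ind` from Student's t-test to Welch's test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: continuous values for the two groups of the binary variable
days_no = rng.uniform(150, 170, size=80)   # binary variable == 0
days_yes = rng.uniform(150, 165, size=70)  # binary variable == 1

# equal_var=False selects Welch's test (no equal-variance assumption)
result = stats.ttest_ind(days_yes, days_no, equal_var=False)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

With roughly 70 to 80 observations per group, the test on the means is usually fairly robust to moderate skew, but that is a rule of thumb rather than a guarantee.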

With kind regards



Omega Contributor
Yeah, I was going to propose the Wilcoxon rank sum. Is your continuous variable bounded between ~150 and ~170, or was that just where most values landed?
Thank you!

The continuous variable is bounded between ~150 and ~170. If this is a problem I can shift the numbers to ~0-20, but I don't think it is.

A t-test and a point-biserial correlation test are basically the same thing, is that right? When I apply a t-test I get a really low p-value in every case I'm testing.
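
For reference, the equivalence can be checked numerically. The sketch below uses made-up data (`group` and `days` are hypothetical) and assumes SciPy. The point-biserial test matches the pooled-variance (Student) t-test, not the Welch version, so `equal_var=True` is used here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: a 0/1 indicator and a continuous value per observation
group = rng.integers(0, 2, size=150)
days = rng.uniform(150, 170, size=150) - 2 * group

# Point-biserial correlation between the indicator and the continuous variable
r, p_corr = stats.pointbiserialr(group, days)

# Student's t-test (pooled variance) comparing the two groups
t, p_t = stats.ttest_ind(days[group == 1], days[group == 0], equal_var=True)

n = len(days)
print(f"p (point-biserial) = {p_corr:.6f}, p (t-test) = {p_t:.6f}")
print(f"t^2 = {t**2:.6f}, r^2 (n-2) / (1-r^2) = {r**2 * (n - 2) / (1 - r**2):.6f}")
```

The two p-values should agree up to floating-point error, because t and r are linked by t² = r²(n − 2)/(1 − r²).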


Omega Contributor
I was asking about the bounds because, if a continuous variable is bounded, you can often get confidence intervals that extend beyond the allowable range, e.g., an estimate of 99% that is bounded at 100% but has a 95% CI of 94% to 109%, which is nonsensical.

What are you trying to say with the results? Also, can the continuous variable take non-integer values, e.g., 156.89?

Given your data, I would think an exact (Monte Carlo) Wilcoxon rank sum test would be appropriate. There is another person on this forum who would likely also recommend a permutation test, based on, say, the t-test framework.
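
A rough sketch of both suggestions, assuming NumPy and SciPy and using made-up data. `scipy.stats.mannwhitneyu` is SciPy's implementation of the Wilcoxon rank sum test. The helper `perm_test_mean_diff` is a hypothetical name written for this illustration; it is a Monte Carlo permutation test on the difference in means, a simplification of a permutation test built on the t statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical data, rounded to whole days so that ties occur
days_no = np.round(rng.uniform(150, 170, size=80))   # binary variable == 0
days_yes = np.round(rng.uniform(150, 166, size=70))  # binary variable == 1

# Wilcoxon rank sum test (Mann-Whitney U)
u_stat, p_u = stats.mannwhitneyu(days_yes, days_no, alternative="two-sided")


def perm_test_mean_diff(a, b, n_perm=10_000, rng=None):
    """Two-sided Monte Carlo permutation test on the difference in means."""
    rng = np.random.default_rng() if rng is None else rng
    pooled = np.concatenate([a, b])
    observed = a.mean() - b.mean()
    count = 0
    for _ in range(n_perm):
        # Shuffle the group labels by permuting the pooled values
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0
    return observed, (count + 1) / (n_perm + 1)


observed, p_perm = perm_test_mean_diff(days_yes, days_no, rng=rng)
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_u:.4f}")
print(f"mean difference = {observed:.3f}, permutation p = {p_perm:.4f}")
```

The permutation p-value is itself a random estimate, so it changes slightly with the seed and with `n_perm`.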
The continuous variable is the day of the year. The event whose occurrence I am testing, based on condition "0" or "1", can occur approximately 150 to 170 days after January 1st. Does this mean I have ties in my data? The values can also be floats, as I am also working with the mean value over several years.
With the result I am trying to tell whether condition "1" leads to lower values of the continuous variable, i.e., whether the event I am testing occurs earlier when condition "1" occurs.
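
Since the hypothesis is directional (condition "1" gives earlier days), a one-sided test is one option. A small sketch, assuming SciPy 1.6 or later (for the `alternative` argument of `ttest_ind`) and made-up data; `alternative="less"` tests whether the first sample tends to be smaller than the second. The placeholder values are whole days, so they contain ties, which both calls accept.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical day-of-year values (whole days, so ties are present)
days_no = np.round(rng.uniform(150, 170, size=80))   # condition "0"
days_yes = np.round(rng.uniform(150, 166, size=70))  # condition "1"

# H1: values under condition "1" are lower (the event occurs earlier)
welch_less = stats.ttest_ind(days_yes, days_no, equal_var=False, alternative="less")
mwu_less = stats.mannwhitneyu(days_yes, days_no, alternative="less")

print(f"Welch, one-sided p = {welch_less.pvalue:.4f}")
print(f"Mann-Whitney, one-sided p = {mwu_less.pvalue:.4f}")
```

The direction should be fixed before looking at the data; otherwise the two-sided p-value is the safer one to report.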