Association/correlation between binary and continuous data, non-normal distribution

#1
Hi,
I want to perform an association analysis between a binary (dichotomous?) variable and a continuous variable. The binary variable is whether something is present or not: "Yes" or "No", "1" or "0", etc. The continuous variable takes values between ~150 and ~170. It is not normally distributed; there are more low than high values. My question of interest is whether high or low values of the continuous variable are associated with the 1 or 0 of the binary variable. So, do low values correlate with "1"? My sample size is ~150.

I have tried a point-biserial correlation test and a Spearman's rho test so far. I'm not sure if either of them is the right one. Can someone give me some advice on this?

Many Thanks!
 

Karabiner

TS Contributor
#2
Compare the means of your continuous variable between the "yes" and "no" group (t-Test, or rather Welch test).
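
For readers following along, here is a minimal sketch of the Welch statistic suggested above, using only the Python standard library. The function name `welch_t` and the sample values are hypothetical and only for illustration; the p-value would come from a t distribution with the returned degrees of freedom (for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard errors of each group mean (sample variance / n)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical day-of-year values, for illustration only
yes = [151, 152, 152, 154, 155, 157]
no = [153, 156, 158, 160, 163, 168]

t, df = welch_t(yes, no)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A negative t here means the "yes" group has the lower mean. Unlike the classic Student t-test, this version does not assume equal variances in the two groups.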

With kind regards

Karabiner
 

hlsmith

Omega Contributor
#3
Yeah, I was going to propose the Wilcoxon rank sum. Is your continuous variable bounded between ~150 and ~170, or was that just where most values landed?
 
#4
Thank you!

The continuous variable is bounded between ~150 and ~170. If this is a problem I can shift the numbers to ~0-20, but I don't think it is.

A t-test and a point-biserial correlation test are basically the same thing, is that right? By applying a t-test I get a really low p-value for every case I'm testing.
 

hlsmith

Omega Contributor
#5
I was asking about the bounds because, if a continuous variable is bounded, you can often get confidence intervals that extend beyond the allowable range, e.g., an estimate of 99% on a scale bounded at 100% with a 95% CI of 94% to 109%, which may be nonsensical.

What are you trying to say with the results? Also, can the continuous variable take non-integer values, e.g., 156.89?


Given your data, I would think an exact (Monte Carlo) Wilcoxon rank sum test would be appropriate. Another person on this forum would likely also recommend a permutation test, based on, say, the t-test framework.
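
As a rough illustration of the permutation idea, the sketch below shuffles the group labels many times and compares the observed difference in means with the shuffled differences. It uses the raw mean difference as the test statistic, which is one simple choice and not necessarily what was meant by "the t-test framework". The function name, defaults, and data are hypothetical.

```python
import random
from statistics import mean

def permutation_test_mean_diff(a, b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value for mean(a) - mean(b)."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    na = len(a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random reassignment of values to the two groups
        diff = mean(pooled[:na]) - mean(pooled[na:])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_resamples + 1)

# Hypothetical day-of-year values, for illustration only
yes = [151, 152, 152, 154, 155, 157]
no = [153, 156, 158, 160, 163, 168]
print(permutation_test_mean_diff(yes, no))
```

No normality assumption is needed here; the reference distribution is built from the data themselves.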
 
#6
The continuous variable is the day of the year. The event whose occurrence I am testing against condition "0" or "1" can occur approximately between 150 and 170 days after January 1st. Does this mean I have ties in my data? The values can also be floats, as I am also working with the mean value over several years.
With the result I am trying to tell whether condition "1" leads to lower values of the continuous variable, i.e., whether the event I am testing occurs earlier when condition "1" occurs.
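
Since the stated hypothesis is directional (condition "1" goes with earlier days) and integer day counts will produce ties, a one-sided rank-sum test with midranks is one way to frame it. Below is a minimal Monte Carlo sketch under those assumptions; the names `midranks` and `rank_sum_less` are hypothetical. With SciPy installed, `scipy.stats.mannwhitneyu(group1, group0, alternative="less")` addresses the same question.

```python
import random

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_less(group1, group0, n_resamples=10_000, seed=0):
    """One-sided Monte Carlo p-value: does group1 tend to have lower values?"""
    rng = random.Random(seed)
    ranks = midranks(list(group1) + list(group0))
    n1 = len(group1)
    observed = sum(ranks[:n1])  # rank sum of group1 in the original labelling
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(ranks)  # random relabelling of the pooled ranks
        if sum(ranks[:n1]) <= observed + 1e-12:
            hits += 1
    return (hits + 1) / (n_resamples + 1)

# Hypothetical day-of-year values with ties, for illustration only
cond1 = [151, 152, 152, 154, 155, 157]
cond0 = [152, 156, 158, 160, 163, 168]
print(rank_sum_less(cond1, cond0))
```

A small p-value would support the claim that the event tends to occur earlier under condition "1". Because the ranks are permuted directly, ties are handled without any separate correction.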