I know you're supposed to observe a scatterplot first in order to see if the shape is monotonic or linear to determine which test to use...but this graph has me baffled. Can anyone advise on which test to use based on this scatterplot? Thank you.
I have always been taught that the key to deciding between Pearson's and Spearman's is not whether the relationship is linear, but whether the data are interval (Pearson) or ordinal (Spearman), although you can use Spearman for interval data if you want. It is true that Pearson's r only measures linear association, so it will understate a non-linear relationship (and yours looks non-linear to me), but I ran across disagreement on whether Spearman's deals well with non-linearity or not. Some said that the data must be roughly linear to use Spearman; others said that the relationship only has to be monotonic, not linear.
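On the monotonic versus linear question, one way to check it yourself: Spearman's coefficient is Pearson's r computed on the ranks of the data, so any strictly increasing relationship gets a coefficient of 1 no matter how curved it is. Here is a minimal sketch in plain Python. The data are made up for illustration (y = exp(x)), and the helper names `pearson`, `average_ranks`, and `spearman` are my own, not from any library.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: strictly increasing but strongly curved.
x = [float(i) for i in range(1, 11)]
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Spearman treats the curve as a perfect monotonic association, while Pearson comes out clearly lower because the points do not lie on a straight line. That only shows what each coefficient measures; it does not say which one suits the scatterplot in question.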
That data does not look very linear to me. For this scatterplot, I'd expect a higher correlation using Spearman than using Pearson, but I'd be cautious about how to interpret the results. It looks almost like two datasets to me: one up to 2000 on the y-axis, which is spread out over the x-axis, and one clustered around 100% BMI.
Please ignore if this is not helpful, but do you need to correlate these exact two quantities? Why do the BMI percentages heap up so strongly around 100% BMI?
I think, in agreement with the post immediately above me, that the data approach some limit beyond which the distribution becomes fundamentally different. I am not sure it even makes sense to analyze what is almost a constant at that point. You might exclude that section of the data and just analyze the rest of it.