Two quick questions about correlation coefficients

#1
Hi, I'm new here, so my apologies in advance if I've put this in the wrong thread, or I'm breaking any rules.

I'm doing an analysis that compares pre-employment assessment scores to on-the-job performance metrics. My data set has thousands of records. When I compute a simple correlation coefficient between the two across the individual records, the result is close to zero.

However, if I average the on-the-job metrics for each possible assessment score (roughly 100 discrete scores), and then compute the correlation between those 100 assessment scores and the average on-the-job performance metric for each, the resulting correlations are much higher. So my first question is: am I allowed to do this and have it still be considered statistically valid?
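To make the comparison concrete, here is a minimal sketch of the two calculations on synthetic data. Everything in it is an assumption for illustration (the score range, the slope, the noise level, and the helper name `pearson`); it is not the actual data set. It shows why the grouped figure comes out higher: averaging within each score removes most of the record-to-record scatter, so the second number describes the averages rather than the individual records.

```python
import random
from collections import defaultdict
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Synthetic stand-in data (assumed): a weak linear effect buried in noise.
random.seed(0)
scores = [random.randint(1, 100) for _ in range(5000)]
perf = [0.02 * s + random.gauss(0, 5) for s in scores]

# 1) Correlation across the individual records.
r_raw = pearson(scores, perf)

# 2) Correlation between each distinct score and its mean performance.
by_score = defaultdict(list)
for s, p in zip(scores, perf):
    by_score[s].append(p)
levels = sorted(by_score)
means = [sum(by_score[s]) / len(by_score[s]) for s in levels]
r_grouped = pearson(levels, means)

print(f"record-level r = {r_raw:.3f}, grouped-means r = {r_grouped:.3f}")
```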

The other quirk about this data is that the assessment scores are heavily skewed towards high grades. So, when I compute this correlation on the average on-the-job metrics, some of the points are outliers: low assessment scores that occurred only a few times out of thousands of assessments taken. When I take this data and compute a weighted correlation coefficient instead, the result is very high, over 0.8. So my second question is: is using a weighted correlation coefficient in this manner also statistically valid?
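For reference, a minimal sketch of a weighted correlation coefficient follows. It assumes the weights are the number of records behind each averaged point, which is one common choice and may not be the exact weighting used here. The function name `weighted_pearson` and the example numbers are made up for illustration.

```python
from math import sqrt

def weighted_pearson(x, y, w):
    """Pearson correlation where point i counts with weight w[i]."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    return cov / sqrt(vx * vy)

# Hypothetical grouped data: distinct scores, mean performance per score,
# and how many records produced each mean (rare low scores, common high ones).
score_levels = [20, 40, 80, 90, 95]
mean_perf = [1.0, 3.5, 3.0, 3.2, 3.4]
counts = [2, 3, 900, 1500, 1200]

r_unweighted = weighted_pearson(score_levels, mean_perf, [1] * len(counts))
r_weighted = weighted_pearson(score_levels, mean_perf, counts)
print(f"unweighted r = {r_unweighted:.3f}, count-weighted r = {r_weighted:.3f}")
```

With integer weights this gives the same value as repeating each averaged point that many times, so rarely seen scores have little influence on the result.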

Thank you to anyone who can help me here!

-Greg
 

Miner

TS Contributor
#2
Two possibilities are a lot of scatter in the data or a nonlinear relationship. Have you plotted the raw data using a scatter plot? Does it look like a shotgun blast pattern? If the relationship looks nonlinear instead, try Spearman's rho, which is computed on ranks and so picks up a curved but consistently increasing or decreasing relationship.
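As a rough illustration of the Spearman suggestion, the sketch below computes Spearman's rho as the Pearson correlation of the ranks (ties get the average rank) and compares it with plain Pearson on a made-up curved relationship. The data and the helper names `pearson`, `ranks` and `spearman` are assumptions for the example only.

```python
from math import exp, sqrt

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up example: y always rises with x, but along a curve, not a line.
x = list(range(1, 51))
y = [exp(v / 5) for v in x]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

If the raw data really is a shotgun blast, both numbers will stay near zero; a rho that is clearly larger than r points towards a curved relationship.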