I've been searching for some time for a correlation metric for categorical vs. numerical features, analogous to the Pearson correlation coefficient for numerical vs. numerical features or Cramér's V for categorical vs. categorical features.
This is my toy data example in Python. The categorical variable is not ordinal, and note that the number of observations per class of the categorical feature is not the same:
import numpy as np
import pandas as pd

pd.DataFrame({
    'numerical': np.array([19, 27, 31, 26, 39, 43, 32, 29, 19, 19, 27, 31]),
    'categorical': np.array(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']),
})
I've seen a lot of answers referring to the Intraclass Correlation (but I don't have a square matrix, and I don't have subjects being rated by several judges). I've also seen one-way ANOVA suggested frequently, but it doesn't solve the problem, because it doesn't translate into a clear strength-of-association coefficient the way Pearson's r does.
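To make the ANOVA point concrete: the closest thing I've found so far is the correlation ratio η² (the between-group sum of squares divided by the total sum of squares), which can be computed from the ANOVA decomposition. Here is a sketch on my toy data using only numpy/pandas (I'm not sure this is the accepted metric, which is why I'm asking):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'numerical': np.array([19, 27, 31, 26, 39, 43, 32, 29, 19, 19, 27, 31]),
    'categorical': np.array(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']),
})

# Total sum of squares: deviations of every value from the grand mean
grand_mean = df['numerical'].mean()
ss_total = ((df['numerical'] - grand_mean) ** 2).sum()

# Between-group sum of squares: each class mean's deviation from the
# grand mean, weighted by class size (classes are unbalanced here)
groups = df.groupby('categorical')['numerical'].agg(['mean', 'count'])
ss_between = (groups['count'] * (groups['mean'] - grand_mean) ** 2).sum()

# Correlation ratio: 0 = no association, 1 = category fully determines the value
eta_squared = ss_between / ss_total
print(eta_squared)  # ~0.311 on this toy data
```

This at least gives a bounded [0, 1] coefficient from the same decomposition ANOVA uses, but unlike Pearson's r it has no sign, so it says nothing about direction (which may not even be meaningful for nominal categories).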
Can you suggest a metric, or is it impossible to have one for this case?