Thread: Is there a Correlation metric for Categorical vs Numerical features?

  1. #1

    I've been searching for some time for a correlation metric analogous to the Pearson correlation coefficient for numerical vs numerical features, or Cramér's V for categorical vs categorical features, but this time for categorical vs numerical features.

    Here is my toy data example in Python. The categorical variable is not ordinal, and note that the classes of the categorical feature do not have the same number of observations:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({'numerical': np.array([19, 27, 31, 26, 39, 43, 32, 29, 19, 19, 27, 31]),
                       'categorical': np.array(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])})

    I've seen a lot of answers referring to the intraclass correlation (but I don't have a square matrix, nor do I have subjects rated by several judges...). I've also seen one-way ANOVA suggested frequently, but it does not solve the problem because it does not translate into a clear strength-of-association coefficient like Pearson's.

    Can you suggest a metric, or is it impossible to have one in this case?

  2. #2
    Karabiner, TS Contributor

    Re: Is there a Correlation metric for Categorical vs Numerical features?

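    ANOVA gives you R² (also called eta squared: the between-groups sum of squares divided by the total sum of squares) as a measure of the strength of association.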

    With kind regards

    K.
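    A minimal sketch of what this R² is, using the toy data from the first post. Eta squared is SS_between / SS_total, and it equals the R² of a least-squares regression of `numerical` on one indicator column per category (the fitted values are simply the group means). Its square root, the correlation ratio η, lies between 0 and 1, loosely analogous to the absolute value of Pearson's r. The helper names `eta_squared` and `r_squared_with_dummies` are hypothetical, written only for this illustration.

```python
import numpy as np
import pandas as pd

# Toy data from the first post
df = pd.DataFrame({'numerical': np.array([19, 27, 31, 26, 39, 43, 32, 29, 19, 19, 27, 31]),
                   'categorical': np.array(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])})


def eta_squared(values, groups):
    """Hypothetical helper: SS_between / SS_total, the R² of a one-way ANOVA."""
    values = pd.Series(np.asarray(values, dtype=float))
    groups = pd.Series(np.asarray(groups))
    grand_mean = values.mean()
    # Between-groups sum of squares: group size times squared deviation of the group mean
    stats = values.groupby(groups).agg(['count', 'mean'])
    ss_between = (stats['count'] * (stats['mean'] - grand_mean) ** 2).sum()
    # Total sum of squares around the grand mean
    ss_total = ((values - grand_mean) ** 2).sum()
    return ss_between / ss_total


def r_squared_with_dummies(values, groups):
    """Hypothetical cross-check: R² of least squares on one indicator column per category."""
    y = np.asarray(values, dtype=float)
    # No separate intercept: the indicator columns already span the constant
    X = pd.get_dummies(pd.Series(groups)).to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()


eta_sq = eta_squared(df['numerical'], df['categorical'])
r2 = r_squared_with_dummies(df['numerical'], df['categorical'])
eta = np.sqrt(eta_sq)  # correlation ratio
print(f"eta^2 = {eta_sq:.3f}, R^2 from dummy regression = {r2:.3f}, eta = {eta:.3f}")
```

    Both routes give the same number, which is why R² from the ANOVA can serve as a strength-of-association measure even when the group sizes differ.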

  3. #3

    Re: Is there a Correlation metric for Categorical vs Numerical features?


    Thanks for the suggestion, Karabiner.

    Unfortunately, I can't see a way to make the ANOVA give the R². Are you sure? From what I understand, it gives you an F statistic and a p-value, and nothing like a correlation value. To recap, I'm talking about a possible metric for the strength of association (analogous to the Pearson correlation coefficient) between a nominal/categorical variable with more than two unique values ('A', 'B', 'C', ...) and a continuous/numerical variable (e.g., age or income).

    Did I make myself clear with this explanation?
    Last edited by hlopes; 10-22-2017 at 04:48 PM. Reason: typo
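    A sketch, assuming SciPy is available, of how the F statistic that a one-way ANOVA reports can be turned back into R² (eta squared). Since F = (SS_between / (k - 1)) / (SS_within / (N - k)) and SS_total = SS_between + SS_within, it follows that η² = F(k - 1) / (F(k - 1) + (N - k)), where k is the number of categories and N the number of observations. `scipy.stats.f_oneway` is a real SciPy function; the conversion step is plain algebra, not a SciPy feature.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Toy data from the first post
df = pd.DataFrame({'numerical': np.array([19, 27, 31, 26, 39, 43, 32, 29, 19, 19, 27, 31]),
                   'categorical': np.array(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])})

# One-way ANOVA: one array of 'numerical' values per category
samples = [grp.to_numpy() for _, grp in df.groupby('categorical')['numerical']]
result = f_oneway(*samples)

k = len(samples)                   # number of categories
n = sum(len(s) for s in samples)   # total number of observations
df_between, df_within = k - 1, n - k

# Recover eta^2 = SS_between / SS_total from F and the degrees of freedom
f_stat = float(result.statistic)
eta_sq = f_stat * df_between / (f_stat * df_between + df_within)
eta = np.sqrt(eta_sq)  # correlation ratio, between 0 and 1

print(f"F = {f_stat:.3f}, p = {result.pvalue:.3f}, eta^2 = {eta_sq:.3f}, eta = {eta:.3f}")
```

    So the F statistic and the R² carry the same information once the degrees of freedom are known; R² simply expresses it on a 0 to 1 scale.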
