
Thread: High Correlations with Very Different Distributions..?

  1. #1

    Discounting Correlations.. Alternative Descriptives?




    I am comparing two methods of calculating a similar quantity on the same sample. My goal is to show that the second measure offers richer, more complete data than the first, so that I can justify it as potentially the better measure. My problem is that the two measures are very highly correlated, which could suggest that the traditional measure is nearly the same as the new one.

    Can someone give me arguments for why correlations are somewhat inappropriate for comparing these two measures, and suggest what other descriptive statistics would help me demonstrate that the second measure offers significantly more data than the first?

    I don't understand how the two measures shown in the two ordinary histograms (and in the last 3D histogram, which combines the other two) can be correlated so highly (see attachments for pictures). The correlation between the two measures is .92, so the first could be said to explain about 85% of the variance in the second. Yet the second seems to offer a much richer set of values in its distribution: in my sample the first measure calculates 135,000 zeros, for which the new method calculates a continuous set of values from 0 to about .6.

    Obviously the utility of the measures is actually in their correlations with other variables of interest, but it seems that such a high correlation between these two measures would translate into similar correlations of each with other variables (thus defeating my argument).
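
One way to check this worry is the standard constraint that a 3-by-3 correlation matrix must be positive semidefinite. As a hedged sketch, let $x_1$ be the old measure, $x_2$ the new measure, and $y$ some other variable of interest (these symbols are not from the thread, and the value $r_{1y} = .30$ is purely illustrative):

```latex
r_{12}\,r_{1y} - \sqrt{(1 - r_{12}^2)(1 - r_{1y}^2)}
  \;\le\; r_{2y} \;\le\;
r_{12}\,r_{1y} + \sqrt{(1 - r_{12}^2)(1 - r_{1y}^2)}

% with r_{12} = .92 and an assumed r_{1y} = .30:
(.92)(.30) \pm \sqrt{(.1536)(.91)} = .276 \pm .374
\quad\Longrightarrow\quad -.098 \le r_{2y} \le .650
```

Under that assumption, a .92 correlation between the two measures narrows, but does not pin down, how the new measure correlates with a third variable.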

    Thank you very much in advance for any help someone can give.

    -Rob
    Attached Images  
    Last edited by american_rob; 07-09-2008 at 09:01 AM.

  2. #2
    vinux
    Quote Originally Posted by american_rob View Post
    My basic question: Why do I get such a high correlation between two measures with such different distributions?
    -Rob
    It can happen. For example, it can happen if one variable is derived (by a transformation) from the other.
    Could you provide a scatter plot of those variables?
    In the long run, we're all dead.

  3. #3

    Scatter Plots

    The scatter plots are attached. The top scatter plot is from the same data that I described above and that the earlier graphics were drawn from (I believe it is effectively a top-down view of the 3-D histogram).

    I included the second scatter plot to better demonstrate why this high correlation is a problem for me. It shows the same comparison as my other example but with a different sample. In this example the traditional measure calculates only 0's or 1's, whereas the new measure calculates a continuum of scores. Yet these two measures are correlated .89. I don't really understand how just knowing the 0 or 1 can explain such a large percentage (almost 80%) of the variance in the continuous measure.
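
A correlation like this can be read through the point-biserial form of Pearson's r, which applies when one variable is 0/1. In the sketch below, $p$ is the proportion of 1's, $\bar{y}_1$ and $\bar{y}_0$ are the means of the continuous measure within each group, and $s_y$ is its overall (population) standard deviation. The symbols and the uniform example are illustrative assumptions, not values taken from the attached data.

```latex
r_{pb} = \frac{\bar{y}_1 - \bar{y}_0}{s_y}\,\sqrt{p(1-p)},
\qquad
r_{pb}^2 = \frac{p(1-p)\,(\bar{y}_1 - \bar{y}_0)^2}{s_y^2}
         = \frac{\text{between-group variance}}{\text{total variance}}

% illustrative case: y uniform on [0, 1], split into 0/1 at 0.5
% p = 1/2, \bar{y}_1 - \bar{y}_0 = 1/2, s_y^2 = 1/12
r_{pb}^2 = \frac{(1/4)(1/4)}{1/12} = 0.75,
\qquad r_{pb} \approx 0.87
```

Read this way, an r² near .8 only says that most of the spread in the continuous measure lies between the 0 group and the 1 group. It says nothing about how well the 0/1 measure orders the cases within each group.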

    Remember, my purpose is to show that the new measure offers significantly more than the traditional measure. How can I argue that the correlation of the old measure with the new measure does not capture the significance of the new measure? Is there something else I should report instead of a correlation?

    Thanks again for any help.
    Attached Images  

  4. #4
    vinux
    The two measures are correlated because they are related (the obvious answer). It seems the traditional measure (0 and 1) is a variable derived from the new measure, something like:
    traditional measure = 1 if new measure > 0.5, else 0
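
The thresholding rule above can be tried on simulated data. This is a minimal sketch, assuming the new measure is uniform on [0, 1] and the cut-off is exactly 0.5; neither assumption comes from the real data, and `pearson`, `new_measure`, and `traditional` are names invented for the illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: a continuous "new" measure, uniform on [0, 1].
new_measure = [rng.random() for _ in range(10_000)]

# The rule from the post: 1 if the new measure exceeds 0.5, else 0.
traditional = [1 if v > 0.5 else 0 for v in new_measure]

r = pearson(traditional, new_measure)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Even though the 0/1 version throws away all ordering within each half, the correlation stays high, which is the pattern seen in the second scatter plot.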

    And
    Quote Originally Posted by american_rob View Post

    Remember, my purpose is to show that the new measure offers significantly more than the traditional measure. How can I argue that the correlation of the old measure with the new measure does not capture the significance of the new measure? Is there something else I should report instead of a correlation?
    You can confidently say that the new measure offers significantly more than the traditional measure, because it gives more information: you can compare nearly all of the observations/subjects with one another (with the traditional measure there is no comparison within the 0's or within the 1's).
    That is very important.
    Suppose you wanted to pick the top 10% of observations by the measure (I am assuming more than 20% of the values are 1's). With the traditional measure this is not possible; only the new measure helps in this situation.
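
The top-10% argument can be made concrete with a small simulation. This is a sketch on made-up data (a uniform new measure cut at 0.5, which is an assumption and not the real sample); all variable names are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumption: continuous new measure, and a 0/1 traditional measure cut at 0.5.
new_measure = [rng.random() for _ in range(1000)]
traditional = [1 if v > 0.5 else 0 for v in new_measure]

k = len(new_measure) // 10  # size of the top 10%

# Traditional measure: every 1 is tied for first place.
tied_at_top = sum(1 for t in traditional if t == 1)

# New measure: sorting gives a well-defined top k.
order = sorted(range(len(new_measure)), key=lambda i: new_measure[i], reverse=True)
top_k = order[:k]

distinct_traditional = len(set(traditional))
distinct_new = len(set(new_measure))

print(f"need top {k}, but {tied_at_top} cases are tied at 1 on the traditional measure")
print(f"distinct values: traditional={distinct_traditional}, new={distinct_new}")
```

Reporting the number of distinct values and the size of the largest tie for each measure is one simple descriptive way to show what the correlation hides.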

    Regards
    Vinu CT aka Vinux

  5. #5

    Old vs. New


    Vinux - thanks for the reply. Someone else I spoke to in person recommended something similar and suggested that if I dropped the 0's the correlation might be much lower.
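
The suggestion about dropping the 0's can be checked on synthetic data before trying it on the real sample. The sketch below is built on invented assumptions: 70% of cases get an old score of 0 with a small new score, and the rest get an old score in [0.5, 1] with a noisy new score. None of these settings come from the actual measures.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2)  # fixed seed so the run is repeatable
old, new = [], []
for _ in range(10_000):
    if rng.random() < 0.7:
        # Assumed zero-inflated part: old score is 0, new score is small but varies.
        old.append(0.0)
        new.append(rng.uniform(0.0, 0.3))
    else:
        # Assumed nonzero part: new score tracks the old score with noise.
        o = rng.uniform(0.5, 1.0)
        old.append(o)
        new.append(o + rng.uniform(-0.3, 0.3))

r_all = pearson(old, new)

# Keep only the cases where the old measure is nonzero.
kept = [(o, n) for o, n in zip(old, new) if o != 0.0]
r_nonzero = pearson([o for o, _ in kept], [n for _, n in kept])

print(f"all cases:    r = {r_all:.2f}")
print(f"nonzero only: r = {r_nonzero:.2f}")
```

With these made-up settings the full-sample correlation is driven mostly by the gap between the zero and nonzero clusters, so it should drop noticeably once the zeros are removed. Whether the real data behave the same way is something only the real data can show.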

    You are correct that the two measures are related. Actually, the new measure is calculated by taking into account a complete taxonomy at all levels simultaneously, with a variant of IDF weighting based on information theory, whereas the old measure effectively considered only a single level of the same taxonomy in calculating the cosine similarity of two entities.

    Because of this I would expect (and even desire) some correlation, but a .92 correlation seems to suggest the new measure may not be especially worthwhile. Would standard deviations or some other descriptive statistics help support my argument that the new measure offers something significant beyond the original? How can I argue that correlations are inappropriate for considering the value of one over the other if a reviewer asks for correlations?
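
One statistic that might speak to this is the incremental R²: how much more variance in an outcome is explained once the new measure is added to the old one. The sketch below uses the standard two-predictor formula for R² from pairwise correlations. The data, the `outcome` variable, and every setting are invented for illustration, and the outcome is deliberately constructed to depend on the new measure, so the size of the gain here says nothing about the real measures.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)  # fixed seed so the run is repeatable
old, new = [], []
for _ in range(10_000):
    if rng.random() < 0.7:
        old.append(0.0)                       # assumed zero-inflated old score
        new.append(rng.uniform(0.0, 0.3))
    else:
        o = rng.uniform(0.5, 1.0)
        old.append(o)
        new.append(o + rng.uniform(-0.3, 0.3))

# Hypothetical outcome: driven by the new measure plus Gaussian noise.
outcome = [v + rng.gauss(0.0, 0.1) for v in new]

r_12 = pearson(old, new)
r_y1 = pearson(outcome, old)
r_y2 = pearson(outcome, new)

# R^2 with the old measure alone, and with both predictors together.
r2_old_only = r_y1 ** 2
r2_both = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
delta_r2 = r2_both - r2_old_only

print(f"r(old, new) = {r_12:.2f}")
print(f"R^2 old only = {r2_old_only:.2f}, R^2 both = {r2_both:.2f}, gain = {delta_r2:.2f}")
```

If a reviewer asks for correlations, the correlation between the two measures could be reported alongside a gain of this kind computed on a real outcome, so that the high agreement and the added value are both visible.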

    Thanks again for any help.

    -Rob
