## How to rank long lists in Spearman's Rank Correlation?

I have two lists of 1000 items, each ranked by a different criterion, A or B. When I want to compute the Spearman correlation at N=20, should I use the rank values from the full list of 1000 items, or ranks computed among only the 20 items?

For example using the 1000 items, I would get a ranking such as
Criteria A, Criteria B
1, 3
2, 200
3, 202
..
100, 333

When I calculate Spearman's rank correlation, should I keep the 200 and 202, or re-rank the list so that no value is higher than 20? Re-ranking the lists at N=20 means that I drop the items that were higher in the list, so is that correct?
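To illustrate why the choice matters, here is a worked example using the textbook shortcut formula for Spearman's rho. The formula assumes that each list holds the ranks 1 to n with no ties; treating the first three rows of the example above as a toy case with n = 3 is my own simplification.

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = \operatorname{rank}_A(i) - \operatorname{rank}_B(i)$$

Keeping the original B values (3, 200, 202) gives $d = (-2, -198, -199)$ and $\sum d_i^2 = 4 + 39204 + 39601 = 78809$, so

$$\rho = 1 - \frac{6 \cdot 78809}{3 \cdot 8} = 1 - 19702.25 = -19701.25,$$

which is outside $[-1, 1]$ and so not a valid correlation. Re-ranking B within the subset to (1, 2, 3) gives $d = (0, 0, 0)$ and $\rho = 1$. So if the shortcut formula is used, the ranks have to be recomputed within the N items being compared.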

Lost me! No idea what you are writing about. Perhaps an example may help convey your question.

I don't understand how you compare a list of 1000 and a subset of 20, regardless of which correlation you use.

Sorry for the confusion, I will try to explain more. For example, let's say we are ranking universities based on two criteria, A and B, and we want to see if there is a correlation between the two rankings. Assume we have 1000 universities.

1- Rank using criteria A (let's call it list A)
University - Rank
University A - 1
University B - 2
University C - 3
...
University ZZ - 1000

2- Rank using criteria B could be like this (let's call it list B)
University - Rank
University A - 1
University B - 2
University ZZ - 3
...
University C - 1000

3- Putting both rankings side by side, University C gets rank 1000 under criteria B.
University - Rank A - Rank B
University A - 1 - 1
University B - 2 - 2
University C - 3 - 1000
...
University ZZ - 1000 - 3

4- Now the question: if I want to compare the top 20 universities, what value should University C get under criteria B? Would it get 1000, since there are better universities above it in the initial ranking, or do I ignore that fact and just rank the 20 universities from list A among themselves using criteria B?

If I rank the 1000 universities and then select the top 20 universities from both lists, some universities may appear in list A but not in list B, and vice versa.

Hope this clarifies my question, and many thanks for your help.

How are you determining the rank for C? Adding the ranks from A and B, adding a weighted A + B, multiplying them? Either way, you are not using Spearman rho. You are just finding some way to judge the best universities using criteria from two different dimensions (A and B).

If you want to use Spearman rho to compare A and B, you would just run a correlation between them. Spearman rho would throw out (I believe) pairs where a value is missing. So if A or B did not assign a value to, say, the 997th school, that school would not be used in the Spearman rho.
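Whether incomplete pairs are dropped depends on the software and its settings, so the following is only a minimal sketch of what that pairwise deletion would look like, not the behaviour of any particular package. The function names and the small data set are made up for illustration; missing ranks are represented as `None`.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_complete_pairs(a, b):
    """Drop pairs with a missing value, re-rank what is left, then correlate."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys)), len(pairs)


# Hypothetical ranks for five schools; criteria B gave no rank to the third one.
a = [1, 2, 3, 4, 5]
b = [1, 2, None, 4, 3]
rho, n_used = spearman_complete_pairs(a, b)
print(rho, n_used)
```

Note that the surviving values are re-ranked before correlating, so the school with the missing value affects neither the sample size nor the ranks of the others.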

There is no rank C, only University C. The third list is just there to compare criteria A and B: University C got rank 3 in list A and rank 1000 in list B. So the question is, when I want to compare the top 20 universities, should I
1- give University C rank 1000 - OR
2- give it NA, and then just remove it since it is not in the top 20 of list B - OR
3- remove the universities above University C in the 1000-item ranking and assume that University C gets rank 20
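The three options can be compared on a small example. This is only a sketch under my own assumptions: the data is invented and scaled down to 10 universities with a top 5 instead of 1000 and a top 20, there are no ties, and all names are hypothetical. Here rho is computed the standard way, by re-ranking whatever values are passed in before applying the formula.

```python
def rerank(values):
    """Map distinct values to ranks 1..n by their order (assumes no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_no_ties(x, y):
    """Spearman's rho: re-rank both inputs, then apply the shortcut formula."""
    rx, ry = rerank(x), rerank(y)
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up ranks; index 2 plays the role of University C (3rd under A, last under B).
rank_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rank_b = [1, 2, 10, 3, 6, 4, 5, 7, 8, 9]
k = 5

top_a = [i for i in range(len(rank_a)) if rank_a[i] <= k]
a_top = [rank_a[i] for i in top_a]

# Option 1: keep the B ranks from the full list (University C keeps rank 10).
b_global = [rank_b[i] for i in top_a]
rho_global = spearman_no_ties(a_top, b_global)

# Option 3: re-rank B among the top-k universities only (University C gets rank 5).
b_local = rerank(b_global)
rho_local = spearman_no_ties(a_top, b_local)

# Option 2: keep only universities that are in the top k of BOTH lists.
both = [i for i in top_a if rank_b[i] <= k]
rho_both = spearman_no_ties([rank_a[i] for i in both], [rank_b[i] for i in both])

print(rho_global, rho_local, rho_both, len(both))
```

In this sketch options 1 and 3 give the same rho, because Spearman's rho only uses the order of the values, and re-ranking does not change that order. Option 2 answers a different question, since it drops every university on which the two criteria disagree strongly, which shrinks the sample and tends to push rho upward.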

