Hi!
I am currently in the last phase of my thesis project and need to perform a “meta” analysis on my experiment data. As I am struggling with some of the statistics involved, I could really use some help; hopefully someone here can get me on the right track.
So what is the problem like?
I have conducted 200+ experiments (E1, E2, …, E200) as part of my research. For starters, please note that each experiment is unique and not directly connected to any of the other experiments.
In each experiment I examine multiple ranking methods with respect to a ‘ground truth’ ranking that was obtained for that specific experiment. So for example, for experiment 1, I have an established ideal ranking RI1 and several computed rankings RA1, RB1, …. Each computed ranking is derived using a fixed set of algorithms (RA, RB, …) that is the same across all 200+ experiments.
The computed rankings are compared to the ideal ranking using several metrics. Specifically, I compute three different types of rank correlation coefficients per experiment: Kendall's Tau (the basic Tau-a variant, as there are never any ranking ties), Spearman's Rho, and Spearman's footrule. These metrics are computed for each computed ranking (so RA1 versus RI1, RB1 versus RI1, etc.).
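To make this concrete, here is a minimal Python sketch of how the three per-experiment metrics could be computed; the ranking vectors (RI1, RA1) are made-up example numbers, not my real data:

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def footrule(r1, r2):
    # Spearman's footrule: sum of absolute rank differences
    return int(np.sum(np.abs(np.asarray(r1) - np.asarray(r2))))

# hypothetical experiment 1: ideal ranking RI1 and one computed ranking RA1
RI1 = [1, 2, 3, 4, 5, 6, 7, 8]
RA1 = [2, 1, 3, 5, 4, 6, 8, 7]

tau, _ = kendalltau(RI1, RA1)   # with no ties, Tau-b equals Tau-a
rho, _ = spearmanr(RI1, RA1)
d = footrule(RI1, RA1)

print(tau)  # 22/28 ≈ 0.786
print(rho)  # 1 - 6*6/(8*63) ≈ 0.929
print(d)    # 6
```

With no ties, SciPy's default Tau-b coincides with the basic Tau-a variant, so `kendalltau` can be used directly.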
These metrics allow me to state which ranking method performs best on an individual experiment basis. For example, I might state that ranking algorithm RA performed best on experiment 1, based on its Kendall Tau score.
Please note that the number of observations may differ per experiment. For example, in experiment 1 there may be 8 observations (n=8), whereas in the second experiment 19 data points are ranked, etc. The lower limit is 8 and the upper limit is 30.
Now to the problem: I want to state something about the algorithms over the entire experiment set. Basically, I want to be able to say that, based on all experiments, ranking algorithm R? worked best overall.
I am currently using the average value to make this statement, but I am no longer sure this is actually valid. It seemed valid because all the experiments are unconnected to each other. However, perhaps I should apply a Fisher r-to-z transformation before averaging the correlation coefficients. I am not sure anymore.
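To make the two options concrete, here is a small Python sketch comparing a plain average of correlations with a Fisher r-to-z average (transform to z-space, average, transform back); the r values are made up. Since my experiments have unequal n, the z-scores could additionally be weighted, e.g. by n-3, but this sketch keeps them unweighted:

```python
import numpy as np

def fisher_average(rs):
    # Fisher r-to-z: z = arctanh(r); average in z-space, then back-transform
    z = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(z.mean()))

# hypothetical per-experiment correlations for one algorithm
r_values = [0.82, 0.64, 0.91, 0.55]

plain = float(np.mean(r_values))     # 0.73
fisher = fisher_average(r_values)    # ≈ 0.768
```

The two aggregates generally differ; the Fisher approach is the standard way to average correlation coefficients because r itself is not additive.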
There is a second part to this problem too. As explained, one part of my evaluation uses correlation coefficients, since my research is largely about analyzing rank orders. However, I have also created some custom indicators to evaluate other aspects of the ranking algorithms (for example, the proximity of certain data points to other data points). These custom metrics are certainly valid on a single experiment, but again: how do I aggregate them into a valid result for the entire experiment set? My custom indicators are not true rank correlations, as they express other aspects. Could an averaging approach work here?
Finally, I also compute various classification metrics from the Information Retrieval domain, namely precision and recall. Is averaging over the experiments an option for precision and recall as well?
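For precision and recall there seem to be two common ways to aggregate: macro-averaging (average the per-experiment scores, so every experiment counts equally) and micro-averaging (pool the raw counts first, so larger experiments count more). A minimal Python sketch with made-up (tp, fp) counts per experiment:

```python
def macro_precision(per_exp):
    # per_exp: list of (tp, fp) pairs, one per experiment
    vals = [tp / (tp + fp) for tp, fp in per_exp]
    return sum(vals) / len(vals)

def micro_precision(per_exp):
    # pool the counts across experiments, then compute one precision
    tp = sum(t for t, _ in per_exp)
    fp = sum(f for _, f in per_exp)
    return tp / (tp + fp)

# hypothetical (tp, fp) counts from three experiments
counts = [(6, 2), (15, 4), (20, 10)]

macro = macro_precision(counts)  # mean of 0.75, 15/19, 2/3
micro = micro_precision(counts)  # 41/57
```

The same macro/micro distinction applies to recall with (tp, fn) counts; which average is appropriate depends on whether experiments or individual data points should be weighted equally.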
Any help is much appreciated!
Falco
What is your research about, what did you study
in your experiments?
With kind regards
K.
I'm researching the ranking of attributes in Wikipedia Infobox templates. But the essence is that each experiment has a list of attributes, an optimal ranking order RI for that list, and various computed ranking orders RA, RB, RC, …. Each infobox is about a different topic and can have a different number of attributes.