I have a data set containing a number of variables that describe the performances of footballers in matches. Examples include Accurate Passes/90 mins, Crosses/90 mins and Headers/90 mins.
Rather than using the actual value in each column, I have replaced it with a rank. So, given there are 100 footballers in the data set, the player with the highest value in the Accurate Passes/90 mins column is assigned the number 1, the player with the second highest value is assigned 2, the third highest 3, and so on all the way to 100.
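To make the ranking step concrete, here is a minimal sketch in plain Python. The function name `rank_descending` and the sample values are made up for illustration, and breaking ties by original order is an assumption, since nothing above says how ties should be handled.

```python
def rank_descending(values):
    """Return ranks where the highest value gets rank 1.

    Ties are broken by original order (an assumption, not something
    specified in the question).
    """
    # Indices sorted from highest value to lowest; sorted() is stable,
    # so equal values keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical Accurate Passes/90 mins values for four players.
passes_per_90 = [41.2, 55.0, 38.7, 60.3]
print(rank_descending(passes_per_90))
```

With 100 players the same function would assign ranks 1 to 100 for each column.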
For each position, I have taken a subset of the variables. For example, for defenders, I'm only interested in how many tackles and clearances they've made and not how many shots they've had.
My question is to do with how to weight these variables, as some are more important than others when evaluating a player. For example, for strikers, although I am interested in how many passes they make, the number of goals they've scored is much more important and should be weighted accordingly.
For each position, there is a player that I know is the best in that position, and I would like to assign appropriate weightings (w) to the relevant variables (V) so that this player comes out on top. Each player will be assigned a number (let's call it x), which is the sum of each weighting multiplied by the corresponding variable rank. My aim is for x to be lowest for the best player in each position. So:
x = w_1 V_1 + w_2 V_2 + w_3 V_3 + w_4 V_4 + ...
where
0 < w_i < 1
i = 1, 2, 3, 4, ...
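As a sanity check on the formula, here is a small sketch of how x could be computed and how one might test whether a candidate set of weightings puts the known best player first. The player names, ranks and weightings are all invented for illustration, and `weighted_score` and `lowest_scoring_player` are hypothetical helper names.

```python
def weighted_score(weights, ranks):
    """Compute x as the sum of each weighting times the matching rank."""
    if len(weights) != len(ranks):
        raise ValueError("need exactly one weighting per variable")
    return sum(w * v for w, v in zip(weights, ranks))


def lowest_scoring_player(weights, rank_table):
    """Return the player with the lowest x, plus every player's x."""
    scores = {name: weighted_score(weights, ranks)
              for name, ranks in rank_table.items()}
    return min(scores, key=scores.get), scores


# Hypothetical ranks for three strikers on (goals, passes, shots).
rank_table = {
    "Player A": [1, 3, 2],
    "Player B": [2, 1, 1],
    "Player C": [3, 2, 3],
}
weights = [0.6, 0.1, 0.3]  # made-up weightings, each between 0 and 1

best, scores = lowest_scoring_player(weights, rank_table)
print(best, scores)
```

If Player A is the known best striker, these weightings are acceptable in the sense that A gets the lowest x; weighting all three variables equally would put Player B first instead.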
Note also that the weighting of each variable can change depending on which position is being examined. For example, goals can be used to evaluate both midfielders and strikers, but scoring is much more important for a striker, so the goals variable should carry more weight when examining strikers than when examining midfielders.
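One way to picture position-dependent weightings is a separate table of weightings per position. The sketch below is only an illustration: the variable names, the weighting values and the helper `score_for_position` are assumptions, not values derived from any data.

```python
# Hypothetical weightings: goals count for more when scoring strikers,
# passes count for more when scoring midfielders.
POSITION_WEIGHTS = {
    "striker": {"goals": 0.7, "passes": 0.3},
    "midfielder": {"goals": 0.3, "passes": 0.7},
}


def score_for_position(position, ranks):
    """Compute x for one player using the weightings of the given position.

    `ranks` maps variable name to the player's rank on that variable.
    Only variables listed for the position contribute to the score.
    """
    weights = POSITION_WEIGHTS[position]
    return sum(w * ranks[variable] for variable, w in weights.items())


# A made-up player ranked 1st for goals but only 10th for passes.
player_ranks = {"goals": 1, "passes": 10}
print(score_for_position("striker", player_ranks))
print(score_for_position("midfielder", player_ranks))
```

The same player gets a lower (better) x when judged as a striker than as a midfielder, because the strong goals rank dominates under the striker weightings.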
What would be the best way of working out the weightings? Help much appreciated!