# How to assign a Z Score for missing values

#### Jennifer Murphy

##### Member
This forum has been very helpful to me and my "peripheral" understanding of statistics. For that, I thank you all.

As a result of suggestions I received here, I have made good progress in my weighted rating project. The suggestion to use z scores was a game changer. I now have a way to transform almost any kind of rating to common scale. That allows me to combine them effectively.

I think I have just a couple of questions.

Here's a table with some sample ratings. There are 8 "products" (Col D) for which I have ratings in 4 categories: Price (E-G), Rtg (rating, H-J), MPG (miles/gallon, K-M), and dB (decibels, N-P). In each section, the z scores and Rankings are calculated. Above the table, is the Mean (4), Std Dev (5), Order (6, whether high or low values are best), and Weights (7) for each section. The Weights are not involved in this discussion.

The z scores are calculated by some Excel code I wrote.

Cols Q & R have the sums of the z scores and the associated rankings. Cols S & T have the means of the z scores and the associated rankings.

Question #1: Which is better? The sums (Q) or the means (S) of the z scores? They both result in the exact same ranking (R & T). The sum shows more spreadout scores. The mean shows a more compact set of scores.

The rest of the tables show what happens when there are missing values, which is very common. My other question is what is the best way to handle missing values.

In this cirst example, I deleted the Price value for Product C (E30). This does not change the ranking within that section and makes only minimal changes in the overall rankings, 2 move up 1 and 2 move down 1.

In this exmple, I delete 4 values in the Rtg section. This has moderate impact on the rankings within the section. In the overall rankings, one product moved up 3 levels, the rest only moved 1 level.

In this example, I delete 4 values in the MPG section. As above, this causes moderate changes in the section rankings. In the overall rankings, every product moved, but only by 1 or 2 levels.

In this example, I delete all but the top 2 values in the dB section. This causes the second best value to drop to last in that section. It only loses 2 positions in the overall rankings, but every other product moved and most by 2 or 3 levels.

I have one more example where I combine all of these deletions, but the forum software will only allow me five attachments, so I'll post it in a reply.

#### Jennifer Murphy

##### Member
Here is that last table where I delete all of the values in all of the examples in the original post. Now there is quite a bit of movement in the overall rankings.

My question is, what is the best way to handle missing values? Is it assigning them the average value, as I have done here or, on the assumption that most sources will tend to include values for the best products and that the ones missing tend to be the less good ones, assign them something less.

Here are some options I could think of:
1. Give them an average value as I have done here.
2. Give them the same value as the lowest value which I do have?
3. Give them a value that is slightly less than the lowest value which I do have? If so, how much lower?
4. Give them a value one std dev below the mean?
5. Something else?