The following question was posed to me:

A person wants to compare two versions of an algorithm. He has a set of 100 known false negatives and wants the algorithm to find them.
Version 1 found 40 of the 100 false negatives.
Version 2 found 90 of the 100 false negatives.

Is version 2 better?

The correct answer, I was told, is "No, it is not - one must perform a statistical test."

I fail to understand the logic. It seems to me that, whatever statistical test is used, version 2 will come out ahead.
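
To make my intuition concrete, here is a minimal sketch of one test that might be meant. This is my own assumption, since no specific test was named. It runs a pooled two-proportion z-test on the counts above and treats the two runs as independent samples, which may not be strictly appropriate because both versions were run on the same 100 cases (a paired test would need the per-case results, which I do not have). The function name `two_proportion_z_test` is purely illustrative.

```python
import math


def two_proportion_z_test(hits_a, hits_b, n_a, n_b):
    """Two-sided pooled z-test for the equality of two proportions.

    Assumes the two samples are independent (an assumption that may not
    hold here, since both versions saw the same 100 cases).
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled success rate under the null hypothesis that both rates are equal.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the question: 40/100 for version 1, 90/100 for version 2.
z, p_value = two_proportion_z_test(40, 90, 100, 100)
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

With these counts z comes out at roughly 7.4 and the p-value is far below any conventional threshold, which is why I expected version 2 to win whichever test is chosen.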

Perhaps someone can explain the reasoning?