Before I start, I want to apologize: this post might be a bit vague. I'm not really sure where I'm going with it, and it's been a while since I took a stats class.
I'm hoping someone can nudge me in the right direction here; I'm a bit lost at the moment.
We have an algorithm for correcting people's height (it's not really height, but height makes it easier to explain). We start with a guesstimate of a person's height, and our algorithm then makes a new prediction of that height. So after running the algorithm we have two numbers: the new prediction and the guesstimate we had before we started. We keep the guesstimate as a backup, e.g. in case the new prediction differs too much from it.
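To make the setup concrete, the current fallback rule is roughly the sketch below (the function name and the cutoff value are made up, not our real numbers):

```python
def choose_height(prediction: float, guesstimate: float,
                  max_diff: float = 10.0) -> float:
    """Keep the new prediction unless it strays too far from the
    guesstimate, in which case fall back to the guesstimate."""
    if abs(prediction - guesstimate) > max_diff:
        return guesstimate  # too different: distrust the prediction
    return prediction
```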
When we compare the prediction with the person's actual height, the algorithm does fairly well: most of the time it's pretty much spot on and we can scrap the guesstimate.
However, sometimes the prediction is just plain wrong. It would be nice to feel more confident in the predictions and to validate them using statistics. The idea was to compute a probability and compare it against a threshold to decide whether to keep the prediction or not; that has to be better than just tossing the prediction whenever it's too different from the guesstimate, right?
When the correction is small, the new prediction is most likely correct. For bigger corrections the likelihood of being correct decreases, though not always by much.
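To put numbers on that hunch, I was imagining something like the sketch below (the function name, the tolerance of 2.0, and the binning scheme are all just my assumptions): bin historical cases by the size of the correction, then estimate per bin how often the prediction turned out to be correct.

```python
import numpy as np

def correctness_by_correction_size(guesstimates, predictions, actuals,
                                   tolerance=2.0, n_bins=10):
    """Empirical P(prediction correct | correction size), per bin.

    'Correct' means the prediction lands within `tolerance` of the
    actual height; the tolerance here is a made-up number.
    """
    guesstimates = np.asarray(guesstimates, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    actuals = np.asarray(actuals, dtype=float)

    correction = np.abs(predictions - guesstimates)       # how big the correction was
    correct = np.abs(predictions - actuals) <= tolerance  # was it "spot on"?

    edges = np.linspace(0.0, correction.max(), n_bins + 1)
    which_bin = np.digitize(correction, edges[1:-1])  # bin index 0..n_bins-1

    rates = np.full(n_bins, np.nan)  # NaN marks empty bins
    for b in range(n_bins):
        mask = which_bin == b
        if mask.any():
            rates[b] = correct[mask].mean()  # fraction correct in this bin
    return edges, rates
```

The threshold idea would then be something like: keep a new prediction only if the rate in its bin is above some cutoff.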
... and that's where I'm at. How can I tackle this problem? How can I "describe" this with statistics? How do I start?
Edit: What I really want to know is whether a given new prediction is correct or not, and with what probability.
I started off by making a frequency distribution of the differences between the guesstimate and the actual height.
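Concretely, something like this, with placeholder random data standing in for our real arrays:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
actuals = rng.normal(175.0, 10.0, size=500)         # placeholder "actual" heights
guesstimates = actuals + rng.normal(0.0, 3.0, 500)  # placeholder guesstimates

diffs = guesstimates - actuals  # guesstimate error
plt.hist(diffs, bins=30)
plt.xlabel("guesstimate - actual height")
plt.ylabel("frequency")
plt.show()
```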
As you see, I'm wandering in the dark. Any suggestions or nudges are appreciated!
Merry Christmas, everybody!