Hello,
I have data on a new blood test. I have a large sample of 450 patients, all healthy, and I would like to characterize the distribution of the test's values. In addition, I would like to set some "normal values", so that anyone whose value falls below or above them is flagged.
So I have calculated every descriptive measure I could, but how do I determine "normal values"? And is there a good way of estimating a distribution, or of fitting one (a density, I mean)?
Thanks!
I hope the following link answers your question.
http://books.google.se/books?id=Je_p...page&q&f=false
What do you want the marks to mean? Will the number of marks be used for something? Will the marks be used in a regression, as a derived variable?
Mean Joe, what I meant by marks is that if a new subject comes in and their values are extreme compared to the very large sample, then I might conclude that they come from a different population.
I have heard of kernel density estimation; what I fail to understand is how I can determine whether my density is normal, gamma, or something else.
Basically, what I think would be best is to define limits such that any new sample falling outside them is considered to come from a different population. I just don't know how to define these limits: would the 5th and 95th percentiles be sufficient?
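Kernel density estimation never names a family by itself: it smooths the data into a curve, which you can then compare against fitted normal, gamma, etc. densities. A minimal sketch using only the Python standard library; Silverman's rule-of-thumb bandwidth is an assumption (one common default), and the function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel (one common default)."""
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    if spread <= 0:  # e.g. many tied values make the IQR zero
        spread = sd
    return 0.9 * spread * len(data) ** (-1 / 5)


def gaussian_kde(data, bandwidth=None):
    """Return a density function: the average of one Gaussian bump centred on each observation."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(data)
    scale = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density
```

Evaluate `gaussian_kde(values)` on a grid and overlay the candidate parametric densities; a clearly skewed or bimodal curve already rules some families out.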
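Percentile limits can be read straight off the healthy sample without assuming any distribution. A minimal sketch (function names hypothetical) using linear interpolation between order statistics, the same rule as R's default `quantile()` type 7. Note that the 5th/95th percentiles flag about 10% of healthy people by construction; clinical reference intervals conventionally use the central 95% (2.5th to 97.5th percentiles), which is the default assumed here.

```python
import math


def percentile(sorted_values, p):
    """Percentile by linear interpolation between order statistics (R's type 7); 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def reference_limits(healthy_values, lower_p=0.025, upper_p=0.975):
    """Nonparametric limits: empirical percentiles of the healthy sample."""
    s = sorted(healthy_values)
    return percentile(s, lower_p), percentile(s, upper_p)


def flag(value, limits):
    """Label a new measurement relative to the limits."""
    lower, upper = limits
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "within"
```

With 450 people, each 2.5% tail rests on only about 11 observations, so these empirical limits are fairly noisy; that is one reason to look for a well-fitting parametric model as well.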
Well, there are quite a few ways to go about that. Looking at the histogram is probably the simplest, but you can also plot your data against the theoretical quantiles of a few candidate distributions (Q-Q plots) and see how closely the points fall on a straight line, or check how small the mean squared error is between a proposed density and your data at each point. The fitdistr() function in the MASS package is very useful (albeit somewhat limited) in these cases, but I guess one of the standard approaches would be to, once again, pick a few candidate distributions, estimate their parameters through maximum likelihood, and see which one attains the highest log-likelihood (equivalently, the lowest deviance). I've recently been looking at the locfit package in R and it looks promising.
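The maximum-likelihood comparison can be sketched in Python with closed-form estimates; this is an analogue of the idea, not what MASS::fitdistr does internally. Assumptions: four candidate families, strictly positive test values, and Minka's closed-form approximation for the gamma shape (close to, but not exactly, the MLE). Ranking by AIC rather than raw log-likelihood is my choice, since the exponential has one parameter fewer. All names are hypothetical.

```python
import math
import random
import statistics


def fit_normal(x):
    mu, sigma = statistics.fmean(x), statistics.pstdev(x)  # MLE uses the n-denominator SD
    ll = sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2) for v in x)
    return ll, {"mu": mu, "sigma": sigma}


def fit_lognormal(x):
    logs = [math.log(v) for v in x]
    mu, sigma = statistics.fmean(logs), statistics.pstdev(logs)
    # Normal log-density of log(v), plus the Jacobian term -log(v)
    ll = sum(-lv - 0.5 * math.log(2 * math.pi * sigma ** 2) - (lv - mu) ** 2 / (2 * sigma ** 2) for lv in logs)
    return ll, {"meanlog": mu, "sdlog": sigma}


def fit_exponential(x):
    rate = 1 / statistics.fmean(x)
    return len(x) * math.log(rate) - rate * sum(x), {"rate": rate}


def fit_gamma(x):
    # Shape from Minka's closed-form approximation to the gamma MLE (not the exact MLE)
    m = statistics.fmean(x)
    s = math.log(m) - statistics.fmean([math.log(v) for v in x])
    shape = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    scale = m / shape
    ll = sum((shape - 1) * math.log(v) - v / scale for v in x) - len(x) * (shape * math.log(scale) + math.lgamma(shape))
    return ll, {"shape": shape, "scale": scale}


# name -> (fitting function, number of free parameters)
CANDIDATES = {
    "normal": (fit_normal, 2),
    "lognormal": (fit_lognormal, 2),
    "gamma": (fit_gamma, 2),
    "exponential": (fit_exponential, 1),
}


def compare_fits(x):
    """Fit every candidate and rank by AIC = 2k - 2 log L (lower is better)."""
    if min(x) <= 0:
        raise ValueError("lognormal, gamma and exponential fits need strictly positive values")
    rows = []
    for name, (fit, k) in CANDIDATES.items():
        ll, params = fit(x)
        rows.append((2 * k - 2 * ll, name, ll, params))
    return sorted(rows, key=lambda row: row[0])


if __name__ == "__main__":
    rng = random.Random(42)
    simulated = [rng.gammavariate(4.0, 2.0) for _ in range(450)]  # hypothetical test values
    for aic, name, ll, params in compare_fits(simulated):
        print(f"{name:12s} logL={ll:9.2f} AIC={aic:9.2f} {params}")
```

A good AIC rank only says which candidate fits best among those tried; a Q-Q plot against the winner is still worth checking before trusting its tail percentiles.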
5% and 95%? What's your reasoning behind those numbers? It seems to me that if you're looking for data points that may belong to a different distribution, you may want to try fitting a finite mixture model.
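A finite mixture can be fitted by expectation-maximization (EM). A minimal sketch, standard library only, assuming exactly two normal components, a crude split-the-sorted-data start, and a fixed iteration count instead of a convergence test; names and the simulated data are hypothetical. Instead of a hard cutoff, it gives each value a probability of belonging to each component.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(x, iterations=300):
    """EM for a two-component univariate Gaussian mixture; returns (weight, mean, sd) per component."""
    xs = sorted(x)
    n = len(xs)
    floor = 1e-3 * statistics.pstdev(xs)  # stops a component collapsing onto a single point
    # Crude start: one component per half of the sorted data, equal weights and spreads
    w1 = 0.5
    mu1, mu2 = statistics.fmean(xs[: n // 2]), statistics.fmean(xs[n // 2 :])
    s1 = s2 = statistics.pstdev(xs)
    for _ in range(iterations):
        # E-step: responsibility of component 1 for each point
        resp = []
        for v in xs:
            a = w1 * normal_pdf(v, mu1, s1)
            b = (1 - w1) * normal_pdf(v, mu2, s2)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: weighted mixing weight, means and spreads
        n1 = sum(resp)
        n2 = n - n1
        w1 = n1 / n
        mu1 = sum(r * v for r, v in zip(resp, xs)) / n1
        mu2 = sum((1 - r) * v for r, v in zip(resp, xs)) / n2
        s1 = max(floor, math.sqrt(sum(r * (v - mu1) ** 2 for r, v in zip(resp, xs)) / n1))
        s2 = max(floor, math.sqrt(sum((1 - r) * (v - mu2) ** 2 for r, v in zip(resp, xs)) / n2))
    return sorted([(w1, mu1, s1), (1 - w1, mu2, s2)], key=lambda c: c[1])


def membership(value, components):
    """Posterior probability that `value` came from each component."""
    dens = [w * normal_pdf(value, mu, s) for w, mu, s in components]
    total = sum(dens)
    return [d / total for d in dens]


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical data: 400 values from one population, 50 from a shifted one
    sample = [rng.gauss(10, 1) for _ in range(400)] + [rng.gauss(16, 1) for _ in range(50)]
    comps = fit_two_gaussians(sample)
    print(comps)
    print(membership(14.0, comps))
```

If all 450 patients really are healthy, a single component may fit about as well as two; comparing the mixture's log-likelihood with a one-component fit (e.g. by AIC) is one way to check whether the second component is real.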