1. ## Characterizing a population

Hello,

I have data from a new blood test: a large sample of 450 subjects, all healthy. I would like to characterize the distribution of the test's values. In addition, I would like to set some "normal values", so that anyone whose value falls below or above them is flagged.

So I have calculated every possible descriptive measure, but how do I determine "normal values"? And is there a good way of estimating a distribution (the density, I mean), or matching a known one?

Thanks!

3. ## Re: Characterizing a population

What do you want the marks to mean? Will the number of marks be used for something? Will the marks be used in a regression, as a derived variable?

4. ## Re: Characterizing a population

Originally Posted by NN_STAT
And is there a good way of estimating a distribution (the density, I mean), or matching a known one?
What about kernel density estimation? Would that help?
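
In case it helps to see what that involves, here is a minimal sketch of a Gaussian kernel density estimate. It is written in plain Python rather than R so that it runs without extra packages. The helper name `kde_gaussian`, the Silverman rule-of-thumb bandwidth, and the simulated `values` are all assumptions for illustration, not your data or a recommended setting.

```python
import math
import random
import statistics

def kde_gaussian(sample, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(sample)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption; other bandwidth choices exist).
        sd = statistics.stdev(sample)
        q1, _, q3 = statistics.quantiles(sample, n=4)
        bandwidth = 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return scale * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density

# Simulated stand-in for the 450 healthy subjects (hypothetical values).
random.seed(1)
values = [random.gauss(100, 15) for _ in range(450)]
density = kde_gaussian(values)
for x in (70, 100, 130):
    print(x, round(density(x), 5))
```

The estimate is only a smoothed picture of the sample. It does not by itself tell you which named family (normal, gamma, ...) the data follow.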

5. ## Re: Characterizing a population

Mean Joe, what I meant by marks is that if a new subject comes in and its values are extreme compared to the very large sample, then I might conclude that it comes from a different population.

I have heard of kernel density estimation. What I fail to understand is how I can determine whether my density is normal, gamma, ... or something else.

Basically, what I think would be best is to define limits such that any new subject whose value falls outside them is considered to come from a different population. I just don't know how to define these limits. Would the 5th and 95th percentiles be sufficient?

6. ## Re: Characterizing a population

Originally Posted by NN_STAT
What I fail to understand is how I can determine whether my density is normal, gamma, ... or something else.
Well, there are quite a few ways to go about that. Looking at the histogram is probably the simplest, but you can also plot your data against the theoretical quantiles of a few distributions (Q-Q plots) and see how well the points fall on the line. Or you can see how well you can minimize the mean squared error of a proposed density estimate at each point of your data. The fitdistr() function in the MASS package is very useful (albeit somewhat limited) for these cases. I guess one of the standard approaches would be to, once again, get a few candidate distributions, estimate their parameters through maximum likelihood, and see which one gives the highest log-likelihood (equivalently, the lowest deviance). I've recently been looking at the locfit package in R and it looks promising.
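
To make the maximum likelihood comparison concrete, here is a rough sketch in plain Python (not the fitdistr() route). It assumes only two candidates, normal and lognormal, because both have closed-form ML estimates. The function names, the AIC comparison, and the simulated `values` are my own illustration, not anything from your data.

```python
import math
import random

def normal_loglik(data):
    """Maximized normal log-likelihood and the ML estimates (mu, sigma)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # the ML estimate divides by n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), (mu, math.sqrt(var))

def lognormal_loglik(data):
    """Maximized lognormal log-likelihood; requires strictly positive data."""
    logs = [math.log(x) for x in data]
    loglik, params = normal_loglik(logs)
    # Change of variables: subtract the sum of log(x).
    return loglik - sum(logs), params

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

# Simulated, right-skewed stand-in for the 450 test values (hypothetical).
random.seed(2)
values = [random.lognormvariate(4.5, 0.4) for _ in range(450)]

fits = {}
for name, fit in (("normal", normal_loglik), ("lognormal", lognormal_loglik)):
    loglik, params = fit(values)
    fits[name] = {"loglik": loglik, "params": params, "aic": aic(loglik, 2)}

best = min(fits, key=lambda name: fits[name]["aic"])
for name, result in fits.items():
    print(name, round(result["loglik"], 1), round(result["aic"], 1))
print("lowest AIC:", best)
```

Both candidates here have two parameters, so ranking by log-likelihood and ranking by AIC agree. AIC starts to matter once the candidates have different numbers of parameters.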

Originally Posted by NN_STAT
Basically, what I think would be best is to define limits such that any new subject whose value falls outside them is considered to come from a different population. I just don't know how to define these limits. Would the 5th and 95th percentiles be sufficient?
5% and 95%? What's your reasoning behind those numbers? It seems to me that if you're looking for data points that may belong to different distributions, you may want to try fitting a finite mixture model.
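
As a rough idea of what a finite mixture fit does, here is a bare-bones EM sketch for a two-component Gaussian mixture in plain Python. Everything in it is an assumption for illustration: the choice of two Gaussian components, the crude starting values, the fixed iteration count, and the simulated data with a small second group. In practice a dedicated mixture-modelling package would be the safer choice.

```python
import math
import random

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def fit_two_gaussians(data, iterations=200):
    """EM for a two-component Gaussian mixture.

    Returns a list of (weight, mean, sd) tuples, one per component.
    """
    data = sorted(data)
    n = len(data)
    half = n // 2
    # Crude start: means of the lower and upper halves, a wide common sd.
    mus = [sum(data[:half]) / half, sum(data[half:]) / (n - half)]
    sds = [max((data[-1] - data[0]) / 4, 1e-6)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each point belongs to component 0.
        resp = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, mus[0], sds[0])
            p1 = weights[1] * normal_pdf(x, mus[1], sds[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)
        # M-step: weighted updates of weight, mean and sd per component.
        for k in (0, 1):
            r = resp if k == 0 else [1 - v for v in resp]
            nk = max(sum(r), 1e-12)
            weights[k] = nk / n
            mus[k] = sum(ri * x for ri, x in zip(r, data)) / nk
            var = sum(ri * (x - mus[k]) ** 2 for ri, x in zip(r, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return [(weights[k], mus[k], sds[k]) for k in (0, 1)]

# Simulated data: a large group plus a small shifted group (hypothetical).
random.seed(3)
mixed = [random.gauss(100, 10) for _ in range(400)]
mixed += [random.gauss(160, 10) for _ in range(50)]

components = sorted(fit_two_gaussians(mixed), key=lambda c: c[1])
for weight, mean, sd in components:
    print(round(weight, 3), round(mean, 1), round(sd, 1))
```

If the sample really contains only healthy subjects from one population, a fit like this should not find a clearly separated second component.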

7. ## Re: Characterizing a population

Originally Posted by NN_STAT
Basically, what I think would be best is to define limits such that any new subject whose value falls outside them is considered to come from a different population. I just don't know how to define these limits. Would the 5th and 95th percentiles be sufficient?
The book Clinical Biochemistry: Metabolic and Clinical Aspects (you can find it on books.google.com) will answer your questions. Just search for "reference range" inside the book.
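
For a quick feel of the idea, here is a small sketch of a nonparametric reference range in plain Python. It assumes the common convention of taking the central 95% of healthy values (2.5th to 97.5th percentiles) and compares it with the 5th/95th limits from the question. The helper names and the simulated `values` are hypothetical, not taken from the book.

```python
import random
import statistics

def percentile_limits(values, lower_pct, upper_pct):
    """Empirical lower/upper percentiles, given in steps of 0.1 percent."""
    cuts = statistics.quantiles(values, n=1000)  # cut points at 0.1%, 0.2%, ...
    return cuts[round(lower_pct * 10) - 1], cuts[round(upper_pct * 10) - 1]

def is_flagged(x, limits):
    """True if a new value falls outside the reference limits."""
    low, high = limits
    return x < low or x > high

# Simulated stand-in for the 450 healthy subjects (hypothetical values).
random.seed(4)
values = [random.gauss(100, 15) for _ in range(450)]

central_95 = percentile_limits(values, 2.5, 97.5)
central_90 = percentile_limits(values, 5, 95)
print("2.5th-97.5th:", [round(v, 1) for v in central_95])
print("5th-95th:    ", [round(v, 1) for v in central_90])
print("new value 140 flagged:", is_flagged(140, central_95))
```

Keep in mind that any percentile-based limits flag a fixed share of healthy people by construction: about 5% with the 2.5th/97.5th limits and about 10% with the 5th/95th limits.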
