nonprofit seeks advice on percentile methods

jhall251

New Member
Hoping I can get some opinions on the method we are using to calculate percentiles.

I work with a small non-profit. I need to calculate percentiles for a number of fee ranges (e.g. the median fee charged, the 45th percentile, the 55th percentile) in different regions of our state.

I've been looking at http://www.wessa.net/quart.wasp, which calculates percentiles or quartiles by a number of different methods. I know there is no consensus on a "best" method, but is there a most common method or methods? Is there one that is particularly appropriate for small data sets? Is "Closest Observation" an acceptable method? Is Weighted Average at X(n+1)p preferred over Weighted Average at Xnp, or are they equally acceptable?
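For reference, here is one way to read those three method names, as a minimal Python sketch. It assumes 1-based ranks on the sorted values, linear interpolation between neighboring values, and ranks clamped to the range 1..n. The function name `percentile`, the method labels, and the round-half-up rule for "closest observation" are illustrative assumptions, not wessa.net's actual code.

```python
import math

def percentile(values, p, method):
    """Estimate the p-th quantile (0 < p < 1) of a list of numbers.

    method is one of these hypothetical labels:
      "np"      - weighted average at rank p*n
      "n+1p"    - weighted average at rank p*(n+1)
      "closest" - the observation whose rank is closest to p*n
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    if method == "closest":
        # Assumed rule: round p*n to the nearest whole rank (halves round up)
        rank = math.floor(p * n + 0.5)
        rank = min(max(rank, 1), n)
        return xs[rank - 1]
    if method == "np":
        r = p * n
    elif method == "n+1p":
        r = p * (n + 1)
    else:
        raise ValueError(f"unknown method: {method}")
    # Clamp the rank so small samples never index outside the data
    r = min(max(r, 1.0), float(n))
    k = math.floor(r)   # lower neighboring rank (1-based)
    frac = r - k        # weight given to the upper neighbor
    if k == n:
        return float(xs[-1])
    return (1 - frac) * xs[k - 1] + frac * xs[k]
```

On the second sample data set in this thread (125, 125, 135, 145), the 50th percentile comes out as 125.0 with "np", 130.0 with "n+1p", and 125 with "closest".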

Sorry for the naive questions, and thanks for any help!

JohnM

TS Contributor
Your questions are anything but naive: you are quite right that there are many different methods and no consensus.

Maybe you could post a sample data set; it might help us point you in the right direction. Some methods interpolate linearly between the sorted values, some interpolate but also assume that the underlying population is normal, and there is everything in between.
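To make the contrast concrete, here is a minimal sketch of the two families, applied to the 22 values of Data set 1 posted further down the thread: a straight linear interpolation between sorted values (at rank p*(n+1), clamped to 1..n), and a parametric estimate that assumes the fees are normally distributed and reads the percentile off a fitted normal curve. The function names are hypothetical; `statistics.NormalDist` is the normal distribution in Python's standard library (3.8+).

```python
import math
import statistics

# Data set 1 from this thread
FEES = [150, 180, 180, 190, 205, 215, 220, 225, 228, 230, 230,
        235, 235, 245, 245, 249, 250, 250, 250, 250, 270, 275]

def interpolated_percentile(values, p):
    """Linear interpolation at 1-based rank p*(n+1), clamped to [1, n]."""
    xs = sorted(values)
    n = len(xs)
    r = min(max(p * (n + 1), 1.0), float(n))
    k = math.floor(r)
    if k == n:
        return float(xs[-1])
    return xs[k - 1] + (r - k) * (xs[k] - xs[k - 1])

def normal_percentile(values, p):
    """Fit a normal curve (sample mean and standard deviation) and invert it."""
    fitted = statistics.NormalDist(statistics.mean(values), statistics.stdev(values))
    return fitted.inv_cdf(p)

for p in (0.45, 0.50, 0.55, 0.75):
    print(p, interpolated_percentile(FEES, p), round(normal_percentile(FEES, p), 1))
```

The normal-based figure depends entirely on the shape assumption, so the two approaches can give noticeably different answers when the data are not roughly bell-shaped.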

jhall251

New Member
Thanks for your response! I've included two sample data sets below. Some of the data sets I'm working with are large (hundreds of values), while others have as few as 2 or 3 values. Of course, the choice of method makes more of a difference for the smaller ones.
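With only 2 or 3 values, the rank formula itself often lands outside the data. This small sketch (hypothetical helper name `rank_positions`) prints the 1-based rank each convention asks for; a rank below 1 or above n has no neighbor on one side, so an implementation has to clamp it to the smallest or largest value (or refuse to answer).

```python
def rank_positions(n, p):
    """Return the 1-based ranks given by the p*n and p*(n+1) conventions."""
    return p * n, p * (n + 1)

for n in (2, 3, 4):
    for p in (0.25, 0.45, 0.55, 0.75):
        r_np, r_n1p = rank_positions(n, p)
        # A rank is usable for interpolation only if it lies within 1..n
        flags = ["ok" if 1 <= r <= n else "out of range" for r in (r_np, r_n1p)]
        print(f"n={n} p={p}: p*n={r_np:.2f} ({flags[0]}), "
              f"p*(n+1)={r_n1p:.2f} ({flags[1]})")
```

In general, p*n stays inside the data only for p of at least 1/n, while p*(n+1) stays inside only for p between 1/(n+1) and n/(n+1). With n = 2, that means p of at least 0.5 for p*n and roughly 0.33 to 0.67 for p*(n+1), so both conventions need a rule for the ends.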

I have noticed that people tend to calculate the rank using p*(n+1), though apparently p*n is also acceptable. Is there a reason to prefer p*(n+1)?
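One data point on how common each convention is: Python's standard-library `statistics.quantiles` uses the p*(n+1) rank by default (`method="exclusive"`), and its documentation describes that default as suited to samples from a population that may have more extreme values than the sample contains. Its other option, `method="inclusive"`, places the rank at 1 + p*(n-1), the same rule spreadsheet PERCENTILE.INC functions use. Neither built-in option uses plain p*n. A minimal sketch on Data set 1 (Python 3.8+):

```python
import statistics

# Data set 1 from this thread (22 fees)
FEES = [150, 180, 180, 190, 205, 215, 220, 225, 228, 230, 230,
        235, 235, 245, 245, 249, 250, 250, 250, 250, 270, 275]

# Quartile cut points (25th, 50th, 75th percentiles)
exclusive = statistics.quantiles(FEES, n=4, method="exclusive")  # rank p*(n+1)
inclusive = statistics.quantiles(FEES, n=4, method="inclusive")  # rank 1 + p*(n-1)
print(exclusive)  # [212.5, 232.5, 250.0]
print(inclusive)  # [216.25, 232.5, 249.75]
```

Passing `n=100` instead returns the 1st through 99th percentiles, so index 44 of that list is the 45th percentile and index 54 is the 55th.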

Is it true that, in smaller data sets, p*(n+1) can return values higher than the actual closest observation? For example, the first data set below has 22 values. For the 75th percentile, the p*n rank is 0.75 * 22 = 16.5, so a weighted average falls between R16 and R17, but the p*(n+1) rank is 0.75 * 23 = 17.25, so the weighted average falls between R17 and R18. Depending on how close together the values at R16, R17 and R18 are, that could give a higher number than the value that actually lies at or near the 0.75 position.
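The rank arithmetic in that paragraph can be checked directly against Data set 1 (listed below). This minimal sketch uses a hypothetical helper, `weighted_average_at_rank`, that interpolates at any 1-based rank and assumes the rank falls between 1 and n:

```python
import math

# Data set 1 from this thread
FEES = [150, 180, 180, 190, 205, 215, 220, 225, 228, 230, 230,
        235, 235, 245, 245, 249, 250, 250, 250, 250, 270, 275]

def weighted_average_at_rank(sorted_values, r):
    """Interpolate between the values at 1-based ranks floor(r) and floor(r)+1.

    Assumes 1 <= r <= n; no clamping is done here.
    """
    k = math.floor(r)
    frac = r - k
    if frac == 0:
        return float(sorted_values[k - 1])
    return (1 - frac) * sorted_values[k - 1] + frac * sorted_values[k]

xs = sorted(FEES)
n = len(xs)          # 22
p = 0.75
r_np = p * n         # 16.5  -> between R16 (249) and R17 (250)
r_n1p = p * (n + 1)  # 17.25 -> between R17 (250) and R18 (250)
print(r_np, weighted_average_at_rank(xs, r_np))    # 16.5 249.5
print(r_n1p, weighted_average_at_rank(xs, r_n1p))  # 17.25 250.0
```

Here the gap is only 0.5 because R16, R17 and R18 are close together; it grows when neighboring values are far apart, which is most likely in the very small data sets.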

Practically, I most need to know if there is any reason not to use p*n rather than p*(n+1). Thanks for any advice!

Data set 1
150
180
180
190
205
215
220
225
228
230
230
235
235
245
245
249
250
250
250
250
270
275

Data set 2
125
125
135
145
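For the p*n versus p*(n+1) question, the second data set gives a quick sanity check: a 50th percentile should normally reproduce the ordinary median, which for 125, 125, 135, 145 is (125 + 135) / 2 = 130. In this minimal sketch (hypothetical helper `interpolate`, ranks clamped to 1..n), plain p*n gives 125 at p = 0.5 while p*(n+1) gives 130. The same happens for any even n, because p*n = n/2 lands exactly on the lower of the two middle values; for odd n it lands halfway between the middle value and the one below it.

```python
import math

DATA_SET_2 = [125, 125, 135, 145]

def interpolate(values, r):
    """Linear interpolation at 1-based rank r, clamped to [1, n]."""
    xs = sorted(values)
    n = len(xs)
    r = min(max(r, 1.0), float(n))
    k = math.floor(r)
    if k == n:
        return float(xs[-1])
    return xs[k - 1] + (r - k) * (xs[k] - xs[k - 1])

n = len(DATA_SET_2)
for p in (0.45, 0.50, 0.55):
    by_np = interpolate(DATA_SET_2, p * n)
    by_n1p = interpolate(DATA_SET_2, p * (n + 1))
    print(f"p={p:.2f}  p*n -> {by_np:.2f}   p*(n+1) -> {by_n1p:.2f}")
```

For the 45th, 50th and 55th percentiles, p*n gives 125.00, 125.00 and 127.00, while p*(n+1) gives 127.50, 130.00 and 132.50, so p*(n+1) keeps the three figures centered on the usual median.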
