nonprofit seeks advice on percentile methods

#1
Hoping I can get some opinions on the method we are using to calculate percentiles.

I work with a small non-profit, and I need to calculate percentiles for a number of fee ranges, e.g. the median fee charged, the 45th percentile, the 55th percentile, etc., in different regions of our state.

I've been looking at http://www.wessa.net/quart.wasp which calculates percentiles or quartiles by a number of different methods. I know there is no consensus on a "best" method, but is there a most common method or methods? Is there one that is particularly appropriate for small data sets? Is "Closest Observation" an acceptable method? Is Weighted Average at X(n+1)p preferred over Weighted Average at Xnp, or are they equally okay?

Sorry for the naive questions, and thanks for any help!
 

JohnM

TS Contributor
#2
Your questions are anything but naive. You are quite right that there are many different methods and no consensus.

Maybe you could provide us with a sample data set; it might help us point you in the right direction. Some methods do straight interpolation, some interpolate but also assume that the underlying population is normal, and others fall somewhere in between.
 
#3
Thanks for your response! I am including two sample data sets below. Some of the data sets I'm working with are large (hundreds of values); some are as small as 2 or 3 values. The smaller ones, of course, are where the choice of method makes the most difference in the results.

I have noticed that people tend to calculate the rank using p*(n+1), though apparently p*n is also okay. Is there a reason to prefer p*(n+1)?

It seems that in smaller data sets p*(n+1) can return values that are higher than the observation nearest the target position. For example, the first data set below has 22 values. With p*n, the rank for p = 0.75 is 16.5, so the weighted average falls between R16 and R17. With p*(n+1), the rank is 17.25, so the weighted average falls between R17 and R18. Depending on how close the values at R16, R17, and R18 are, that could be a higher number than the value that actually lies at or near the 0.75 position in the data.
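To make the rank arithmetic above concrete, here is a minimal Python sketch of the two weighted-average rules applied to Data set 1 at p = 0.75. The function name `weighted_average_percentile` and its `use_n_plus_1` flag are made up for illustration, and the handling of ranks that fall outside 1..n (clamping to the smallest or largest value) is an assumption rather than anything taken from the calculator linked earlier.

```python
def weighted_average_percentile(values, p, use_n_plus_1):
    """Linear interpolation at rank p*n or p*(n+1), with ranks counted from 1."""
    xs = sorted(values)
    n = len(xs)
    rank = p * (n + 1) if use_n_plus_1 else p * n
    lower = int(rank)        # whole part of the rank, e.g. 16 for 16.5
    frac = rank - lower      # fractional part, e.g. 0.5 for 16.5
    # Assumption: ranks outside 1..n are clamped to the extreme observations.
    if lower < 1:
        return float(xs[0])
    if lower >= n:
        return float(xs[-1])
    # Weighted average of the observations at ranks `lower` and `lower + 1`.
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


data_set_1 = [150, 180, 180, 190, 205, 215, 220, 225, 228, 230, 230,
              235, 235, 245, 245, 249, 250, 250, 250, 250, 270, 275]

p75_np = weighted_average_percentile(data_set_1, 0.75, use_n_plus_1=False)
p75_n1p = weighted_average_percentile(data_set_1, 0.75, use_n_plus_1=True)
print(p75_np)   # rank 16.5, between R16 = 249 and R17 = 250 -> 249.5
print(p75_n1p)  # rank 17.25, between R17 = 250 and R18 = 250 -> 250.0
```

On this data set the two rules differ by only 0.5 because R16, R17, and R18 are nearly equal; the gap can be wider when neighbouring values are further apart.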

Practically, I most need to know if there is any reason not to use p*n rather than p*(n+1). Thanks for any advice!

Data set 1
150
180
180
190
205
215
220
225
228
230
230
235
235
245
245
249
250
250
250
250
270
275

Data Set 2
125
125
135
145
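As a possible cross-check on the two data sets above, Python's standard library has `statistics.quantiles`, whose default "exclusive" method places the cut points at p*(n+1). Its other option, "inclusive", interpolates at p*(n-1)+1, which is not the same as the p*n rule. The variable names below are just for illustration.

```python
import statistics

data_set_1 = [150, 180, 180, 190, 205, 215, 220, 225, 228, 230, 230,
              235, 235, 245, 245, 249, 250, 250, 250, 250, 270, 275]
data_set_2 = [125, 125, 135, 145]

# n=4 asks for quartiles: three cut points at p = 0.25, 0.50, 0.75.
q1_exclusive = statistics.quantiles(data_set_1, n=4, method="exclusive")
q2_exclusive = statistics.quantiles(data_set_2, n=4, method="exclusive")
q2_inclusive = statistics.quantiles(data_set_2, n=4, method="inclusive")

print(q1_exclusive)  # [212.5, 232.5, 250.0]
print(q2_exclusive)  # [125.0, 130.0, 142.5]
print(q2_inclusive)  # [125.0, 130.0, 137.5]
```

With only four values, the 75th percentile already moves from 137.5 to 142.5 depending on the method, while the median stays at 130 under both.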
 