# How to optimally group data for analysis of percentiles and conditional tail expectation?

#### Curious Jorge

I'm working with a set of historical data that I intend to take forward into MC simulation. At this stage I'm wrestling with how to group the data to assess percentiles and conditional tail expectations. I present two examples to illustrate my problem: ungrouped and grouped data.

I have a complete data set of individual, ungrouped elements, each reaching a terminal state value anywhere from 0 to 1. These are shown in attached "TerminalStatesByElements...". The mean terminal state is 0.3259 and the population standard deviation is 0.4294. The histogram looks like a U-shaped beta distribution with alpha and beta parameters of 0.062364 and 0.129008 respectively.
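As a rough check on those parameters, here is a minimal sketch of a method-of-moments beta fit. The function name `beta_method_of_moments` is made up for illustration, and the inputs are the rounded mean and standard deviation quoted above, so the result only approximately reproduces 0.062364 and 0.129008. Both parameters coming out below 1 is what gives the U shape.

```python
def beta_method_of_moments(mean, sd):
    """Return (alpha, beta) of the beta distribution on [0, 1] matching a mean and SD."""
    var = sd ** 2
    # A beta distribution needs 0 < mean < 1 and variance below mean * (1 - mean).
    if not 0 < mean < 1 or not 0 < var < mean * (1 - mean):
        raise ValueError("moments are not attainable by a beta distribution")
    total = mean * (1 - mean) / var - 1  # this quantity equals alpha + beta
    return mean * total, (1 - mean) * total

# Rounded summary statistics from the post; not the raw data.
alpha, beta = beta_method_of_moments(0.3259, 0.4294)
print(alpha, beta)
```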

I then group the data, for the sake of example, into buckets of 100 elements (the size of 100 is arbitrary), ordered by the time each element arose. The grouped data is shown in attached "TerminalStateAvgGrouped...". This is how practitioners usually view this sort of data: in groups, and by the time the elements arose. The mean is very close to the above, but the population standard deviation naturally decreases, to 0.1465. Here is the histogram for this grouped data:
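A minimal sketch of this grouping step, under a loud assumption: the real data is not reproduced here, so `elements` is a synthetic stand-in of independent draws from the fitted beta distribution, and `group_means` is a hypothetical helper. For independent elements the standard deviation of a 100-element average would be about 0.4294 / sqrt(100) ≈ 0.043. The observed 0.1465 is well above that, which may indicate that elements arising close together in time are correlated, so the choice of grouping is not neutral.

```python
import math
import random
import statistics

def group_means(values, size):
    """Average consecutive blocks of `size` values; a short trailing block is dropped."""
    n_groups = len(values) // size
    return [statistics.fmean(values[i * size:(i + 1) * size]) for i in range(n_groups)]

random.seed(0)
# Synthetic stand-in for the real elements (assumption): independent beta draws,
# already in the order in which the elements arose.
elements = [random.betavariate(0.062364, 0.129008) for _ in range(20_000)]

means = group_means(elements, 100)
sd_elements = statistics.pstdev(elements)          # population SD of single elements
sd_groups = statistics.pstdev(means)               # population SD of group averages
sd_if_independent = sd_elements / math.sqrt(100)   # benchmark under independence
print(sd_elements, sd_groups, sd_if_independent)
```

Running the same comparison on the real data, for several group sizes, would show how far the grouped standard deviation departs from the independence benchmark.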

And below are percentiles/conditional tail expectations for both the ungrouped individual elements (on the left) and groupings of 100 (for grouped data, the group terminal state is the average of the terminal states for each of the elements that fall within that group):

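(For my purposes, the conditional tail expectation is the average terminal state given that a percentile is exceeded. For example, for the grouped data the 90th percentile is 55.1%: in 90% of samples the group average won't exceed 55.1%, and the conditional tail expectation is the average over the remaining cases.)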
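The two measures can be sketched as follows. This is an illustration only: the function names are hypothetical, the percentile uses the nearest-rank rule (other interpolation rules give slightly different values), the tail is taken as observations at or above the percentile (an assumption, a strict "above" is also common), and `sample` is made-up data rather than the actual group averages.

```python
import math

def percentile(values, p):
    """Empirical p-quantile (0 < p < 1) using the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]

def conditional_tail_expectation(values, p):
    """Mean of the observations at or above the p-quantile."""
    threshold = percentile(values, p)
    tail = [v for v in values if v >= threshold]
    return sum(tail) / len(tail)

# Made-up group averages, for illustration only.
sample = [0.05, 0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.55, 0.70, 0.90]
q90 = percentile(sample, 0.90)
cte90 = conditional_tail_expectation(sample, 0.90)
print(q90, cte90)
```

By construction the conditional tail expectation is never below the percentile it is conditioned on, which is a quick sanity check for any table of the two.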

So my question: is there an accepted method for optimally grouping data when computing percentiles and deriving conditional tail expectations? Or, more generally, for MC simulation? Is it done subjectively, based on the groupings implied by how practitioners usually view and assess this type of data? Or is there a more scientific method for grouping? Grouping obviously has a large impact on the standard deviation, the shape of the probability distribution, etc.

A less pertinent question: any views on the type of distribution for the grouped data shown above? I was hoping for a clean normal or log-normal but I instead have a double-humped camel.

THANKS FOR ANY INSIGHTS!
