How would I go about calculating the expected cell frequency of a discrete distribution (say, multivariate normal)? The 1-D analog would be the height of a bar in a histogram for a given interval.
The mathematical explanation of a statistical procedure is really just pseudo-code, which we can make operational by translating it into real computer code. --B. Klemens
I may be confusing terminology here. Let me elaborate further.
I start with a continuous multivariate distribution and partition it into discrete intervals based on, say, the quartiles of each variable. What I want to know is the expected frequency of observations within each "cell" of the partitioned space if the true distribution is assumed to be multivariate normal.
The 1-D analog is a histogram divided into multiple intervals (say, n), where each interval contains a frequency count: the number of observed cases that fall in that interval. You can draw a fitted curve over the histogram that closely tracks the height of each rectangular bar. The curve is continuous, while the bars, having finite width, are discrete.
In the multivariable case the entire distribution is partitioned into 4^N cells, where N is the total number of variables {X1,...,XN}; it is 4^N because I'm dividing each variable into four intervals at its quartiles. The frequency (the height of the rectangular bar in the histogram) would approximate the average of the curve's height at the beginning of the interval (quartile1) and its height at the end of the interval (quartile2).
I hope that paints a clearer picture?
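You have partitioned the space into orthotopes (also called boxes or hyperrectangles).
These orthotopes can be expressed as the Cartesian product
$$O = I_1 \times I_2 \times \cdots \times I_N,$$
where $I_j = (a_j, b_j]$ is the interval for variable $X_j$ (here, the range between two consecutive quartiles of $X_j$, with $-\infty$ and $+\infty$ as the outermost endpoints).
As mentioned by ted00, we just need to calculate the probability that the random vector $X = (X_1, \ldots, X_N)$ falls into a particular orthotope $O$, which can be calculated numerically via the integral
$$P(X \in O) = \int_{a_1}^{b_1} \cdots \int_{a_N}^{b_N} f(x_1, \ldots, x_N)\, dx_N \cdots dx_1,$$
where $f$ is the joint pdf you specified.
Once these probabilities are known, the expected frequencies are just
$$E_O = n\, P(X \in O),$$
where $n$ is the sample size. This is analogous to the 1-D case, since the frequencies of all orthotopes jointly follow a multinomial distribution.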
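To make this concrete, here is a minimal Python sketch for the simplest case only: two standardized variables with correlation rho, each cut at its quartiles, giving 4^2 = 16 cells. The function names (`box_probability`, `expected_frequencies`), the midpoint-rule integration, and the truncation of infinite limits at 8 standard deviations are all my own choices for illustration, not something from the thread. It uses the fact that, for a standard bivariate normal, X2 given X1 = x is normal with mean rho*x and variance 1 - rho^2, so the 2-D box integral reduces to a 1-D integral over x.

```python
from math import inf, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, sd 1


def box_probability(a1, b1, a2, b2, rho, steps=2000):
    """P(a1 < X1 <= b1, a2 < X2 <= b2) for a standard bivariate normal
    with correlation rho (|rho| < 1), via midpoint-rule integration over x1."""
    # Truncate infinite limits on x1; the mass beyond |x| = 8 is negligible.
    lo, hi = max(a1, -8.0), min(b1, 8.0)
    s = sqrt(1.0 - rho * rho)  # conditional sd of X2 given X1
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h  # midpoint of the k-th slice
        # X2 | X1 = x is N(rho * x, 1 - rho^2)
        upper = 1.0 if b2 == inf else STD.cdf((b2 - rho * x) / s)
        lower = 0.0 if a2 == -inf else STD.cdf((a2 - rho * x) / s)
        total += STD.pdf(x) * (upper - lower) * h
    return total


def expected_frequencies(rho, n):
    """4 x 4 table of expected counts n * P(cell) for quartile cells."""
    quartiles = [STD.inv_cdf(p) for p in (0.25, 0.5, 0.75)]
    edges = [-inf] + quartiles + [inf]
    return [
        [
            n * box_probability(edges[i], edges[i + 1],
                                edges[j], edges[j + 1], rho)
            for j in range(4)
        ]
        for i in range(4)
    ]


if __name__ == "__main__":
    for row in expected_frequencies(rho=0.5, n=100):
        print("  ".join(f"{cell:6.2f}" for cell in row))
```

With rho = 0 the variables are independent and every cell gets n/16 expected observations; with positive correlation the cells along the diagonal get more than that and the off-diagonal corners get fewer. For more than two variables the same idea needs a proper multivariate normal CDF or integration routine rather than this hand-rolled 1-D rule.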