# calculating expected cell frequency multivariate normal distribution

#### studentofstudent

##### New Member
How would I go about calculating the expected cell frequency of a discrete distribution (say, multivariate normal)? The 1-D analog would be the height of a bar in a histogram for a given interval.

#### ted00

##### New Member
but multivariate normal is continuous

Not sure I know what you're asking, but in, e.g., a contingency table, $\mu_{ij}=n\pi_{ij}$, where $\mu_{ij}$ is the expected count, $n$ the total number of observations, and $\pi_{ij}$ the probability of the cell in row $i$ and column $j$.
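The formula $\mu_{ij}=n\pi_{ij}$ simply scales each cell probability by the total count. The sketch below applies it to a 2 x 2 table; `expected_counts` is a hypothetical helper and the cell probabilities are made-up illustrative values, not data from this thread.

```python
def expected_counts(n, probs):
    """Expected count mu_ij = n * pi_ij for every cell of the table."""
    return [[n * p for p in row] for row in probs]

# Made-up cell probabilities for a 2 x 2 table (they must sum to 1).
pi = [[0.1, 0.2],
      [0.3, 0.4]]
mu = expected_counts(200, pi)
print(mu)
```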

#### studentofstudent

##### New Member
I may be confusing terminology here. Let me elaborate further.

I start with a continuous multivariate distribution. I partition its space into discrete cells based on, say, the quartiles of each variable. What if I wanted to know the expected frequency of observations within each "cell" of the partitioned space, assuming the true distribution is multivariate normal?

The 1-D analog: you have a histogram divided into several intervals (say, $n$), and each bar shows the number of observed cases that fall in that interval. You can draw a fitted curve over the histogram that closely tracks the height of each bar. The curve is continuous, while each bar of finite width is discrete.

In the multivariable case, I have partitioned the entire space into $4^N$ cells, $N$ being the total number of variables $\{X_1, \ldots, X_N\}$. It is $4^N$ because I divide each variable into 4 quartile intervals. The frequency in each cell (the analog of the bar height in the histogram) would be approximated by the average of the curve's height at the beginning of the interval (quartile 1) and its height at the end of the interval (quartile 2).
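In the 1-D picture, the expected height of a bar over an interval $(a, b]$ is the sample size times the probability mass of that interval, i.e. $\text{sample size} \times (\Phi(b) - \Phi(a))$ for a normal fit. The "average of the heights at the two ends" idea is a trapezoid approximation of that integral, and it also has to be multiplied by the interval width and the sample size. A minimal sketch, assuming a standard normal fit, a sample size of 1000 and illustrative bin edges; `expected_bar_heights` is a hypothetical helper built on the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

def expected_bar_heights(sample_size, edges, dist):
    """Return (exact, trapezoid) expected counts for each bin (a, b]."""
    rows = []
    for a, b in zip(edges[:-1], edges[1:]):
        # Exact: sample_size * P(a < X <= b).
        exact = sample_size * (dist.cdf(b) - dist.cdf(a))
        # Trapezoid: sample_size * width * average of the pdf at both edges.
        approx = sample_size * (b - a) * (dist.pdf(a) + dist.pdf(b)) / 2
        rows.append((exact, approx))
    return rows

heights = expected_bar_heights(1000, [-2.0, -1.0, 0.0, 1.0, 2.0], NormalDist(0.0, 1.0))
for exact, approx in heights:
    print(f"exact={exact:7.2f}  trapezoid={approx:7.2f}")
```

With bins this wide the trapezoid figure undershoots the central bins and overshoots the outer ones, so it is only a rough stand-in for the exact probability mass.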

I hope that paints a clearer picture?

#### BGM

##### TS Contributor
You have partitioned the $\mathbb{R}^N$ space into $4^N$ orthotopes (boxes, hyperrectangles).

These orthotopes $A_k$ can be expressed as Cartesian products

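$A_k = \prod_{i=1}^N [a_{i j_i}, b_{i j_i}]$

where $k \in \{1, 2, \ldots, 4^N\}$ indexes the choice of interval $(j_1, \ldots, j_N) \in \{1, 2, 3, 4\}^N$, one interval for each variable $i \in \{1, 2, \ldots, N\}$. For quartile cells, the outer intervals are unbounded: $a_{i1} = -\infty$ and $b_{i4} = +\infty$.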
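The Cartesian-product structure maps directly onto `itertools.product`: pick one of the four intervals per variable and you get one orthotope. A minimal sketch, assuming the cut points are the quartiles of each variable's assumed normal marginal (sample quartiles would plug in the same way); `quartile_intervals` and `orthotopes` are hypothetical helpers, and the two marginals used below are illustrative.

```python
import itertools
import math
from statistics import NormalDist

def quartile_intervals(dist):
    """Split the real line into four intervals at the quartiles of `dist`."""
    q1, q2, q3 = (dist.inv_cdf(p) for p in (0.25, 0.5, 0.75))
    edges = [-math.inf, q1, q2, q3, math.inf]
    # [(a_i1, b_i1), (a_i2, b_i2), (a_i3, b_i3), (a_i4, b_i4)]
    return list(zip(edges[:-1], edges[1:]))

def orthotopes(marginals):
    """Return every cell A_k as a tuple of N intervals, one per variable."""
    per_variable = [quartile_intervals(d) for d in marginals]
    # Each cell corresponds to one choice (j_1, ..., j_N) in {1, 2, 3, 4}^N.
    return list(itertools.product(*per_variable))

cells = orthotopes([NormalDist(0, 1), NormalDist(5, 2)])
print(len(cells))
print(cells[0])
```

For $N = 2$ this yields the $4^2 = 16$ boxes; the first one is the product of the two lowest quartile intervals.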

As mentioned by ted00, we just need to calculate the probability that the random vector $\mathbf{X} = (X_1, X_2, \ldots, X_N)$ falls into a particular orthotope $A_k$, which could be calculated numerically via the integral

$\pi_k = \int_{A_k} f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}$

where $f_{\mathbf{X}}$ is the joint pdf you assume (here, the multivariate normal).
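For $N = 2$ the box integral can be reduced to a 1-D integral: if $(Z_1, Z_2)$ is standard bivariate normal with correlation $\rho$, then $Z_2 \mid Z_1 = x \sim N(\rho x, 1 - \rho^2)$, so $\pi_k = \int_{a_1}^{b_1} \varphi(x)\,[\Phi((b_2 - \rho x)/s) - \Phi((a_2 - \rho x)/s)]\,dx$ with $s = \sqrt{1 - \rho^2}$. Because the cells are defined by each variable's own quartiles, standardizing each variable leaves the cell probabilities unchanged, so only $\rho$ matters. The sketch below evaluates that integral with Simpson's rule; `box_probability`, the truncation at $\pm 8$ and $\rho = 0.5$ are assumptions for illustration. For larger $N$ one would instead use a dedicated multivariate normal CDF routine (e.g. SciPy's `scipy.stats.multivariate_normal.cdf`) or Monte Carlo.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf = phi, STD.cdf = Phi

def box_probability(a, b, rho, lim=8.0, steps=2000):
    """P(a[0] < Z1 <= b[0], a[1] < Z2 <= b[1]) for a standard bivariate
    normal (Z1, Z2) with correlation rho, |rho| < 1.

    Uses Z2 | Z1 = x ~ N(rho*x, 1 - rho**2) to reduce the 2-D integral to a
    1-D integral over x, evaluated with composite Simpson's rule.
    Infinite limits are truncated at +/- lim (tail mass beyond 8 is negligible).
    """
    s = math.sqrt(1.0 - rho * rho)
    lo, hi = max(a[0], -lim), min(b[0], lim)
    if hi <= lo:
        return 0.0

    def integrand(x):
        inner = STD.cdf((b[1] - rho * x) / s) - STD.cdf((a[1] - rho * x) / s)
        return STD.pdf(x) * inner

    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3

# Quartile cells of a standardized pair with an assumed correlation of 0.5.
q = STD.inv_cdf(0.75)
edges = [-math.inf, -q, 0.0, q, math.inf]
intervals = list(zip(edges[:-1], edges[1:]))
pi = {(j1, j2): box_probability((r1[0], r2[0]), (r1[1], r2[1]), rho=0.5)
      for j1, r1 in enumerate(intervals, start=1)
      for j2, r2 in enumerate(intervals, start=1)}
print(round(sum(pi.values()), 6))
```

A quick sanity check: with $\rho = 0$ every cell gets $1/16$, and with $\rho > 0$ the diagonal cells (both variables in the same quartile) get more than $1/16$.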

Once these probabilities are known, the expected frequencies are just

$M\pi_k$

where $M$ is the sample size. This is analogous to the 1-D case, as the frequencies of all orthotopes jointly follow a multinomial distribution with $M$ trials and cell probabilities $\pi_1, \ldots, \pi_{4^N}$.
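The multinomial view can be checked by simulation: draw $M$ observations, drop each into cell $k$ with probability $\pi_k$, and compare the counts with $M\pi_k$. To keep the probabilities in closed form, the sketch swaps the quartile cells for a median split (2 bins per variable) of a standard bivariate normal with $\rho = 0.5$, where $P(Z_1 \le 0, Z_2 \le 0) = 1/4 + \arcsin(\rho)/(2\pi) = 1/3$. The helpers `expected_frequencies` and `simulate_counts`, $M = 600$ and the seed are illustrative assumptions.

```python
import math
import random
from collections import Counter

def expected_frequencies(M, pi):
    """Expected count M * pi_k for each cell k."""
    return {k: M * p for k, p in pi.items()}

def simulate_counts(M, pi, seed=None):
    """One multinomial draw of the cell counts: M independent observations,
    each landing in cell k with probability pi_k."""
    rng = random.Random(seed)
    cells = list(pi)
    draws = rng.choices(cells, weights=[pi[k] for k in cells], k=M)
    tally = Counter(draws)
    return {k: tally[k] for k in cells}

# Median split of a standard bivariate normal with rho = 0.5 (closed form).
pi = {(1, 1): 1/3, (1, 2): 1/6, (2, 1): 1/6, (2, 2): 1/3}
M = 600

expected = expected_frequencies(M, pi)
observed = simulate_counts(M, pi, seed=42)
for k in pi:
    sd = math.sqrt(M * pi[k] * (1 - pi[k]))  # std. dev. of a multinomial count
    print(k, f"expected={expected[k]:6.1f}", f"observed={observed[k]:4d}", f"sd={sd:4.1f}")
```

Each count has mean $M\pi_k$ and variance $M\pi_k(1 - \pi_k)$, which is the same per-bar behaviour as in a 1-D histogram.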