random question about non-Gaussian multivariate distributions...

spunky

Smelly poop man with doo doo pants.
#1
today in my seminar we were talking about all these interesting ways people can start with random uniform distributions [0,1] and through doing and re-doing things to them end up with pretty much any well-known probability distribution...

.... i was wondering whether something along those lines is true of any distribution (or does it even keep its properties at all).

for the very specific example i have in mind.... say you have X1, X2, X3..Xn which all follow a multivariate normal distribution (say a standard one so we get our nice vector of 0's for a mean and a correlation matrix).

if i start creating new variables say Y1 = aX1 + bX1^2 +cX1^3, Y2=dX2 + eX2^2+ fX2^3 and so on... (so, in other words, i am creating new variables Yi which are the result of some polynomial expressions of my original multivariate standard normal Xi's) ...

... is there a way to know something about the distribution of <Y1, Y2, Y3,...,Yn>? or could it be that those Ys are not even a probability distribution anymore? any insight into how to get the density/cdf?
 

spunky

Smelly poop man with doo doo pants.
#3
for simplicity let's start assuming it's just a function of its corresponding Xi alone and let's see where it goes... lol.
 

Dason

Ambassador to the humans
#4
And you said that we were assuming a standard multivariate normal. Are you implying we work with no correlation between the Xs? In that case it's just a matter of finding the distribution of a polynomial transformation?
 

ledzep

Point Mass at Zero
#5
It might be a bit tedious to find the distribution of the transformed variables when dealing with polynomials. I assume this might result in a non-standard distribution.
 

spunky

Smelly poop man with doo doo pants.
#6
And you said that we were assuming a standard multivariate normal. Are you implying we work with no correlation between the Xs?
Dang! i should've said they are supposed to be correlated... sorry about that, yeah, they should be correlated.




It might be a bit tedious to find the distribution of the transformed variables when dealing with polynomials. I assume this might result in a non-standard distribution.

here's the line of reasoning of why i'm doing what i'm doing. 99.9% of simulation papers for non-normal, multivariate data rely on a paper published in Psychometrika. the full citation is:

Vale, C. D., and Maurelli, V. A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48, 465-471.

the process is relatively straightforward and that is why people like it. it goes: (a) generate some multivariate normal vectors with a pre-specified correlation matrix. (b) do polynomial transformations like i described and, through the appropriate choice of the constants which multiply the powers of the random normal deviates, you end up generating random numbers with pre-specified correlations between them as well as pre-specified amounts of skewness and kurtosis.
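
A rough sketch of steps (a) and (b) for just two variables, under some loud assumptions: the polynomials are cubics with constant term -c (so each Y has mean zero), the coefficient triples `P1` and `P2` are made-up illustrative values that were not fitted to any target skewness or kurtosis, and all the helper names are hypothetical. The cubic linking the correlation of the normals to the correlation of the transformed variables follows from the moments of the bivariate normal; it is solved here by plain bisection.

```python
import math
import random


def cubic(z, b, c, d):
    """Cubic transform with constant term -c, so E[Y] = 0 for standard normal z."""
    return -c + b * z + c * z ** 2 + d * z ** 3


def cubic_variance(b, c, d):
    """Var(Y) for Y = cubic(Z) with Z standard normal."""
    return b * b + 6 * b * d + 2 * c * c + 15 * d * d


def transformed_corr(rho_z, p1, p2):
    """Correlation of (Y1, Y2) when the underlying normals have correlation rho_z."""
    (b1, c1, d1), (b2, c2, d2) = p1, p2
    cov = (rho_z * (b1 * b2 + 3 * b1 * d2 + 3 * d1 * b2 + 9 * d1 * d2)
           + rho_z ** 2 * (2 * c1 * c2)
           + rho_z ** 3 * (6 * d1 * d2))
    return cov / math.sqrt(cubic_variance(*p1) * cubic_variance(*p2))


def intermediate_corr(target, p1, p2, tol=1e-12):
    """Bisection on [-1, 1] for the normal correlation that yields `target`."""
    lo, hi = -1.0, 1.0
    f_lo = transformed_corr(lo, p1, p2) - target
    f_hi = transformed_corr(hi, p1, p2) - target
    if f_lo * f_hi > 0:
        raise ValueError("target correlation is not attainable with these cubics")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = transformed_corr(mid, p1, p2) - target
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def sample_pairs(n, rho_z, p1, p2, rng):
    """Step (a): correlated standard normals. Step (b): polynomial transforms."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((cubic(z1, *p1), cubic(z2, *p2)))
    return pairs


def sample_corr(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (b, c, d) triples, chosen only for illustration.
P1 = (0.90, 0.20, 0.02)
P2 = (0.95, -0.10, 0.01)

rho_z = intermediate_corr(0.5, P1, P2)
pairs = sample_pairs(50_000, rho_z, P1, P2, random.Random(1))
print("intermediate correlation:", rho_z)
print("sample correlation of (Y1, Y2):", sample_corr(pairs))
```

The point of the intermediate step is that the correlation you feed into the normals is generally not the correlation you get out after the transformation, so it has to be solved for first.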

a guy by the name of Tadikamalla criticized this process precisely by saying something along the lines of "well, this is all very interesting but what do we know about the probability distribution of the random variable after it has gone through all these multiplyings and powerings?"

now that i'm taking my seminar on non-Gaussian multivariate distributions we started talking about the relevance of moving away from distributions whose contour (if we were to plot it) is elliptical. the prof proceeded to show us that there are a lot of distributions which can take all sorts of weird skewnesses and kurtoses (like the Pearson Type II distribution which looks like a cylinder) and still keep their elliptical shape.

this, of course, brought back memories from the Vale & Maurelli paper which prompted me to ask something along the lines of

(a) let's take it back just a second and ask ourselves... not only do we know nothing about the distribution of such new variable after it is transformed through all these powers and whatnot, but do we even know if it is a distribution at all or just a whole bunch of numbers that behave the way we want?

and b...

(b) because these numbers originated from a multivariate normal distribution with a set correlation matrix and, in the end, such correlations are still preserved even though we transformed the variables to get all these crazy skewnesses and kurtoses.... could it be that the distribution we end up with is, in fact, an elliptical distribution so that it keeps its elliptical contour?

because if it does...well... in the light of what i've just learnt about how inadequate elliptically-contoured distributions are to model some types of multivariate data... it could well be that the past 100 years of psychometric work on correlated, non-normal distributions needs some HUGE revisions because popular techniques such as structural equation modeling live and die by the assumption of normality...



psychometrics... NEEDS REVISION!! (could someone please cue in "Kaleidoscope of Mathematics" from my favourite movie, A Beautiful Mind?)
 

Dragan

Super Moderator
#7
a guy by the name of Tadikamalla criticized this process precisely by saying something along the lines of "well, this is all very interesting but what do we know about the probability distribution of the random variable after it has gone through all these multiplyings and powerings?"
Spunky, I solved this problem that Tadikamalla raised in his 1980 article. In short, the pdf and cdf for power method polynomials have to be expressed in parametric form (real-two space). The first article I wrote on this was a JSCS article published in 2007.

You can also see the derivation of the pdf and cdf in my book on pages 9-14.
 

spunky

Smelly poop man with doo doo pants.
#8
In short, the pdf and cdf for power method polynomials have to be expressed in parametric form (real-two space)
thanks Dragan! but i was wondering a little bit more about the real N-space case.... i am learning in my seminar that properties we usually like which happen in 2d spaces may not extend into Nd-spaces... (like those darn Frechet lower bounds!)

any insights?
 

Dragan

Super Moderator
#9
I think it depends on what it is a researcher is doing...Are you referring to higher order moments? or other things?
 

spunky

Smelly poop man with doo doo pants.
#10
Are you referring to higher order moments? or other things?
i guess what i am asking more precisely would be:

for the transformed variable (as discussed above with all the multiplying and powering), in the d > 2 space...

(a) does it have a legit density function (so that it's non-negative and integrates to 1)?

(b) does it belong to the family of elliptically-contoured distributions? (which i guess could be answered from (a) if there is a closed-form expression)
 

Dragan

Super Moderator
#11
i guess what i am asking more precisely would be:

for the transformed variable (as discussed above with all the multiplying and powering), in the d > 2 space...

(a) does it have a legit density function (so that it's non-negative and integrates to 1)?

(b) does it belong to the family of elliptically-contoured distributions? (which i guess could be answered from (a) if there is a closed-form expression)

I would say yes, so long as the transformations are strictly increasing. A good place to start on this topic would be to look at Chapter 9 in Karian & Dudewicz (2011) in the context of the Generalized Lambda Distribution (see pages 363-414) because the idea is similar to what you're asking. They develop a nice extension of a bivariate GLD using Plackett's method of bivariate cdf construction, which has to consider the Fréchet upper and lower bounds. I'm relatively certain that this approach could be used in the context of power method polynomials....Sounds like a good topic for a dissertation. :)

References:

Karian, Z. A., and Dudewicz, E. J. (2011) Handbook of Fitting Statistical Distributions with R. Chapman & Hall/CRC, Boca Raton, FL.

Plackett, R. L. (1965). A class of bivariate distributions. Journal of the American Statistical Association, 60, 516-522.
 

spunky

Smelly poop man with doo doo pants.
#12
I would say yes so long as the transformations are strictly increasing.
so.... is that a yes to (a) and (b) implying that it has a legit pdf and resulting power-transformed RVs belong to the family of elliptical distributions... or just a yes to the (a)... or to the (b) parts?

A good place to start on this topic would be to look at Chapter 9 in Karian & Dudewicz (2011) in the context of the Generalized Lambda Distribution (see pages 363-414) because the idea is similar to what you're asking. They develop a nice extension of a bivariate GLD using Plackett's method of bivariate cdf construction, which has to consider the Fréchet upper and lower bounds. I'm relatively certain that this approach could be used in the context of power method polynomials....
oh thank you! THIS is what i'm after so i don't have to spend hours and hours on the internet trying to see whether this is even feasible or not...

Sounds like a good topic for a dissertation. :)
i'll keep it in the bucket list then for the PhD... gotta get that MA thesis done first but it's always nice to start looking at topics early on...

Thank you very much Dragan, ledzep and Dason. i think i shall proceed to spread luv & thanks to everyone... :)
 

BGM

TS Contributor
#13
Actually I wonder why Spunky has doubts on the transformation part.

As long as the original random variable is absolutely continuous, and the function has only countably many critical points (this is the condition, I guess),
then the resulting transformed variable should again be absolutely continuous, with a legit pdf.
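
For reference, the standard change-of-variables formula behind this, written as a sketch for a single variable and assuming g is differentiable and the critical points (where g' = 0) are isolated. The symbols g, f_X and f_Y are just labels introduced here. If g is strictly monotone with g' never zero:

\[ f_Y(y) = \frac{f_X\left( g^{-1}(y) \right)}{\left| g'\left( g^{-1}(y) \right) \right|} \]

and if g is only piecewise monotone, the contributions of all preimages of y are summed:

\[ f_Y(y) = \sum_{x \,:\, g(x) = y} \frac{f_X(x)}{\left| g'(x) \right|} \]

For example, with X standard normal and g(x) = x^2, the two preimages of y > 0 are \( \pm\sqrt{y} \), which gives

\[ f_Y(y) = \frac{\phi(\sqrt{y})}{2\sqrt{y}} + \frac{\phi(-\sqrt{y})}{2\sqrt{y}} = \frac{e^{-y/2}}{\sqrt{2 \pi y}}, \quad y > 0, \]

the chi-square density with 1 degree of freedom. It is unbounded near the critical point at 0 but still integrates to 1.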

Of course, in the multivariate case this is provided that the random variables do not have any functional relationship: e.g., if X = -Y, then Z = X + Y is identically zero.

I am not sure about the condition to preserve an elliptic family after the transformation.
 

Dason

Ambassador to the humans
#14
Actually I wonder why Spunky has doubts on the transformation part.

As long as the original random variable is absolutely continuous, and the function has only countably many critical points (this is the condition, I guess),
then the resulting transformed variable should again be absolutely continuous, with a legit pdf.

Of course, in the multivariate case this is provided that the random variables do not have any functional relationship: e.g., if X = -Y, then Z = X + Y is identically zero.
I was wondering about this as well. It seemed like that part should be simple but I wasn't too sure of myself.
 

spunky

Smelly poop man with doo doo pants.
#15
As long as the original random variable is absolutely continuous, and the function has only countably many critical points (this is the condition, I guess), then the resulting transformed variable should again be absolutely continuous, with a legit pdf.

Of course, in the multivariate case this is provided that the random variables do not have any functional relationship: e.g., if X = -Y, then Z = X + Y is identically zero.
thanks BGM! i guess because i haven't seen the actual pdf for the resulting power-transformed RV (in the case of greater than 2 dimensions) i had trouble seeing it... but you're right, i didn't think about it from this perspective. thanks :)

I am not sure about the condition to preserve an elliptic family after the transformation.
this is the part that i'd REALLY want to know... :)
 

spunky

Smelly poop man with doo doo pants.
#17
So I think only affine transformation can preserve this relationship.
so if i'm reading your message correctly your intuition would be that the resulting power-transformed RVs are not really part of the elliptical family of distributions... right?
 

Dason

Ambassador to the humans
#18
Well a standard normal squared gives a chi-square which isn't in the elliptic family right? So I would say that you couldn't say that the power transformation is part of the elliptic family in general.
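
One way to make this concrete, as a sketch rather than a proof: an elliptical distribution is symmetric about its center, so every marginal has zero skewness (when the third moment exists). A marginal with nonzero skewness therefore cannot come from an elliptical family. The helper names below are hypothetical; the only fact used is that a standard normal Z has E[Z^k] = (k-1)!! for even k and 0 for odd k, which gives exact moments of any polynomial in Z.

```python
import math


def normal_moment(k):
    """E[Z**k] for standard normal Z: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0.0
    return float(math.prod(range(1, k, 2)))  # empty product is 1 when k == 0


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = power of z)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def expect(p):
    """E[p(Z)] for standard normal Z."""
    return sum(c * normal_moment(k) for k, c in enumerate(p))


def skewness(p):
    """Exact skewness of Y = p(Z)."""
    mean = expect(p)
    centered = [p[0] - mean] + list(p[1:])
    squared = poly_mul(centered, centered)
    variance = expect(squared)
    third = expect(poly_mul(squared, centered))
    return third / variance ** 1.5


print("skewness of Z^2:     ", skewness([0, 0, 1]))
print("skewness of Z + Z^3: ", skewness([0, 1, 0, 1]))
```

The check only goes one way: nonzero skewness rules out the elliptical family, but zero skewness (as for the odd polynomial Z + Z^3) does not by itself show that the joint distribution is elliptical.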
 

spunky

Smelly poop man with doo doo pants.
#19
Well a standard normal squared gives a chi-square which isn't in the elliptic family right? So I would say that you couldn't say that the power transformation is part of the elliptic family in general.
ta'da! and i got my answer.... AT LAST! i mean, it relies on that claim about affine transformations but upon thinking about it... it does seem like something reasonable....

... and all before the end of class!!! you people are awesome!
 

Dragan

Super Moderator
#20
I would say that what is required is that each polynomial be of odd order (e.g., 3 or 5) and strictly increasing; this implies that we would have a valid power method density. It also implies that the derivative is a polynomial of even order whose zeros are all complex. For example, if we have polynomials of order 3 (i.e., Fleishman type), then the derivatives of the polynomials are quadratic and the associated discriminants must be negative. This has to be the case because the parametric form (real-two space) of the pdf for any marginal distribution is:

\( f_{p(z)} = \left( p(z), \frac{\phi(z)}{p'(z)} \right) \)

where you can see that the derivative of the polynomial is in the denominator of the ordinate and must be everywhere positive. Note that the numerator of the ordinate is the standard normal pdf.
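
A minimal numerical sketch of the order-3 case, assuming the cubic is written as p(z) = -c + bz + cz^2 + dz^3 and using made-up coefficients (the function names are hypothetical too). It checks the discriminant condition on p'(z) = b + 2cz + 3dz^2 and then integrates the parametric curve (p(z), phi(z)/p'(z)) with the trapezoid rule; the area should come out close to 1 because the p'(z) in the denominator cancels against dx = p'(z) dz.

```python
import math


def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def is_strictly_increasing(b, c, d):
    """For d != 0: p'(z) = b + 2cz + 3dz^2 > 0 everywhere iff d > 0 and
    the discriminant (2c)^2 - 4(3d)(b) is negative."""
    return d > 0 and (2 * c) ** 2 - 4 * (3 * d) * b < 0


def parametric_pdf_point(b, c, d, z):
    """Point (x, y) = (p(z), phi(z) / p'(z)) on the parametric density curve."""
    x = -c + b * z + c * z ** 2 + d * z ** 3
    y = phi(z) / (b + 2 * c * z + 3 * d * z ** 2)
    return x, y


def area_under_curve(b, c, d, z_max=10.0, n=20_000):
    """Trapezoid rule for the integral of y dx along the parametric curve."""
    zs = [-z_max + 2.0 * z_max * i / n for i in range(n + 1)]
    pts = [parametric_pdf_point(b, c, d, z) for z in zs]
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical coefficients, for illustration only.
good = (0.90, 0.20, 0.02)   # discriminant < 0: valid density
bad = (0.50, 0.50, 0.02)    # discriminant > 0: p'(z) changes sign

print("good is strictly increasing:", is_strictly_increasing(*good))
print("bad is strictly increasing: ", is_strictly_increasing(*bad))
print("area under the parametric pdf (good):", area_under_curve(*good))
```

For coefficients that fail the check, p'(z) hits zero at some real z, so the ordinate phi(z)/p'(z) blows up or turns negative there and the curve no longer describes a density.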