# Thread: simulating correlated, non-normal data: WHY DOES THIS HAPPEN!?

1. ## Re: simulating correlated, non-normal data: WHY DOES THIS HAPPEN!?

Originally Posted by Dragan
If you want to create a Cauchy distribution for input, then it's a straightforward thing to do.

Specifically, if we let Z1 and Z2 be independent standard normal deviates, then the ratio X = Z1/Z2 will follow a standard Cauchy distribution.

Maybe I'm missing something here (Greta?).
rcauchy would probably work too
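
Both suggestions can be checked side by side by comparing empirical quantiles of the normal ratio and of `rcauchy` draws against the theoretical Cauchy quantiles. This is only a rough sketch: the seed, the sample size, and the variable names are arbitrary choices made for illustration.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n <- 1e5       # arbitrary sample size

# Ratio of two independent standard normal deviates
z1 <- rnorm(n)
z2 <- rnorm(n)
x_ratio <- z1 / z2

# Direct draws from the standard Cauchy (location 0, scale 1)
x_direct <- rcauchy(n)

# Compare empirical quantiles with the theoretical Cauchy quantiles
p <- c(0.10, 0.25, 0.50, 0.75, 0.90)
round(rbind(
  theoretical = qcauchy(p),
  ratio       = quantile(x_ratio, p, names = FALSE),
  rcauchy     = quantile(x_direct, p, names = FALSE)
), 2)
```

The three rows should agree closely in the middle of the distribution; the extreme tails are where individual samples differ wildly.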

Originally Posted by GretaGarbo
So THAT is what it means!!

Is that abbreviation as well known in general language as "USA" or "NATO"?
In general language? Probably not. But it's pretty common in stats courses.

2. ## Re: simulating correlated, non-normal data: WHY DOES THIS HAPPEN!?

Originally Posted by Dason
In general language? Probably not. But it's pretty common in stats courses.
Yes, in stats courses taught in English. Does every participant here on TalkStats come from a stats course? Or from a class taught in English?

3. ## Re: simulating correlated, non-normal data: WHY DOES THIS HAPPEN!?

Originally Posted by GretaGarbo
Yes, in stats courses taught in English. Does every participant here on TalkStats come from a stats course? Or from a class taught in English?
I'm not sure what you're getting at. But I felt pretty safe using CLT in a discussion with spunky.

4. ## Re: simulating correlated, non-normal data: WHY DOES THIS HAPPEN!?

Originally Posted by GretaGarbo
Isn't anybody going to suggest that Spunky simulate Cauchy-distributed variables as input?
Originally Posted by Dason
rcauchy would probably work too
Code:
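```
library(copula)
library(psych)  # for describe()

# Normal copula, unstructured correlations (all 0.5), standard Cauchy margins
cop1 <- mvdc(normalCopula(c(0.5, 0.5, 0.5), dim = 3, dispstr = "un"),
             margins = c("cauchy", "cauchy", "cauchy"),
             paramMargins = list(list(location = 0, scale = 1),
                                 list(location = 0, scale = 1),
                                 list(location = 0, scale = 1)))

# Draw 1000 observations from the joint distribution
Q <- rMvdc(1000, cop1)
```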
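
For readers without the copula package, the construction can be sketched in base R: draw correlated standard normals, map them to uniforms with `pnorm`, then push the uniforms through `qcauchy`. This is an illustration of what a normal copula with Cauchy margins amounts to, not the package's actual code, and `R`, `Z`, `U`, and `Q_manual` are names made up for the example.

```r
set.seed(1)  # arbitrary seed
n <- 1000

# Correlation matrix of the underlying normal copula (all off-diagonals 0.5)
R <- matrix(0.5, nrow = 3, ncol = 3)
diag(R) <- 1

# Correlated standard normals via the Cholesky factor of R
Z <- matrix(rnorm(n * 3), nrow = n) %*% chol(R)

# Probability integral transform: each column becomes Uniform(0, 1)
U <- pnorm(Z)

# Inverse Cauchy CDF gives standard Cauchy margins
Q_manual <- qcauchy(U, location = 0, scale = 1)

# Rank correlation survives the monotone transforms; Pearson's r is
# unreliable here because the Cauchy has no finite variance
round(cor(Q_manual, method = "spearman"), 2)
```

Note that 0.5 is the correlation of the latent normals, so the Spearman correlations of the Cauchy variables are expected to come out a little below 0.5.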
THERE! THERE YOU HAVE IT! now i have summoned its DARK MAGIC to further taint my thread. just look at these descriptive stats!

Code:
```
> describe(Q)
  var    n  mean    sd median trimmed  mad     min    max   range   skew kurtosis   se
1   1 1000  1.91 37.16   0.03    0.04 1.44 -167.96 681.39  849.35  14.79    249.4 1.18
2   2 1000 -0.09 21.61  -0.01    0.07 1.44 -480.00 226.15  706.15 -10.85    270.4 0.68
3   3 1000  0.46 34.85   0.02    0.05 1.43 -400.54 805.44 1205.98  11.46    309.2 1.10
```
look at that! kurtoses over 200! a range of more than 1000 for variable #3!!
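
Numbers like these are what one would expect from a Cauchy, which has no finite mean or variance: moment-based summaries never settle down as the sample grows, while the median and MAD do. A small sketch, with an arbitrary seed and arbitrary sample sizes (the helper name `summarise_cauchy` is made up here):

```r
set.seed(42)  # arbitrary seed

# Hypothetical helper: moment-based vs. robust summaries of n Cauchy draws
summarise_cauchy <- function(n) {
  x <- rcauchy(n)
  c(n = n, mean = mean(x), sd = sd(x), median = median(x), mad = mad(x))
}

# mean and sd jump around as n grows; median stays near 0 and
# mad near 1.4826 (the scaled median absolute deviation of a standard Cauchy)
out <- t(sapply(c(1e2, 1e3, 1e4, 1e5), summarise_cauchy))
round(out, 2)
```

The `mad` column is the one to compare with the roughly 1.44 reported by `describe(Q)`.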

and if we plot variable #3:

LOOK AT IT! IT'S FILTHY! FILTHY!!!! FILTHY!!!

no God or Goddess in this Universe could allow such... such... MONSTROSITY to exist...

5. ## Re: simulating correlated, non-normal data: WHY DOES THIS HAPPEN!?

Originally Posted by Dason
I'm not sure what you're getting at. But I felt pretty safe using CLT in a discussion with spunky.
You know very well that what I am getting at is that many readers here at TalkStats will not understand what you mean by "CLT" because they have not attended a stats course (where theorems are turned into abbreviations) or been at an English-speaking university. Is it clear to you if I talk about CGS or ZGS? (But maybe it is obvious to Karabiner, the last one, that is.)

Or do you expect Spunky to be the only reader?

6. ## Re: simulating correlated, non-normal data: WHY DOES THIS HAPPEN!?

Originally Posted by GretaGarbo
Are there any distance measures that can summarize how far the generated distribution is from a believed distribution? What came to mind was the Kullback-Leibler distance or the good old Pearson chi-square. Or is it silly to imagine a one-number quality index? Such a measure would be convenient.

Will the central limit theorem ensure multivariate normality in this case?
these are very good questions Greta, and i would need to think about them more. the thing is (here in social-science land) most people don't think in terms of distributions but in terms of the moments of those distributions when talking about non-normality (i.e. non-zero skewness and non-zero/non-3 kurtosis, depending on which estimate you prefer). so, for example, many people say something like "we generated data with one-dimensional marginals having a skewness of 2 and a kurtosis of 7 in the population" to explore the influence of non-normality. they don't say "we generated data from a uniform and a gamma distribution, correlated at some value X". so i guess that would make it tricky to use a distance measure, since the intended non-normal distribution just has to be... well... "non-normal"... whatever that means.
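
For what it's worth, when a target distribution is fully specified, a one-number index along the lines Greta suggests is easy to sketch. The example below bins a sample into cells that are equiprobable under the target and computes a Pearson chi-square statistic, with the Kolmogorov-Smirnov distance as an alternative. The function name `chisq_distance`, the choice of 20 bins, and the gamma and normal targets are all assumptions for illustration, not anything used in the simulation literature discussed here.

```r
set.seed(2024)  # arbitrary seed

# Hypothetical helper: Pearson chi-square distance between a sample and a
# target distribution given by its quantile function `qfun`
chisq_distance <- function(x, qfun, bins = 20, ...) {
  # Cell boundaries chosen so each cell has probability 1/bins under the target
  breaks <- qfun(seq(0, 1, length.out = bins + 1), ...)
  observed <- table(cut(x, breaks = breaks, include.lowest = TRUE))
  expected <- length(x) / bins
  sum((observed - expected)^2 / expected)
}

x <- rgamma(5000, shape = 2, rate = 1)

# Small when the sample matches the target, large when it does not
d_match    <- chisq_distance(x, qgamma, shape = 2, rate = 1)
d_mismatch <- chisq_distance(x, qnorm, mean = mean(x), sd = sd(x))
c(match = d_match, mismatch = d_mismatch)

# Kolmogorov-Smirnov distance as another one-number summary
unname(ks.test(x, "pgamma", shape = 2, rate = 1)$statistic)
```

The catch is the one described above: this only works if the intended distribution is pinned down, not just its skewness and kurtosis.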

what i'm working on in my "program of research" (for lack of a better term), in light of the replicability crisis in Psychology (which is permeating other social sciences), is pointing out the fact that many of the simulation studies that inform practice are suboptimal. and i don't think many people look at them and study their simulation designs carefully, because not many people have either the patience or the expertise to go down to the nuts and bolts of what exactly people are simulating and how exactly they are doing it. so it is going to take a while but, at the very least, i know i'm gonna get enough published articles to transform into my dissertation and graduate.

7. ## Re: simulating correlated, non-normal data: WHY DOES THIS HAPPEN!?

Originally Posted by GretaGarbo
You know very well that what I am getting at is that many readers here at TalkStats will not understand what you mean by "CLT" because they have not attended a stats course (where theorems are turned into abbreviations) or been at an English-speaking university. Is it clear to you if I talk about CGS or ZGS? (But maybe it is obvious to Karabiner, the last one, that is.)

Or do you expect Spunky to be the only reader?
I'm just saying that in this particular thread I felt completely comfortable using "CLT". I was responding to spunky, and seriously, CLT is one of the most common abbreviations in theoretical statistics. In my opinion most people who make it past the first post in this thread will know what CLT stands for. There will of course be exceptions but oh well.