How to generate multiple correlated variables?

#1
Hello!

One can generate two c0rrelated variables as follows:

x1= z1
x2= c* z1 + sqrt(1- c^2)* z2
Where
z1: standardnormal random variable
z2: standardnormal random variable
c= Correlation(x1, x2)

How can I generate more than two correlated variables?
 

spunky

Smelly poop man with doo doo pants.
#2
well, the general way is to take the correlation (or covariance) matrix that you'd like to be the population parameter, do some type of decomposition on it (eigen/spectral decomposition, cholesky, etc.) and multiply it times the vectors you'd like to have correlated.

lemme give you an example using the cholesky decomposition. say the correlation matrix i want in the population is for 4 variables and looks like:

Code:
> R <- matrix(rep(0.5,16),4,4)
> diag(R) <- 1
> R
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.5  0.5  0.5
[2,]  0.5  1.0  0.5  0.5
[3,]  0.5  0.5  1.0  0.5
[4,]  0.5  0.5  0.5  1.0
so a 4X4 correlation matrix with everything correlated at 0.5 (in the population).

first you generate 4 normal variables (standardized just to make things simpler) and assemble them in a matrix. i used a sample size of N=1000 for no particular reason

Code:
X <- cbind(rnorm(1000),rnorm(1000),rnorm(1000),rnorm(1000))
colnames(X) <- c("X1", "X2", "X3", "X4")
then the only thing you have to do is a cholesky decomposition of the correlation matrix R and post-multiply that times the data matrix X:

Code:
cholR <- chol(R)

data1 <- X%*%cholR
if you calculate the correlation matrix on the new dataset "data1" you will see that it is (within sampling variability) close to the population correlation R

Code:
> cor(data1)
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.4939773 0.5189865 0.5142373
[2,] 0.4939773 1.0000000 0.4923293 0.4996839
[3,] 0.5189865 0.4923293 1.0000000 0.5221356
[4,] 0.5142373 0.4996839 0.5221356 1.0000000
>
now, if you want everything in just 1 step, just use the 'mvrnorm' function in the MASS package that generates multivariate random normal variates with a pre-specified population vector of means and variance-covariance matrix of your choice:

Code:
> library(MASS)
> mu <- c(0,0,0,0) # vector of means
> data2 <- mvrnorm(1000, mu, R)
> cor(data2)
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.5149381 0.5311652 0.4995711
[2,] 0.5149381 1.0000000 0.5297210 0.5313051
[3,] 0.5311652 0.5297210 1.0000000 0.5254646
[4,] 0.4995711 0.5313051 0.5254646 1.0000000
>
 
#3
Thank you so much for your great answer! It answers either the mathematical part of the question and either the stat programming part of the question!