# How to generate multiple correlated variables?

#### consuli

##### Member
Hello!

One can generate two c0rrelated variables as follows:

x1= z1
x2= c* z1 + sqrt(1- c^2)* z2
Where
z1: standardnormal random variable
z2: standardnormal random variable
c= Correlation(x1, x2)

How can I generate more than two correlated variables?

#### spunky

##### Smelly poop man with doo doo pants.
well, the general way is to take the correlation (or covariance) matrix that you'd like to be the population parameter, do some type of decomposition on it (eigen/spectral decomposition, cholesky, etc.) and multiply it times the vectors you'd like to have correlated.

lemme give you an example using the cholesky decomposition. say the correlation matrix i want in the population is for 4 variables and looks like:

Code:
> R <- matrix(rep(0.5,16),4,4)
> diag(R) <- 1
> R
[,1] [,2] [,3] [,4]
[1,]  1.0  0.5  0.5  0.5
[2,]  0.5  1.0  0.5  0.5
[3,]  0.5  0.5  1.0  0.5
[4,]  0.5  0.5  0.5  1.0
so a 4X4 correlation matrix with everything correlated at 0.5 (in the population).

first you generate 4 normal variables (standardized just to make things simpler) and assemble them in a matrix. i used a sample size of N=1000 for no particular reason

Code:
X <- cbind(rnorm(1000),rnorm(1000),rnorm(1000),rnorm(1000))
colnames(X) <- c("X1", "X2", "X3", "X4")
then the only thing you have to do is a cholesky decomposition of the correlation matrix R and post-multiply that times the data matrix X:

Code:
cholR <- chol(R)

data1 <- X%*%cholR
if you calculate the correlation matrix on the new dataset "data1" you will see that it is (within sampling variability) close to the population correlation R

Code:
> cor(data1)
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.4939773 0.5189865 0.5142373
[2,] 0.4939773 1.0000000 0.4923293 0.4996839
[3,] 0.5189865 0.4923293 1.0000000 0.5221356
[4,] 0.5142373 0.4996839 0.5221356 1.0000000
>
now, if you want everything in just 1 step, just use the 'mvrnorm' function in the MASS package that generates multivariate random normal variates with a pre-specified population vector of means and variance-covariance matrix of your choice:

Code:
> library(MASS)
> mu <- c(0,0,0,0) # vector of means
> data2 <- mvrnorm(1000, mu, R)
> cor(data2)
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.5149381 0.5311652 0.4995711
[2,] 0.5149381 1.0000000 0.5297210 0.5313051
[3,] 0.5311652 0.5297210 1.0000000 0.5254646
[4,] 0.4995711 0.5313051 0.5254646 1.0000000
>

#### consuli

##### Member
Thank you so much for your great answer! It answers either the mathematical part of the question and either the stat programming part of the question!