+ Reply to Thread
Results 1 to 3 of 3

Thread: How to generate multiple correlated variables?

  1. #1
    Points: 7,920, Level: 59
    Level completed: 85%, Points required for next Level: 30

    Posts
    160
    Thanks
    1
    Thanked 7 Times in 7 Posts

    How to generate multiple correlated variables?




    Hello!

    One can generate two c0rrelated variables as follows:

    x1= z1
    x2= c* z1 + sqrt(1- c^2)* z2
    Where
    z1: standardnormal random variable
    z2: standardnormal random variable
    c= Correlation(x1, x2)

    How can I generate more than two correlated variables?
    Prediction is very difficult, especially about the future. (Niels Bohr)

  2. #2
    TS Contributor
    Points: 22,432, Level: 93
    Level completed: 9%, Points required for next Level: 918
    spunky's Avatar
    Location
    vancouver, canada
    Posts
    2,135
    Thanks
    166
    Thanked 537 Times in 431 Posts

    Re: How to generate multiple correlated variables?

    well, the general way is to take the correlation (or covariance) matrix that you'd like to be the population parameter, do some type of decomposition on it (eigen/spectral decomposition, cholesky, etc.) and multiply it times the vectors you'd like to have correlated.

    lemme give you an example using the cholesky decomposition. say the correlation matrix i want in the population is for 4 variables and looks like:

    Code: 
    > R <- matrix(rep(0.5,16),4,4)
    > diag(R) <- 1
    > R
         [,1] [,2] [,3] [,4]
    [1,]  1.0  0.5  0.5  0.5
    [2,]  0.5  1.0  0.5  0.5
    [3,]  0.5  0.5  1.0  0.5
    [4,]  0.5  0.5  0.5  1.0
    so a 4X4 correlation matrix with everything correlated at 0.5 (in the population).

    first you generate 4 normal variables (standardized just to make things simpler) and assemble them in a matrix. i used a sample size of N=1000 for no particular reason

    Code: 
    X <- cbind(rnorm(1000),rnorm(1000),rnorm(1000),rnorm(1000))
    colnames(X) <- c("X1", "X2", "X3", "X4")
    then the only thing you have to do is a cholesky decomposition of the correlation matrix R and post-multiply that times the data matrix X:

    Code: 
    cholR <- chol(R)
    
    data1 <- X%*%cholR
    if you calculate the correlation matrix on the new dataset "data1" you will see that it is (within sampling variability) close to the population correlation R

    Code: 
    > cor(data1)
              [,1]      [,2]      [,3]      [,4]
    [1,] 1.0000000 0.4939773 0.5189865 0.5142373
    [2,] 0.4939773 1.0000000 0.4923293 0.4996839
    [3,] 0.5189865 0.4923293 1.0000000 0.5221356
    [4,] 0.5142373 0.4996839 0.5221356 1.0000000
    >
    now, if you want everything in just 1 step, just use the 'mvrnorm' function in the MASS package that generates multivariate random normal variates with a pre-specified population vector of means and variance-covariance matrix of your choice:

    Code: 
    > library(MASS)
    > mu <- c(0,0,0,0) # vector of means
    > data2 <- mvrnorm(1000, mu, R)
    > cor(data2)
              [,1]      [,2]      [,3]      [,4]
    [1,] 1.0000000 0.5149381 0.5311652 0.4995711
    [2,] 0.5149381 1.0000000 0.5297210 0.5313051
    [3,] 0.5311652 0.5297210 1.0000000 0.5254646
    [4,] 0.4995711 0.5313051 0.5254646 1.0000000
    >
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  3. The Following User Says Thank You to spunky For This Useful Post:

    consuli (02-19-2015)

  4. #3
    Points: 7,920, Level: 59
    Level completed: 85%, Points required for next Level: 30

    Posts
    160
    Thanks
    1
    Thanked 7 Times in 7 Posts

    Re: How to generate multiple correlated variables?


    Thank you so much for your great answer! It answers either the mathematical part of the question and either the stat programming part of the question!
    Prediction is very difficult, especially about the future. (Niels Bohr)

+ Reply to Thread

           




Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats