# Thread: Generate Random Correlated Variable

1. ## Generate Random Correlated Variable

I have established a correlation between independent X and dependent Y. Is there a formula for generating a random Y from a particular X? For example, if X is 5, then I want to generate a random Y based on a given correlation.

2. ## Re: Generate Random Correlated Variable

You would need to know more than just "X = 5". We would need to know something about the distributions of the random variables of interest and their moments. If, for example, you wanted X and Y to be bivariate normally distributed and you know what you want the marginal distributions of X and Y to be, then Y conditioned on X has a normal distribution whose parameters depend on the correlation, the means and the variances (and, of course, on the observed value of X).

The Wikipedia page gives the exact distribution of interest in this case: http://en.wikipedia.org/wiki/Multiva..._distributions
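For the bivariate normal case described above, the conditional distribution has a standard closed form: given X = x, Y is normal with mean mu_Y + r*(sd_Y/sd_X)*(x - mu_X) and standard deviation sd_Y*sqrt(1 - r^2). The Python sketch below draws Y for a fixed x such as 5. It assumes you already know (or have estimated) both means, both standard deviations and r. The function name `draw_y_given_x` and the example parameter values are hypothetical and only there for illustration.

```python
import math
import random


def draw_y_given_x(x, mu_x, sd_x, mu_y, sd_y, r, rng=random):
    """Draw one Y from Y | X = x, assuming (X, Y) is bivariate normal.

    Conditional mean: mu_y + r * (sd_y / sd_x) * (x - mu_x)
    Conditional sd:   sd_y * sqrt(1 - r^2)
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    cond_mean = mu_y + r * (sd_y / sd_x) * (x - mu_x)
    cond_sd = sd_y * math.sqrt(1.0 - r * r)
    return rng.gauss(cond_mean, cond_sd)


if __name__ == "__main__":
    # Hypothetical parameters: X has mean 4, sd 2; Y has mean 10, sd 3; r = 0.6.
    rng = random.Random(1)
    draws = [draw_y_given_x(5, 4, 2, 10, 3, 0.6, rng) for _ in range(10000)]
    # Conditional mean is 10 + 0.6 * 1.5 * 1 = 10.9; the average should be near it.
    print(sum(draws) / len(draws))
```

With r = 1 the conditional standard deviation is zero and every draw equals the conditional mean. With r = 0 the draws ignore x entirely.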

3. ## Re: Generate Random Correlated Variable

Hi guys: a very elementary question:

How could I best analyze data like the following?

| Student | No. of days cut class | No. of days late |
|---|---|---|
| Student A | 9 | 8 |
| Student B | 6 | 1 |
| Student C | 2 | 0 |
| Student D | 1 | 8 |
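The thread does not return to this question. One simple starting point is the Pearson correlation between the two columns, which is the same quantity Excel's CORREL function reports. The minimal Python sketch below assumes Pearson's r is an appropriate summary, and the helper name `pearson_r` is hypothetical. With only four students the estimate is extremely noisy: dropping Student D alone moves it from about 0.20 to about 0.88. Treat it as descriptive only.

```python
import math

# Data from the post: days each student cut class (x) and days late (y).
days_cut = [9, 6, 2, 1]
days_late = [8, 1, 0, 8]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (what Excel's CORREL returns)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson_r(days_cut, days_late), 3))  # 0.197
```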

6. ## Re: Generate Random Correlated Variable

> Originally Posted by Dason
> You would need to know more than just "X = 5". We would need to know something about the distributions of the random variables of interest and their moments. If, for example, you wanted X and Y to be bivariate normally distributed and you know what you want the marginal distributions of X and Y to be, then Y conditioned on X has a normal distribution whose parameters depend on the correlation, the means and the variances (and, of course, on the observed value of X).
>
> The Wikipedia page gives the exact distribution of interest in this case: http://en.wikipedia.org/wiki/Multiva..._distributions

Sorry, but I do not understand. I found the following answer on another site:

Let X be your fixed variable, and suppose you want to generate a variable Y that correlates with X by amount r. If X is standardized, then Y = r⋅X + E, where E is a random variable from a normal distribution with mean 0 and sd = sqrt(1 − r^2).

Now, if you want to attain the correlation (almost) exactly r, you need to ensure that E has zero correlation with X. Driving that correlation to zero can be done by modifying E iteratively. Unfortunately I can't help with R, but if you use SPSS you can find the macro FITVAR on my web page (see the "Fit covariates" collection). It will do the whole task from start to end for any number of variables at once; for example, you could fit Y to custom correlations with several Xs rather than just one.

Does this work? In this case, is "r" equal to the correlation coefficient generated by Excel's CORREL function? Can I calculate E using Excel's NORMDIST function?

Thanks for any help.
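On the Excel questions: CORREL returns the Pearson correlation coefficient, which is the r in this recipe. NORMDIST, however, evaluates the normal density or cumulative probability at a given point rather than drawing a random number. A normal draw with mean 0 and standard deviation sd is usually produced in Excel with NORMINV(RAND(), 0, sd), or NORM.INV in newer versions.

The Python sketch below implements the quoted recipe for a fixed, known X. It replaces the iterative adjustment of E with a one-step projection: E is regressed on standardized X and only the residual is kept. That makes E's sample correlation with X exactly zero, so Y's sample correlation with X comes out equal to r up to rounding error. This is not the FITVAR macro, and the function names and example X values are hypothetical.

```python
import math
import random


def standardize(values):
    """Rescale values to sample mean 0 and sample sd 1 (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


def sample_corr(xs, ys):
    """Pearson sample correlation (the quantity Excel's CORREL reports)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_y(x_fixed, r, rng=random):
    """Return a random Y column whose sample correlation with x_fixed is r.

    Y = r*Zx + sqrt(1 - r^2)*Ze, where Zx is standardized X and Ze is
    standardized noise made exactly uncorrelated with Zx by projection.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    zx = standardize(x_fixed)
    e = [rng.gauss(0.0, 1.0) for _ in zx]
    # Least-squares slope of e on zx (zx has mean 0); keep only the residual,
    # which has zero sample covariance with zx.
    beta = sum(a * b for a, b in zip(zx, e)) / sum(a * a for a in zx)
    ze = standardize([b - beta * a for a, b in zip(zx, e)])
    return [r * a + math.sqrt(1.0 - r * r) * b for a, b in zip(zx, ze)]


if __name__ == "__main__":
    x = [3, 5, 7, 8, 12]  # hypothetical fixed X values
    y = correlated_y(x, 0.7, random.Random(42))
    print(round(sample_corr(x, y), 6))  # 0.7
```

Only E is random, so repeated calls give different Y columns for the same X, each with sample correlation r. The sketch needs at least three X values that are not all equal.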

7. ## Re: Generate Random Correlated Variable

> Originally Posted by cisaak
> Sorry, but I do not understand. I found the following answer on another site:
>
> Let X be your fixed variable, and suppose you want to generate a variable Y that correlates with X by amount r. If X is standardized, then Y = r⋅X + E, where E is a random variable from a normal distribution with mean 0 and sd = sqrt(1 − r^2).
>
> Now, if you want to attain the correlation (almost) exactly r, you need to ensure that E has zero correlation with X. Driving that correlation to zero can be done by modifying E iteratively. Unfortunately I can't help with R, but if you use SPSS you can find the macro FITVAR on my web page (see the "Fit covariates" collection). It will do the whole task from start to end for any number of variables at once; for example, you could fit Y to custom correlations with several Xs rather than just one.
>
> Does this work? In this case, is "r" equal to the correlation coefficient generated by Excel's CORREL function? Can I calculate E using Excel's NORMDIST function?
>
> Thanks for any help.

Simply generate two independent standard normal variables X and E.

Then create Y where Y = r*X + Sqrt[1 - r^2]*E

That's all you need to do to obtain a correlation of r between Y and X.
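This works because X and E are independent and each has variance 1. That gives Var(Y) = r^2 + (1 - r^2) = 1 and Cov(X, Y) = r, so Corr(X, Y) = r. The quick Python simulation below illustrates the recipe. The helper names are hypothetical, and the sample correlation only approximates r, getting closer as n grows.

```python
import math
import random


def sample_corr(xs, ys):
    """Pearson sample correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_correlated_pair(r, n, rng=random):
    """Generate n pairs with population correlation r.

    X and E are independent standard normals; Y = r*X + sqrt(1 - r^2)*E.
    """
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    es = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [r * x + math.sqrt(1.0 - r * r) * e for x, e in zip(xs, es)]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate_correlated_pair(0.5, 100_000, random.Random(3))
    print(sample_corr(xs, ys))  # close to 0.5, but not exactly 0.5
```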

8. ## Re: Generate Random Correlated Variable

> Originally Posted by Dragan
> Simply generate two independent standard normal variables X and E.
>
> Then create Y where Y = r*X + Sqrt[1 - r^2]*E
>
> That's all you need to do.

I am aware of your equation, but I want a correlated random Y from a known X, not a generated X.

9. ## Re: Generate Random Correlated Variable

> Originally Posted by cisaak
> I am aware of your equation, but I want a correlated random Y from a known X, not a generated X.

Are you talking about a non-stochastic X, as in regression, where X is assumed to be fixed? Like this:

Y = b0 + b1*X + E

10. ## Re: Generate Random Correlated Variable

> Originally Posted by Dragan
> Are you talking about a non-stochastic X, as in regression, where X is assumed to be fixed? Like this:
>
> Y = b0 + b1*X + E

Maybe I am hung up on terms. I thought stochastic = random and non-stochastic = deterministic. Since I want a random Y based on a regression using a fixed X, I do not know how to categorize my question.

Anyway, it appears you now understand my question. Is the above formula my answer? Is E a random variable from a normal distribution with mean = 0 and sd = sqrt(1 - r^2)?

11. ## Re: Generate Random Correlated Variable

> Originally Posted by cisaak
> Maybe I am hung up on terms. I thought stochastic = random and non-stochastic = deterministic. Since I want a random Y based on a regression using a fixed X, I do not know how to categorize my question.
>
> Anyway, it appears you now understand my question. Is the above formula my answer? Is E a random variable from a normal distribution with mean = 0 and sd = sqrt(1 - r^2)?

Regardless of whether X is fixed or stochastic, you're still going to have to "create" X in some manner, cisaak.

If you generate E as standard normal, then, using the regression equation above, the correlation between Y and X will be

r = b1 / Sqrt [1 + b1^2]

Note: That assumes X is standardized with a mean of zero and unit variance.
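The formula follows from the regression equation under the stated assumptions: Var(X) = 1, E has mean 0 and variance 1, and Cov(X, E) = 0. The intercept b0 drops out because adding a constant changes neither variances nor covariances:

$$
\begin{aligned}
\operatorname{Cov}(X, Y) &= \operatorname{Cov}(X,\, b_0 + b_1 X + E) = b_1 \operatorname{Var}(X) + \operatorname{Cov}(X, E) = b_1,\\
\operatorname{Var}(Y) &= b_1^2 \operatorname{Var}(X) + \operatorname{Var}(E) = b_1^2 + 1,\\
r &= \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} = \frac{b_1}{\sqrt{1 + b_1^2}}
\quad\Longleftrightarrow\quad
b_1 = \frac{r}{\sqrt{1 - r^2}} \qquad (|r| < 1).
\end{aligned}
$$

Solving for b1 shows which slope gives a target r. Dividing Y = r*X + Sqrt[1 - r^2]*E by Sqrt[1 - r^2] gives exactly this model with b0 = 0 and b1 = r / Sqrt[1 - r^2]. Rescaling Y does not change its correlation with X, so the two recipes agree.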

12. ## Re: Generate Random Correlated Variable

> Originally Posted by Dragan
> Regardless of whether X is fixed or stochastic, you're still going to have to "create" X in some manner, cisaak.
>
> If you generate E as standard normal, then, using the regression equation above, the correlation between Y and X will be
>
> r = b1 / Sqrt [1 + b1^2]
>
> Note: That assumes X is standardized with a mean of zero and unit variance.
So is this the full equation?

Y = b0 + (b1 * X) + E

where
b0 = Y intercept of regression
b1 = X coefficient of regression
X = a specific instance of X (not standardized)
E = a random variable from a normal distribution having mean = 0 and sd = sqrt(1-r^2)
r = b1/sqrt(1 + b1^2)

Or does b1 in the definition of r refer to a random number between 0 and 1?

13. ## Re: Generate Random Correlated Variable

> Originally Posted by cisaak
> So is this the full equation?
>
> Y = b0 + (b1 * X) + E
>
> where
> b0 = Y intercept of regression
> b1 = X coefficient of regression
> X = a specific instance of X (not standardized)
> E = a random variable from a normal distribution having mean = 0 and sd = sqrt(1-r^2)
> r = b1/sqrt(1 + b1^2)
>
> Or does b1 in the definition of r refer to a random number between 0 and 1?

A couple of points to make it a bit more flexible:

E has a mean of zero and standard deviation of "sigma"

X just needs to be standardized to a mean of zero and standard deviation of 1 (just a simple linear transformation will do that)

b1 can be any finite number

b0 can be any finite number

This changes the formula to:

Corr[Y,X] = r = b1 / Sqrt[sigma^2 + b1^2]

Notes: E does not need to be normally distributed. Also, the covariance (correlation) between X and E must be zero.
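Read the other way, this formula tells you how large sigma must be for a chosen slope and target correlation. Squaring and rearranging r = b1 / Sqrt[sigma^2 + b1^2] gives sigma = |b1| * Sqrt[1 - r^2] / |r|, provided r is nonzero, |r| < 1, and r has the same sign as b1.

The Python sketch below computes that sigma and checks it by simulation; the function names are hypothetical. To illustrate the note that E need not be normal, it draws E from a uniform distribution on [-a, a] with a = sigma*Sqrt[3], which has mean 0 and standard deviation sigma. X is drawn as standard normal here purely as one example of a standardized X, independent of E.

```python
import math
import random


def sample_corr(xs, ys):
    """Pearson sample correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sigma_for_target_r(b1, r):
    """Error sd giving Corr[Y, X] = r for Y = b0 + b1*X + E with standardized X.

    Solves r = b1 / sqrt(sigma^2 + b1^2) for sigma.
    """
    if r == 0 or b1 == 0 or abs(r) >= 1 or (r > 0) != (b1 > 0):
        raise ValueError("need 0 < |r| < 1 and r with the same sign as b1")
    return abs(b1) * math.sqrt(1.0 - r * r) / abs(r)


def simulate_uniform_errors(b0, b1, sigma, n, rng=random):
    """Simulate Y = b0 + b1*X + E, X standard normal, E uniform with sd sigma."""
    a = sigma * math.sqrt(3.0)  # uniform on [-a, a] has sd a / sqrt(3)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [b0 + b1 * x + rng.uniform(-a, a) for x in xs]
    return xs, ys


if __name__ == "__main__":
    b0, b1, r_target = 3.0, 2.0, 0.8  # hypothetical choices
    sigma = sigma_for_target_r(b1, r_target)  # about 1.5
    xs, ys = simulate_uniform_errors(b0, b1, sigma, 100_000, random.Random(7))
    print(sigma, sample_corr(xs, ys))  # sample correlation close to 0.8
```

For example, b1 = 2 with a target r = 0.8 gives sigma = 2 * 0.6 / 0.8 = 1.5. Indeed, 2 / Sqrt[1.5^2 + 2^2] = 2 / 2.5 = 0.8.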

14. ## Re: Generate Random Correlated Variable

> Originally Posted by Dragan
> A couple of points to make it a bit more flexible:
>
> E has a mean of zero and standard deviation of "sigma"
>
> X just needs to be standardized to a mean of zero and standard deviation of 1 (just a simple linear transformation will do that)
>
> b1 can be any finite number
>
> b0 can be any finite number
>
> This changes the formula to:
>
> Corr[Y,X] = r = b1 / Sqrt[sigma^2 + b1^2]
>
> Notes: E does not need to be normally distributed. Also, the covariance (correlation) between X and E must be zero.
I appreciate your patience with me. Unfortunately, every answer merely generates more questions.

Per your latest response, is this the full equation?

Y = b0 + (b1 * X) + E

where
b0 = Y intercept of regression
b1 = X coefficient of regression
X = standardized X with mean 0 and sd 1
E = a random variable from a normal distribution having mean = 0 and sd = sigma
r = b1/sqrt(sigma^2 + b1^2)

1. By standardizing X, aren't I converting it into a random variable? Remember I want a random Y from a fixed X.
2. Your r formula uses b1. Is this the X coefficient referred to above?
3. What is sigma? I thought it was the same as one sd. If sigma is not sqrt(1-r^2), then how does r affect my equation?

15. ## Re: Generate Random Correlated Variable

> Originally Posted by cisaak
>
> 1. By standardizing X, aren't I converting it into a random variable? Remember I want a random Y from a fixed X.

See below.

> 2. Your r formula uses b1. Is this the X coefficient referred to above?

Yes, b1 is the slope coefficient associated with the regression model.

> 3. What is sigma? I thought it was the same as one sd. If sigma is not sqrt(1-r^2), then how does r affect my equation?

"Sigma" is the standard deviation of the error term (E) in the regression model. You can set it to any positive value you like. The value of sigma affects the correlation r through the denominator, i.e., Sqrt[sigma^2 + b1^2].

In terms of your first question, I think it would be best if you provided a short example of what you're trying to accomplish, that is, some values of Y and X, so contributors can get a better "handle" on your problem.
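Below is a minimal sketch of the regression-based approach discussed in this thread. It covers the case where you have observed X and Y columns and want random Y values at one fixed, known X, for example X = 5. The steps are:

1. Estimate b0 and b1 by least squares.
2. Estimate sigma as the residual standard deviation.
3. Draw Y = b0 + b1*5 + E with E ~ N(0, sigma).

In Excel the three estimates should correspond to INTERCEPT, SLOPE and STEYX. The sketch assumes normal errors and treats the fitted b0, b1 and sigma as if they were the true values, ignoring their estimation uncertainty. The function names and the data values in the example are made up for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept b0, slope b1 and residual sd (n - 2 denominator)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return b0, b1, s


def random_y_at(x, b0, b1, s, rng=random):
    """One random Y at a fixed, known x: b0 + b1*x + E with E ~ N(0, s)."""
    return b0 + b1 * x + rng.gauss(0.0, s)


if __name__ == "__main__":
    # Made-up illustration data; replace with your own X and Y columns.
    xs = [1, 2, 3, 4, 6, 7, 8]
    ys = [2.1, 3.9, 6.2, 7.8, 12.1, 14.2, 15.8]
    b0, b1, s = fit_line(xs, ys)
    print(b0, b1, s)
    rng = random.Random(0)
    print([round(random_y_at(5, b0, b1, s, rng), 2) for _ in range(5)])
```

Here X keeps its original units; it does not need to be standardized, because b1 and sigma are estimated on the same scale as the data.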