# How to generate an autocorrelated random variable?

#### consuli

##### Member
Hi,

I know I can generate two correlated random variables x1 an x2 using

x1= z1
x2= c* z1 + sqrt(1- c^2)* z2
Where
z1: standardnormal random variable
z2: standardnormal random variable
c= Correlation(x1, x2)

But how can I generate an autocorrelated variable x, autocorrelated on x(t-1) ?

#### BGM

##### TS Contributor
For simplicity lets say you are generating a AR(1) process with model

$$X_t = \phi X_{t-1} + \epsilon_t$$

Then

1. First you decide the initial value, say $$X_0$$

2. Generate $$\epsilon_1 \sim \mathcal{N}(0, \sigma^2)$$ (or any other distribution you want)

3. Obtain $$X_1= \phi X_0 + \epsilon_1$$

4. Repeat step 2-3 for $$X_2, X_3, X_4, \ldots$$ until you obtain enough for your time series.

#### consuli

##### Member
BGM, did you recognize, that my example above delivers a defined correlation c, where c= Correlation(x1, x2) ?

So, the process to build an autocorrelated random variable with a defined autocorrelation should include an autocorrelation coefficient, right?

#### Dason

consuli, did you recognize, that in your original post you don't say that the defined correlation needs to be an explicit parameter. In this case it turns out that the autocorrelation for lag 1 (you never specified if you only want to control the lag 1 autocorrelation or more/less than that) is a parameter in the problem but it was just given a different symbol. Do some reading on simple AR(1) or AR(p) processes. If you want a different type of process that can generate autocorrelation you could look at MA(q) processes or the combination of the two ARMA(p,q). You can get much more complex than all of this. But for your simple request BGMs proposal is probably the easiest way to go about it. It wouldn't be a bad exercise for you to take a little bit of time and derive the autocorrelation for the AR(1) process (it's a pretty easy task).

#### consuli

##### Member
in your original post you don't say that the defined correlation needs to be an explicit parameter.
Yes, you are absolutely right. I could have formulated this more precise.

you never specified if you only want to control the lag 1 autocorrelation or more/less than that
For the moment I am satisfied with controlling the lag 1 autocorrelation. But I guess, I will ask for controlling lag t correlation in future, where t € {1, 2, 3, ..., 9, 10}, when I am proceeding with my simulation of value at risk. So if you have a solution for controlling autocorrelation of a standard normal variable from lag 1 up to lag 10, this would be very welcome.

#### Dason

I did mention AR(p), MA(q), and ARMA(p,q) processes. These are the "classical" ways to model time series but also give a clean data generation process.

You could also just use the multivariate normal distribution and give it the covariance matrix you want. Note that you do need to keep that matrix positive definite. It could be tricky to guarantee that criteria if you get too crazy with trying to specify weird lag-1, lag-2, ..., lag-10 correlations.

#### spunky

##### Doesn't actually exist
so i tried to take a stab at it and i think (hope?) something like this would work. now, just please keep something in mind: i am not an expert in time-series, auto-correlation anything. i'm just kind of working at this from my very basic understanding of how the durbin-watson test for auto-correlated residuals work in regression so that's the framework i'm using to generate data (i.e. auto-correlated residuals).

so the first step is to generate the residulas here. for this i will sample from a bivariate normal distribution and i'm guessing it would be a auto-correlated process with a lag of 1 and the auto-correlation would be 0.5. the key is basically to intercalate the two columns of correlated random vectors so you end up with only 1 vector with double the length you initially specified.

so something like this:

Code:
library(MASS)

mu <- c(0,0)
S <- matrix(c(1,.5,.5,1),2,2)
N <- 500

datum <- mvrnorm(N,mu,S)

e <- c(rbind(datum[,1], datum[,2]))
so those are my auto-correlated 'e'. now i only have to build a simple linear regression model:

Code:
x <- 1*rnorm(1000)
y <- 1 + x +e

mod1 <- lm(y~x)
and if i do a durbin-watson test using the 'car' package

Code:
mod1 <- lm(y~x)

durbinWatsonTest(mod1)

> durbinWatsonTest(mod1)
lag Autocorrelation D-W Statistic p-value
1       0.2655003      1.467445       0
Alternative hypothesis: rho != 0
i get a significant p-value which i *think* implies you have autocorrelated residuals... right? am i on the right track here?

i think the issue is just to make sure you can change between-column dependencies to become between-row dependencies...right? wrong? maybe?

#### Dason

Some of the elements in your vector are correlated but not all. Plus you aren't specifying the autocorrelation if you do that.

Code:
j <- acf(e)
j
for me gave
Code:
[COLOR=#000000]     0      1      2      3      4      5      6      7      8      9     10     11   1.000  0.198 -0.014  0.014 -0.031 -0.005 -0.046 -0.055 -0.015  0.030 -0.005 -0.049      12     13     14     15     16     17     18     19     20     21     22     23  -0.029 -0.020 -0.067  0.000  0.014 -0.029  0.013 -0.013  0.014  0.024 -0.047  0.007      24     25     26     27     28     29     30  -0.006  0.042  0.031  0.008  0.012  0.032  0.066

[/COLOR]
So you got a lag-1 autocorrelation of about .198. That's not what you wanted I'm guessing. You did a weird thing by generating independent sets of dependent variables and then just combining them. You essentially make it so that if you go through the vector you alternate between correlated and independent observations.

#### spunky

##### Doesn't actually exist
thank you! i guess my limited understanding of how auto-correlation worked was that there would be some sort of row-to-row of data correlation. now i see that this is more complicated than just generating data and intercalating it.... although i guess even that procedure does give me some sort of auto-correlation because you did get that 0.198 number?

weird... :/

#### consuli

##### Member
You could also just use the multivariate normal distribution and give it the covariance matrix you want.

Dason, did that mean the following?

Code:
 n=10000

x= matrix(ncol= 5, nrow= n)
x1= rnorm(n)
x[ , 1]= x1
x[ , 2]= c(0, x1[ 1:(n-1)])
x[ , 3]= c(rep(0, 2), x1[1:(n-2)])
x[ , 4]= c(rep(0, 3), x1[1:(n-3)])
x[ , 5]= c(rep(0, 4), x1[1:(n-4)])

1.00 0.50 0.35 0.15 0.10
0.50 1.00 0.25 0.12 0.05
0.35 0.25 1.00 0.08 0.03
0.15 0.12 0.08 1.00 0.01
0.10 0.05 0.03 0.01 1.00
", stringsAsFactors= F
))

x= x %*% chol(covm)

cor(x)
acf(x[ , 1], plot= F)
[/FONT][/COLOR][COLOR=#000000][FONT=Times New Roman][COLOR=#000000][FONT=Times New Roman]

Does not work!

> acf(x[ , 1], plot= F)

Autocorrelations of series ‘x[, 1]’, by lag

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1.000 0.000 0.003 0.003 -0.024 0.008 0.005 -0.013 -0.005 -0.017 0.004 -0.006 -0.017 0.009 -0.006 0.002
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0.010 0.013 -0.015 0.015 0.005 0.001 -0.013 0.001 0.004 -0.008 -0.004 0.021 -0.024 -0.001 -0.005 -0.012
32 33 34 35 36 37 38 39 40
-0.003 -0.003 -0.010 0.008 0.011 0.021 0.008 0.006 0.010
>

[/FONT][/COLOR]

#### consuli

##### Member
However, there suprisingly appears some autocorrelation for column 2, 3, 4, 5 !

Code:
n=10000

x= matrix(ncol= 5, nrow= n)
x1= rnorm(n)
x[ , 1]= x1
x[ , 2]= c(0, x1[ 1:(n-1)])
x[ , 3]= c(rep(0, 2), x1[1:(n-2)])
x[ , 4]= c(rep(0, 3), x1[1:(n-3)])
x[ , 5]= c(rep(0, 4), x1[1:(n-4)])
colnames(x)= c("x1", "x2", "x3", "x4", "x5")

1.00 0.50 0.35 0.15 0.00
0.50 1.00 0.00 0.00 0.00
0.35 0.00 1.00 0.00 0.00
0.15 0.00 0.00 1.00 0.00
0.00 0.00 0.00 0.00 1.00
", stringsAsFactors= F
))

x= x %*% chol(covm)
colnames(x)= c("x1", "x2", "x3", "x4", "x5")

cor(x)
acf(x[ , 1], lag.max= 5, plot= F)
acf(x[ , 2], lag.max= 5, plot= F)
acf(x[ , 3], lag.max= 5, plot= F)
acf(x[ , 4], lag.max= 5, plot= F)
acf(x[ , 5], lag.max= 5, plot= F)
So, maybe it will work this way, when shifting the columns 2, 3, 4, 5 in the other direction timely forward and specifying the natural correlations between columns 2, 3, 4, 5?

#### consuli

##### Member
I have researched the internet, how to create a random series with a defined autocorrelation. Found the following

http://www.mapleprimes.com/questions/40278-Generating-A-Sequence-Of-Autocorrelated

The maple example translated to R generates a defined autocorrelation.

Code:
n= 1000
p= 0.5
x_tm0= rnorm(n)
x_tm1= c(0, x_tm0[1:(n-1)])
# x_tm1
b= rbinom(n, 1, p)
# b_tm1

x1= b* x_tm1+ (1-b)* x_tm0
acf(x1, lag.max= 5, plot= F)
However it is pretty nasty, to simply repeat x(t-1) (which is called x_tm1 in the code).

So I thought about mixing x and x(t-1) together.

Code:
x2= p* x_tm1+ (1-p)* x_tm0
acf(x1, lag.max= 5, plot= F)
For somehow reasons, this works only for a weight p of 0.5. #### consuli

##### Member
Why are you doing all of this? Name one problem with what BGM posted in post 2: http://www.talkstats.com/showthread...andom-variable?p=171517&viewfull=1#post171517

It does exactly what you asked for and is probably the easiest to interpret what is going on and easy to code.
I asked my question for the purpose of simulation. The first step in a simulation is to replicate the empirical data. So when the empirical data has an autocorrelation vector of [1, 0.5, 0.3, 0.2, 0.15, 0.10] (lag 0 to lag 5), I have to replicate exactly this autocorrelation vector. As far the AR process does not offer a map from the AR parameters to autocorrelation parameters and vice versa, an AR process does not help me for my problem.