Are you asking how to mathematically figure out what they are or you asking how to do monte carlo simulation to figure out what they are?
Hello,
I'm reading an article in which a monte carlo simulation is performed, but have the following problem:
the data is generated as:
x1=N(2,2)
x2=U(3,8)
P(A=1)=.5
P(Y=1|A,x1,x2)=1/1+exp[-(1.2A-5*x1^2+2*x2)]
the true values are given as : P(Y1=1)=.372 and P(Y0=1)=.352
(where Y1 is the value of Y given that A=1 and Y0 likewise when A=0)
I can't figure out how to compute these true values, so if somebody can give me some help here?
Are you asking how to mathematically figure out what they are or you asking how to do monte carlo simulation to figure out what they are?
I am asking how to figure them out mathematically.
I initially thought: the mean of x1 and x2 are 2 and 5.5 respectively,
so for example P(Y1=1)=1/1+exp[-(1.2-5*2^2+2*5.5) but that's not the idea I guess?
Nope. I'm guessing x1 and x2 and A are all independent?
By the way: I don't think this will be an easy problem? Using monte carlo simulation will get you those answers (or at least something really close to the answers) pretty easily.
Yes, I think they are independent, at least no covariance matrix is given..
I see what you mean by approaching the true values with a monte carlo simulation, but the idea here is that the article generates data according to the laws I just described and discusses several ways to estimate P(Y1=1) and P(Y0=1) or measures following from it (e.g. RR, odds ratio etc)..
So to know how well each method approaches the true values you need to know the actual true values, but how? I need to know since I want to carry out a MC simulation myself with the same goal (comparing different methods)
No, your idea is not the correct way to do this...Rather, here is how to do this, using Monte Carlo techniques...which is what I think this is really all about.
(1) Generate pseudo-random deviates for X1 and X2 with the specified parameters as given above.
(2) Let Z1 = -(1.2 - 5*X1^2 + 2*X2) for Y = 1.
(3) Let Z2 = -( -5*X1^2 + 2*X2) for Y = 0.
(4) Use the Logistic CDF (Mean 0 and Scale 1) and obtain the the cumulative probabilities for both Z1 and Z2.
(5) Separately compute the averages of the cumulative prob's associated with Z1 and Z2 and these will be .628 and .648, respectively.
(6) Take the complements of the averages (i.e. 1 - .628; and 1 - .648) you compute in Step (5) and these values will yield the answers you are looking for .372 and .352.
And, that should do it.
Thanks again for your advice Dragan, that was indeed what I was looking for. I was wondering, suppose that A is a function of x1 and x2 as well, so that for example now we have:
x1=N(2,2)
x2=U(3,8)
P(A=1|x1,x2)=1/1+exp[-(-(2*x1+2*x2)]
P(Y=1|A,x1,x2)=1/1+exp[-(1.2A-5*x1^2+2*x2)]
Would this change a lot in the procedure?
Well if A can only be 0 or 1 why don't you just generate a random observation for A and then for x1 and x2, and then use that to generate an observation to see if Y is 0 or 1. Do that a couple hundred thousand times and compute the average of the Y's. There you have an estimate of the probability Y=1.
Hi Dragan,
yes sorry, the expression should be:
P(A=1|x1,x2)=1/1+exp[-(2*x1+2*x2)]
regards Alex
I think I found the solution. I did it with R, for example the previous example with P(A=1)=.5 also delivered the right values with:
n=10000
n.sim=5000
y1=rep(NA,n.sim)
y0=rep(NA,n.sim)
for (s in 1:n.sim){
x1=rnorm(n,2,2)
x2=runif(n,3,8)
A=rbinom(n,1,.5)
y=plogis(1.2*A-5*x1^2+2*x2)
dat=as.data.frame(cbind(A,y))
y1[s]=mean(dat$y[A==1])
y0[s]=mean(dat$y[A==0])
}
mean(y1)
mean(y0)
so I guess changing A=rbinom(n,1,.5) to A=rbinom(n,1,plogis(2*x1 + 2*x2)) does the trick right?
I got these values:
> mean(y1)
[1] 0.3722326
> mean(y0)
[1] 0.05885827
Tweet |