Stochastic vs. deterministic

#1
Randomly sample a defined number of points (e.g. 10) from a normal distribution with mean x1 and standard deviation s1, to represent the data we would obtain from a real-life population. This produces the group1 data points. Then do the same from a second normal distribution with mean x2 and standard deviation s2 to get the group2 data points.

Then run a two-tailed t-test on the two groups and convert the result into a confidence level.

Do this 100 times and average the confidence levels found.
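For reference, a minimal sketch of this procedure (assuming SciPy's ttest_ind and taking the confidence level to be 1 − p; the parameter values below are made up purely for illustration):

```python
import numpy as np
from scipy import stats

# Made-up parameters for illustration only
x1, s1, n1 = 10.0, 5.0, 10   # group1 mean, sd, sample size
x2, s2, n2 = 13.0, 5.0, 10   # group2 mean, sd, sample size
n_runs = 100

rng = np.random.default_rng(0)
confidences = []
for _ in range(n_runs):
    g1 = rng.normal(x1, s1, n1)                  # simulate group1 data
    g2 = rng.normal(x2, s2, n2)                  # simulate group2 data
    t_stat, p_value = stats.ttest_ind(g1, g2)    # two-tailed, pooled-variance t-test
    confidences.append(1.0 - p_value)            # "confidence level" taken as 1 - p

print("average confidence over", n_runs, "runs:", np.mean(confidences))
```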

I was expecting to obtain the same results as doing a deterministic calculation where I calculate t = (x1 – x2)/s, where s = sqrt( ((n1 – 1)s1^2 + (n2 – 1)s2^2) * (1/n1 + 1/n2) / (n1 + n2 – 2) ).
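As a sketch, here is that deterministic calculation with the same made-up numbers as above, assuming SciPy's t distribution for the two-tailed p-value:

```python
import numpy as np
from scipy import stats

# Same made-up numbers as in the simulation sketch above
x1, s1, n1 = 10.0, 5.0, 10
x2, s2, n2 = 13.0, 5.0, 10

# standard error of the difference, using the pooled variance
s = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
            * (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2))
t = (x1 - x2) / s
df = n1 + n2 - 2
p = 2 * stats.t.sf(abs(t), df)        # two-tailed p-value
print("deterministic confidence:", 1 - p)
```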

The reason I was expecting the same results in the long run is that randomly sampling from normal distributions with x1, s1, x2, s2 should reproduce the situation assumed in the deterministic calculation. But I am not getting the same results: the stochastic simulation has a lower average confidence level than the deterministic calculation.

Where is the fallacy in the procedure?
 

JohnM

TS Contributor
#2
When I've done simulations like these, I run at least 1,000 or 5,000 iterations. After only 100 runs, I'm not surprised that you got a different answer.

How far off was it?
 
#3
I just ran a 5000-run simulation. The results are still about 10 percentage points below the deterministic result: the deterministic calculation gives 93% confidence, while the simulation averages 82%.

The deterministic calculation assumes we get exactly the stated x1, s1, x2, s2 for a given sample size. I think the difference is due to the nature of the simulation: in the long run the sample means and standard deviations will match x1, x2, s1, s2, but for any individual sample they will deviate around those values.

Meaning, for a sample of say 20 points randomly drawn from a normal distribution with mean = 10 and standard deviation = 5, the sample mean could be 9 or could be 11, and the sample standard deviation could be 4.5 or could be 5.5.
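A quick illustration of that sampling variability (assuming NumPy; the printed numbers are just one random draw):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(10.0, 5.0, 20)            # 20 draws from N(mean=10, sd=5)
print(sample.mean(), sample.std(ddof=1))      # sample mean and sd wander around 10 and 5
```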

But the confidence level maxes out at 1. If by chance I get a sample that is not representative of my stated mean and stdev, that will decrease the confidence level drastically; yet even if a sample happens to give a confidence level higher than the deterministic one, it is capped at 1. So when we average over many runs, each with a sample of 20 points, the average confidence level comes out lower, since it is bounded above by 1 and below by 0.

I think it is sort of like randomly sampling from N(0,1) but capping anything above 1.5 and setting it to 1.5: the mean of such a sample will not be 0 but slightly less than 0.
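A quick check of that analogy (assuming NumPy): capping standard-normal draws at 1.5 pulls the mean slightly below 0.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1_000_000)   # standard normal draws
capped = np.minimum(x, 1.5)           # cap anything above 1.5 at 1.5
print(x.mean(), capped.mean())        # the capped mean comes out slightly below 0
```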

What are your thoughts?
 

JohnM

TS Contributor
#4
It may be that the deterministic method gives you a single point, while the stochastic results are not distributed symmetrically around that point, leading to an "average" that is lower (the distribution may be skewed low).

I would do a histogram of the stochastic results to see the distribution around the deterministic value.
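For example, something along these lines (assuming matplotlib, reusing the confidences list from the simulation sketch earlier in the thread, and using deterministic_conf as a made-up name for the single deterministic value):

```python
import matplotlib.pyplot as plt

# `confidences` comes from the simulation sketch above;
# `deterministic_conf` (hypothetical name) is the single deterministic value
plt.hist(confidences, bins=30)
plt.axvline(deterministic_conf, color="red", linestyle="--", label="deterministic")
plt.xlabel("confidence level (1 - p)")
plt.ylabel("count")
plt.legend()
plt.show()
```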
 
#5
Yeah, the t-test is like a transform that produces a confidence level between 0 and 1. So even though the random number generator is symmetric, the resulting confidence levels are not, and the stochastic case will always produce an average confidence level closer to 0.5 than the deterministic confidence level.

Thanks for the help!
 
#6
need help emergency
Questions:
I have performed stochastic simulations with lognormal input and output. The means for the three sample simulations were 169, 580, and 799, each with a standard deviation of 20% of the mean. After 1000 iterations, the Monte Carlo simulation averages were 27%, 22%, and 24% lower, respectively, than the expected deterministic values (sample sizes were 15, 28, and 25 respectively).
One would expect the difference between the deterministic and stochastic averages for a lognormal distribution to be a little smaller than for a normal distribution.
Question: Is a comparison between the deterministic value and the mean of a Monte Carlo simulation possible when the distribution is skewed (i.e. lognormal)? If so, what is the relationship between the deterministic value and the mean of a Monte Carlo simulation? Would you consider the above results (Monte Carlo simulation averages 27%, 22%, and 24% lower than the deterministic values) to be reasonable or not?
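In case it helps, here is a minimal sketch (assuming NumPy, and assuming the stated means and the 20%-of-mean standard deviations refer to the arithmetic mean and sd of the lognormal inputs) of drawing such inputs and checking the long-run average of their sample means:

```python
import numpy as np

rng = np.random.default_rng(0)

def lognormal_from_mean_sd(mean, sd, size):
    # convert the arithmetic mean/sd to the underlying normal's mu/sigma
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return rng.lognormal(mu, np.sqrt(sigma2), size)

for mean, n in [(169, 15), (580, 28), (799, 25)]:
    sd = 0.20 * mean
    sample_means = [lognormal_from_mean_sd(mean, sd, n).mean() for _ in range(1000)]
    # the long-run average of the input sample means should track `mean`
    print(mean, np.mean(sample_means))
```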
 