Hello,
I'm trying to compare a real dataset with a random (synthetic) distribution to determine whether a pattern I see in the data is real, or is simply the result of a random distribution or poor sampling.

I've been struggling with this problem for weeks, and no number of websites or books has helped. I think there are lots of methods for doing this, but I'm not sure which are most appropriate or how to implement them. I would be grateful for any help you could give.

I have a vector of continuous data (distances from a point) that I want to compare with multiple synthetic datasets created from random distributions. Because the real data are only a sample, I am creating bootstrap resamples of them to determine the significance of the result.

I think that either a chi-square test or ANOVA would be a good way to compare the real and the random data, but how do I compare the variability? Is there some way of creating two ensemble averages, one for all the random datasets and one for the real bootstrap samples, each with a confidence range around it, so that the two can be compared for overlap? Maybe bootstrap aggregation?

I'm currently viewing the real and random data as histograms, where the mean and standard deviation of each bin have been calculated individually across the datasets. Treating each bin separately doesn't seem correct, though. I'm only using histograms because they seemed like a simple solution to the problem; any other measure of the distribution would work as well.
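
In case it helps to make the setup concrete, here is a minimal Python sketch of roughly what I am doing now. Everything specific in it is a placeholder rather than my actual data or code: the gamma-distributed "real" sample, the uniform random model, the 10 bins, the 500 replicates, and the function names are all assumptions chosen only for illustration.

```python
import numpy as np

def bootstrap_samples(data, n_boot, rng):
    """Resample the real data with replacement, n_boot times."""
    n = len(data)
    return [rng.choice(data, size=n, replace=True) for _ in range(n_boot)]

def synthetic_samples(n, n_sets, max_dist, rng):
    """Draw n_sets random datasets of size n.

    ASSUMPTION: distances are uniform on [0, max_dist); swap in whatever
    random model is actually appropriate for the problem.
    """
    return [rng.uniform(0.0, max_dist, size=n) for _ in range(n_sets)]

def bin_summary(samples, edges):
    """Per-bin mean and standard deviation of histogram densities."""
    hists = np.array(
        [np.histogram(s, bins=edges, density=True)[0] for s in samples]
    )
    return hists.mean(axis=0), hists.std(axis=0, ddof=1)

rng = np.random.default_rng(0)

# Placeholder standing in for the real vector of distances.
real = rng.gamma(shape=2.0, scale=1.5, size=200)

max_dist = real.max()
edges = np.linspace(0.0, max_dist, 11)  # 10 equal-width bins (arbitrary)

boot_mean, boot_sd = bin_summary(bootstrap_samples(real, 500, rng), edges)
rand_mean, rand_sd = bin_summary(
    synthetic_samples(len(real), 500, max_dist, rng), edges
)

# Each bin is summarised on its own, which is the part that seems wrong:
# the bins of one histogram are not independent of each other.
for lo, hi, bm, bs, rm, rs in zip(
    edges[:-1], edges[1:], boot_mean, boot_sd, rand_mean, rand_sd
):
    print(f"[{lo:6.2f}, {hi:6.2f})  real {bm:.3f} +/- {bs:.3f}   "
          f"random {rm:.3f} +/- {rs:.3f}")
```

This gives me two sets of per-bin means and standard deviations, but I don't know how to turn them into a single, defensible comparison between the real and the random data.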
Thanks in advance for any help,
Dan