I have some identities distributed in a 2D space. Each identity has the properties X, Y, latitude, and longitude, and each dataset contains about 75 identities on average. For a particular dataset, when I plot X against Y at the level of individual identities, I see no correlation. When I group the identities (say, into 10 groups) by spatial proximity (using the latitude and longitude data) and plot the group averages of X and Y, I get a strong positive correlation.
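To make the procedure concrete, here is a minimal synthetic sketch (not the actual data). It assumes, purely for illustration, that X and Y each contain a shared component that varies with latitude plus independent per-identity noise. The function name, the `trend` and `noise_sd` values, and the use of latitude strips as a crude stand-in for proximity grouping are all hypothetical choices. Under these assumptions, averaging within groups tends to shrink the noise while keeping the shared spatial component, so the group-level correlation can come out much stronger than the individual-level one.

```python
import numpy as np


def individual_vs_group_correlation(n_points=75, n_groups=10,
                                    trend=1.5, noise_sd=1.0, seed=0):
    """Compare the X-Y correlation of individual points with that of group means.

    All parameters are illustrative assumptions, not values from a real dataset.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical positions; only latitude drives the shared component here.
    lat = rng.uniform(0.0, 1.0, n_points)

    # Assumed model: a shared spatial component plus independent noise in X and Y.
    shared = trend * lat
    x = shared + rng.normal(0.0, noise_sd, n_points)
    y = shared + rng.normal(0.0, noise_sd, n_points)

    # Pearson correlation over the individual identities.
    r_individual = float(np.corrcoef(x, y)[0, 1])

    # Crude proximity grouping: sort by latitude and cut into contiguous strips.
    order = np.argsort(lat)
    groups = np.array_split(order, n_groups)
    x_means = np.array([x[g].mean() for g in groups])
    y_means = np.array([y[g].mean() for g in groups])

    # Pearson correlation over the group averages.
    r_group = float(np.corrcoef(x_means, y_means)[0, 1])
    return r_individual, r_group


r_ind, r_grp = individual_vs_group_correlation()
print(f"individual-level r = {r_ind:.2f}, group-level r = {r_grp:.2f}")
```

With only 75 points the two values fluctuate from seed to seed, so it is worth rerunning with several seeds or a larger `n_points` before reading much into a single run.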

Why do I see such behavior? Is this a well-known property of datasets? How can I justify this from a statistical point of view in my research paper? Are there any other problems that show similar behavior to which I can refer?
