Hello,
I am trying to put together an R function to test whether points in a given area tend to occur closer to any point belonging to another set of points, or whether they are distributed irrespective of their distance from the latter.
I would like to have some feedback on the method I applied, especially on the use of the binomial distribution.
With reference to the attached image, let's imagine we have some locations (crosses) and some other locations (red dots). For the sake of argument, let's assume the crosses are water pumps, and the dots are the locations where a disease is recorded as present.
The question to address might be: do disease cases tend to occur in the vicinity of any of the water pumps?
The method:
- divide the area around the water pumps using Thiessen polygons; each disease case falling within a polygon will be closer to that polygon's source (i.e., water pump) than to any other source;
- calculate the area of each polygon as a proportion of the total area covered by the polygons (let's call it %area);
- count how many disease cases fall within each polygon (let's call it bypoly.points);
- calculate the expected number of disease cases in each polygon; it should be (if I am not mistaken) equal to the total number of disease cases (i.e., the total number of points in the whole study area, tot.n.points) times %area;
- calculate the probability of the observed count in each polygon:
dbinom(bypoly.points, tot.n.points, %area)
- calculate the probability of a count lower than or equal to the observed one:
pbinom(bypoly.points, tot.n.points, %area)
- calculate the probability of a count higher than the observed one:
1-pbinom(bypoly.points, tot.n.points, %area)
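For reference, the steps listed above can be sketched in a few lines of base R. This is only a minimal sketch: it takes the polygon areas and the observed counts from the results table in this post as given (the Thiessen polygons themselves are not computed here), and it uses the name prop.area in place of %area, since %area is not a syntactic R name. The names p.lower and p.upper are likewise just illustrative.

```r
# Polygon areas and observed counts, copied from the results table in this post.
polygon.area  <- c(4466753, 5845596, 6105211, 8533652, 6888852)
bypoly.points <- c(17, 0, 4, 2, 2)

# Share of the total area covered by each polygon (a proportion between 0 and 1).
# prop.area stands in for "%area", which is not a valid R name.
prop.area <- polygon.area / sum(polygon.area)

# Total number of disease points and expected count per polygon.
tot.n.points <- sum(bypoly.points)
exp.n.points <- tot.n.points * prop.area

# Binomial probabilities, computed polygon by polygon (all arguments are vectorised).
p.obs   <- dbinom(bypoly.points, tot.n.points, prop.area)      # P(X == observed)
p.lower <- pbinom(bypoly.points, tot.n.points, prop.area)      # P(X <= observed)
p.upper <- 1 - pbinom(bypoly.points, tot.n.points, prop.area)  # P(X >  observed)

res <- cbind(polygon.area, prop.area, obs.n.points = bypoly.points,
             exp.n.points, p.obs, p.lower, p.upper)
print(round(res, 5))
```

Each polygon is treated here on its own, with the total number of points as the number of trials and the area share as the success probability, which is how the three calls in the list are set up.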
Code:
     polygon.area %area obs.n.points exp.n.points   p.obs p.<=exp p.>=exp
[1,]      4466753  0.14           17         3.51 0.00000 1.00000 0.00000
[2,]      5845596  0.18            0         4.59 0.00628 0.00628 0.99372
[3,]      6105211  0.19            4         4.79 0.19566 0.46190 0.53810
[4,]      8533652  0.27            2         6.70 0.01648 0.02064 0.97936
[5,]      6888852  0.22            2         5.41 0.05154 0.06935 0.93065
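One detail that may be worth checking in the last column: in R, pbinom(x, n, p) returns P(X <= x), so 1 - pbinom(x, n, p) is P(X > x) and leaves out the observed count itself. A small sketch of the difference, using polygon 3 from the table (4 points observed out of 25); the names x, n, p, p.gt and p.ge are only illustrative, and p is recomputed from the polygon areas rather than taken from the rounded %area column.

```r
# Polygon 3 from the results table: 4 observed points out of 25.
x <- 4
n <- 25
# Area of polygon 3 divided by the sum of the five polygon areas in the table.
p <- 6105211 / sum(c(4466753, 5845596, 6105211, 8533652, 6888852))

p.gt <- 1 - pbinom(x, n, p)                      # P(X > x): excludes x itself
p.ge <- pbinom(x - 1, n, p, lower.tail = FALSE)  # P(X >= x): includes x

# Check both tails against explicit sums of the probability mass function.
stopifnot(isTRUE(all.equal(p.gt, sum(dbinom((x + 1):n, n, p)))))
stopifnot(isTRUE(all.equal(p.ge, sum(dbinom(x:n, n, p)))))

# The two tails differ by exactly the probability of the observed count.
print(c(p.gt = p.gt, p.ge = p.ge, difference = p.ge - p.gt, p.obs = dbinom(x, n, p)))
```

If the intention is "at least as many points as observed", the second form is probably the one wanted; for polygons with few points the two values can differ noticeably.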
The attached image shows some relevant information derived from the results above.
The results seem to make sense, at least to me, but I would very much like to have some feedback, especially with regard to the probability calculations.
Gm