# Which test should I use?

#### Neta_Friedman

##### New Member
Hello all,

I am a novice in statistics, so I hope my question is understandable. Here goes.

I have a dataset of archaeological sites that are distributed over a large area. I divided this area into regions according to vegetation characteristics. I've summarized how the sites are distributed across the different regions, and now I want to compare those results with the natural distribution. Put differently, I want to know whether the sites are distributed randomly or according to environmental preferences (e.g. in green areas rather than deserts).

Hope everything is clear.

#### Karabiner

##### TS Contributor
How many sites are there, and how many areas? How many different types of areas are there? How many criteria did you use to identify the type of area, and were the criteria quantitative or categorical? The areas are of different sizes, I suppose?

With kind regards

Karabiner

#### gianmarco

##### TS Contributor
Hello,
I do not know what type of data you have. Are you working with GIS software? Do you have GIS data, for instance a geometry representing your areas and a geometry representing your sites?

Assuming that you have the above type of data, and assuming that your regions completely cover your study area (with no gap in-between), you may want to test if the distribution of points within your set of polygons (totally covering the study area) can be considered random, or if the observed points count in each polygon is larger or smaller than expected.

I have built a function for this in my R package 'GmAMisc', called 'pointsInPolygons()'. The calculations for the above scenario are based on the binomial distribution: the probability of the observed count is dbinom(x, size=n.of.points, prob=p), where 'x' is the observed number of points within a given polygon, 'n.of.points' is the total number of points, and 'p' is the area of that polygon divided by the total area of all polygons. The probability that x or fewer points will be found within a given polygon is pbinom(x, size=n.of.points, prob=p).
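For readers who do not use R, the two binomial quantities described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the code of 'pointsInPolygons()': the helper names `binom_pmf` and `binom_cdf` are hypothetical stand-ins for R's dbinom and pbinom, and the numbers in the usage lines are made up.

```python
from math import comb


def binom_pmf(x, n, p):
    """Probability of exactly x points in a polygon (R: dbinom(x, size=n, prob=p))."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Probability of x or fewer points in a polygon (R: pbinom(x, size=n, prob=p))."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


# Made-up illustration (not data from this thread): 100 points in total,
# a polygon covering 25% of the study area, and 30 points observed inside it.
n_points = 100
p_area = 0.25
observed = 30

expected = n_points * p_area                                # count expected under randomness
p_exact = binom_pmf(observed, n_points, p_area)             # P(X = 30)
p_at_most = binom_cdf(observed, n_points, p_area)           # P(X <= 30)
p_at_least = 1 - binom_cdf(observed - 1, n_points, p_area)  # P(X >= 30)

print(expected, p_exact, p_at_most, p_at_least)
```

A small `p_at_least` would suggest more points in the polygon than random placement predicts, and a small `p_at_most` would suggest fewer.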

If you have GIS data, you can feed them into R and use the function.

I attach an example of the output (observed vs. expected counts of points within polygons, and p values).

Best

#### Neta_Friedman

##### New Member
Hi,
I have 437 sites. For now, I want to look at them together but later I might also decide to subdivide them into smaller groups (three, or seven). This is based on archaeological data such as chronology and typology (tool types).
There are five area types, which were defined by ecologists as environmentally distinct (arid, semi-arid, dunes, human-influence and other).
Data are categorical. The areas are not similar in size.

I'm posting below actual numbers in the hope it might help.

| Area name | Area | % |
|---|---|---|
| Saharo-Arabian | 43984.6 | 59 |
| Sand | 13977.6 | 18.7 |
| Irano-Turanian | 6405.9 | 8.6 |
| synanthropic | 2710.8 | 3.6 |
| other | 7491.1 | 10 |
| Total area | 74570.1 | 100 |

Number of sites: 437

thanks again

#### Neta_Friedman

##### New Member
Hi Gianmarco,
I am working with GIS and do have the geometry. Your suggestion sounds like exactly what I need. The polygons do cover the entire study area, but the environmental zones I'm using (the areas) aren't continuous but patchy (i.e. two or more non-neighbouring areas can be defined as the same type).
In a later test, I would like to perform a similar test with much smaller areas (thoroughly surveyed areas) that do not cover the entire area.
In any case, I'm ashamed to say I don't know how to work with R.

#### Karabiner

##### TS Contributor
If area type and site density are independent, then the 437 sites would be distributed across the types in proportion to each type's share of the total area.

E.g. 59% of 437 = 258 sites would be expected in type 1 areas.

You can compare the observed distribution with the expected distribution across the 5 types using a chi-square goodness-of-fit test.
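As a rough sketch of that calculation in plain Python (no statistics package assumed): the areas and the site total are the figures posted earlier in the thread, while the function name `chi_square_gof` is hypothetical. The observed number of sites per area type was not posted, so it is left as an input rather than filled in. The closed-form p-value used here is valid only for an even number of degrees of freedom, which covers the 5-type case (4 degrees of freedom).

```python
from math import exp, factorial

# Areas as posted in the thread; the units are not stated there.
areas = {
    "Saharo-Arabian": 43984.6,
    "Sand": 13977.6,
    "Irano-Turanian": 6405.9,
    "synanthropic": 2710.8,
    "other": 7491.1,
}
n_sites = 437

# Expected count per type if sites fall in proportion to area.
# The total is computed from the rows rather than copied from the post.
total_area = sum(areas.values())
expected = {name: n_sites * area / total_area for name, area in areas.items()}


def chi_square_gof(observed, expected):
    """Return (statistic, p_value) for a chi-square goodness-of-fit test.

    Uses the exact upper-tail formula for an even number of degrees of
    freedom: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    df = len(observed) - 1
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles an even, positive df")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    half = stat / 2
    p_value = exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))
    return stat, p_value


for name, count in expected.items():
    print(f"{name}: {count:.1f} sites expected")
```

Once the observed number of sites per type has been tallied, in the same order as `areas`, it can be passed as `chi_square_gof(observed_counts, list(expected.values()))`, where `observed_counts` is a placeholder for that list. The first expected value comes out at about 258, matching the figure above.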

With kind regards

Karabiner

#### gianmarco

##### TS Contributor
I would agree with that, provided that the areas completely cover the study area, with no gap in-between.

