THE PROBLEM
The transport company XYZ owns delivery trucks traveling back and forth along 1,000 miles of roads within a state, subdivided into 10 counties. Some of the trucks are being stopped and robbed.
Because of the large distances involved, police cannot prevent the overwhelming majority of robberies, as they don’t know where, let alone when, these events will occur. They are normally called after an event has taken place, and they then file the corresponding police report.
NOTE: The number of trucks is irrelevant for this model.
THE TASK
I want to predict the probability of a given number of robbery events taking place in, say, the next 30 days, and in particular where along the county roads they will take place. Based on the probability level, I will decide where a police task force should be positioned, either to prevent a robbery from happening or to arrive immediately at the scene of a robbery in progress and thus forestall it.
ARCHITECTURE OF THE MODEL
I pose the problem in the following way:
1) Divide the 1,000 miles into 10 counties of, say, 100 miles each, and divide each county into nodes (road segments) measuring 10 miles each. This gives a total of 100 nodes in the 10 counties, that is, 10 nodes per county.
2) Nodes will be identifiable by latitude and longitude.
3) Register the events (robberies) that took place in each node in each month of the previous year (2015), divide the total by 12 months, and round the value up to the next integer. Each event will be identified by its latitude and longitude coordinates, so it will be assigned to the corresponding node, which is also identified by its coordinates.
The result will be the value of λ for each particular node in a given month.
4) When and if a monthly value for the current year (2016) becomes available, I will substitute it for the value of the same month of the previous year (2015), rather than add it, and use the resulting new value of λ to calculate the probability of one or more events occurring in a particular node during the next month.
5) My objective is to determine the nodes where an event is most likely to occur, and thus tell police to position themselves and be especially vigilant in these nodes, and forget about the rest. The idea would be, say, to select the 3 nodes with the highest probability, by calculating the cumulative Poisson probability of each as:
Cumulative Probability: P(X ≤ λ), with the Poisson random variable (x) also equal to λ
I decided to use this probability (above) over the following:
- Poisson Probability: P(X = λ)
- Cumulative Probability: P(X < λ)
- Cumulative Probability: P(X > λ)
- Cumulative Probability: P(X ≥ λ).
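To make steps 3 and 4 concrete, here is a minimal sketch of one possible reading of them: λ for a node is the 12-month total divided by 12 and rounded up, and a new 2016 month replaces the same month of 2015. The function names and all the counts are made up for illustration; they are not real data.

```python
import math

def node_lambda(monthly_counts):
    """Average monthly robbery count over a 12-month window, rounded up."""
    if len(monthly_counts) != 12:
        raise ValueError("expected 12 monthly counts")
    return math.ceil(sum(monthly_counts) / 12)

def substitute_month(monthly_counts, month_index, new_count):
    """Return a copy in which one month's 2015 count is replaced (not added to)."""
    updated = list(monthly_counts)
    updated[month_index] = new_count
    return updated

# Hypothetical 2015 counts for one node, January to December (made-up numbers).
counts_2015 = [1, 0, 1, 2, 0, 0, 1, 2, 3, 1, 0, 1]
lam_2015 = node_lambda(counts_2015)          # 12 events / 12 months

# Suppose January 2016 ends with 2 robberies in this node: swap it in for January 2015.
counts_rolling = substitute_month(counts_2015, 0, 2)
lam_rolling = node_lambda(counts_rolling)    # 13 events / 12 months, rounded up

print(lam_2015, lam_rolling)
```

With these made-up counts, a single extra event moves λ from 1 to 2 because of the rounding-up step, which shows how sensitive the rounded value can be.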
I have used an online Poisson probability calculator to determine the value of each option, and I compare these values to select the 3 nodes with the highest cumulative value as the nodes for the police to concentrate on.
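For reference, the options such a calculator reports can also be computed directly. The sketch below uses only the Python standard library and assumes an integer x and λ (here both set to 2, a made-up value); the function names are my own, not taken from any particular calculator.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_options(x, lam):
    """Point and cumulative Poisson probabilities evaluated at the integer x."""
    p_eq = poisson_pmf(x, lam)
    p_lt = sum(poisson_pmf(k, lam) for k in range(x))   # P(X < x)
    p_le = p_lt + p_eq                                  # P(X <= x)
    return {
        "P(X = x)": p_eq,
        "P(X < x)": p_lt,
        "P(X <= x)": p_le,
        "P(X > x)": 1.0 - p_le,
        "P(X >= x)": 1.0 - p_lt,
    }

lam = 2                                # hypothetical rate for one node
options = poisson_options(lam, lam)    # x set equal to lambda, as in step 5
for name, value in options.items():
    print(f"{name}: {value:.4f}")

# Probability of one or more events in the month, as mentioned in step 4.
p_at_least_one = 1.0 - math.exp(-lam)
print(f"P(X >= 1): {p_at_least_one:.4f}")
```

The last value uses the identity P(X ≥ 1) = 1 − e^(−λ), which follows from P(X = 0) = e^(−λ).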
In short, I will update the value of λ for each node daily and calculate the corresponding cumulative probability to select the 3 nodes with the highest probability. I will thus be able to tell police the 3 nodes where they need to concentrate their efforts.
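The selection step could be sketched as follows. The node identifiers and λ values are hypothetical, and the first scoring function assumes that the option chosen in step 5 is P(X ≤ λ) evaluated at x = λ. A second score, the probability of one or more events from step 4, is included only so the two rankings can be compared.

```python
import math

def cdf_at_lambda(lam):
    """P(X <= lam) for a Poisson variable with integer rate lam."""
    return sum(math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(lam + 1))

def at_least_one(lam):
    """P(X >= 1) = 1 - exp(-lam)."""
    return 1.0 - math.exp(-lam)

def top_nodes(lambdas, score, n=3):
    """Return the n node ids with the highest score, best first."""
    scores = {node: score(lam) for node, lam in lambdas.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical lambdas for five nodes (made-up values and ids).
lambdas = {
    "county01-node03": 1,
    "county01-node07": 4,
    "county02-node01": 2,
    "county05-node09": 0,
    "county08-node02": 6,
}

by_cdf = top_nodes(lambdas, cdf_at_lambda)
by_any = top_nodes(lambdas, at_least_one)
print("ranked by P(X <= lambda):", by_cdf)
print("ranked by P(X >= 1):     ", by_any)
```

For these sample values the two scores rank the nodes differently: P(X ≤ λ) is highest for the nodes with the smallest λ, while P(X ≥ 1) is highest for the nodes with the largest λ. It may be worth checking which ordering matches the intent before sending the list to police.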
This problem has similarities to the case of the German V-1 flying bombs over London in 1944.
What do you think? Should something be changed? Can it be improved?
Thanks for any help and recommendations you can provide!