Goran_L
08-18-2008, 08:46 AM
As part of my PhD dissertation, I have come up with a new way to address a common problem in my field (industry location). My method seems so simple and straightforward that it is very likely a standard statistical method, but I can't figure out which one it is. I need someone to tell me "Oh that's easy, that's just a typical Schmittmeyer's Q. It's nothing new - just Google it!"
Here is the problem and my approach.
Problem:
We have employment statistics for two industries - Red and Green - and for the entire population, called Anyone. The statistics are available for four regions, R1-R4.
We now want to determine whether industry Red tends to locate in the same regions as industry Green.
Data example:
         R1   R2   R3   R4   Total
Red       3    1    0    2       6
Green     2    3    3    0       8
Anyone   50   30   20   50     150
(The actual data I use has millions of employees in hundreds of industries and hundreds of regions.)
There are many standard ways to do this, but here is the approach I "invented":
Solution:
I use combinatorial probabilities to calculate the likelihood that a Red employee and a Green employee are in the same region, and compare that with the likelihood that a Red employee is in the same region as Anyone.
I calculate like this:
In region R1 there are 3 Red and 2 Green, and they form 3*2=6 Red-Green pairs
In R2: 1*3=3 Red-Green pairs
In R3: 0*3=0 Red-Green pairs
In R4: 2*0=0 Red-Green pairs
Total: 6+3+0+0=9 Red-Green pairs
Maximum possible number: 6*8=48 Red-Green pairs
Likelihood: 9/48 = 0.19
So there is a 19% chance that a random Red will be in the same region as a random Green.
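For anyone who wants to check the pair counting, here is a minimal Python sketch using only the example numbers above. The variable names are made up for illustration. It computes the per-region products and also verifies them by brute force, listing every Red employee against every Green employee.

```python
# Employment counts per region (R1..R4), taken from the data example above.
red = [3, 1, 0, 2]
green = [2, 3, 3, 0]

# Shortcut: Red-Green pairs sharing a region, as a sum of per-region products.
same_region_pairs = sum(r * g for r, g in zip(red, green))
all_pairs = sum(red) * sum(green)

# Brute-force check: give each employee a region index and count matching pairs.
red_regions = [region for region, n in enumerate(red) for _ in range(n)]
green_regions = [region for region, n in enumerate(green) for _ in range(n)]
brute_force = sum(1 for a in red_regions for b in green_regions if a == b)

p_red_green = same_region_pairs / all_pairs
print(same_region_pairs, all_pairs, brute_force, p_red_green)
# prints: 9 48 9 0.1875
```

The brute-force loop is only a sanity check. With millions of employees, the per-region product sum is the version that scales.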
If we now do the same for Red and Anyone we get:
Likelihood = (3*50 + 1*30 + 0*20 + 2*50)/(6*150) = 280/900 = 0.31
So the chance for a Red to be in the same region as Anyone is much higher, 31%. This suggests that there is NO tendency for Red to colocate especially with Green, quite the opposite.
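The comparison can be sketched in Python as follows. The function name `same_region_probability` is hypothetical, and the final ratio is just one assumed way to summarize the comparison in a single number. It is not part of the method as described in this post.

```python
def same_region_probability(counts_a, counts_b):
    """Chance that a random member of group A and a random member of
    group B are in the same region, given per-region counts."""
    same_region_pairs = sum(a * b for a, b in zip(counts_a, counts_b))
    return same_region_pairs / (sum(counts_a) * sum(counts_b))

# Per-region counts (R1..R4) from the data example.
red = [3, 1, 0, 2]
green = [2, 3, 3, 0]
anyone = [50, 30, 20, 50]

p_red_green = same_region_probability(red, green)
p_red_anyone = same_region_probability(red, anyone)

# Assumed summary: a ratio below 1 means Red meets Green less often
# than it meets the population at large.
ratio = p_red_green / p_red_anyone

print(f"{p_red_green:.3f} {p_red_anyone:.3f} {ratio:.3f}")
# prints: 0.188 0.311 0.603
```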
Mathematically, we can write this (pardon my poor syntax here, I haven't learned how to write proper formulas on the web yet):
P(i,j) = Sum for all regions r (EMPLri * EMPLrj) / [Sum for all regions r (EMPLri) * Sum for all regions r (EMPLrj)]
where
P(i,j) is the probability that a random employee in industry i is in the same region as a random employee in industry j
EMPLri is the number of employees in region r in industry i
EMPLrj is the number of employees in region r in industry j
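One thing that follows directly from the formula: dividing each EMPLri by the industry total turns the counts into regional shares, and P(i,j) is then simply the sum over regions of share(r,i) * share(r,j). The Python sketch below checks this equivalence on the example data. The helper names are invented for illustration.

```python
def regional_shares(counts):
    """Fraction of an industry's employment found in each region."""
    total = sum(counts)
    return [c / total for c in counts]

def p_from_counts(empl_i, empl_j):
    """P(i,j) exactly as in the formula: pair counts over total pairs."""
    pairs = sum(a * b for a, b in zip(empl_i, empl_j))
    return pairs / (sum(empl_i) * sum(empl_j))

def p_from_shares(empl_i, empl_j):
    """Same quantity, written as a sum of products of regional shares."""
    shares_i = regional_shares(empl_i)
    shares_j = regional_shares(empl_j)
    return sum(si * sj for si, sj in zip(shares_i, shares_j))

red = [3, 1, 0, 2]
green = [2, 3, 3, 0]
anyone = [50, 30, 20, 50]

for name, other in [("Green", green), ("Anyone", anyone)]:
    by_counts = p_from_counts(red, other)
    by_shares = p_from_shares(red, other)
    print(f"Red vs {name}: {by_counts:.4f} {by_shares:.4f}")
```

Both columns should agree up to floating-point rounding, which shows that P(i,j) depends only on each industry's regional distribution and not on its absolute size.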
There is more I could say about what this calculation can be used for, but it will probably not interest you. My question is simply:
Question:
Do you recognise this method? Could you tell me what statistical test it is I am performing?
(I checked some reasonable suspects, like Chi2, and it turns out that this one is similar to Chi2 but different.)
Many thanks!
Göran