OK, someone recently asked me what the probability is that a follower of Twitter handle X is also a follower of Twitter handle Y. I used some R code (with the twitteR package) to get 47,000 followers of X and 61,000 followers of Y (the two handles have a LOT of followers). I combined them into one vector and found that roughly 1200 data points were repeats (so 600 followers came up twice). My question now is: how do I find the proportion of shared followers? This is my reasoning:
p(A and B) = p(B)*p(A)
where p(A and B) = the probability of sampling someone who follows both X and Y twice (once when sampling X's followers and once when sampling Y's) = 1200/(61000+47000) = 1200/108000 = 0.011
p(A) = the probability of sampling an X follower who also follows Y
p(B) = the probability of sampling a Y follower who also follows X
I am going to assume that p(A) = p(B).
So we can substitute 0.011 for p(A and B) and p(B) for p(A) to get:
0.011 = p(B)*p(B)
0.011 = p(B)^2
0.105 = p(B) = p(A)
So, roughly 10% of X's followers also follow Y. Am I correct in this line of reasoning? Is there anything I am missing?
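To make the arithmetic easy to check, here is a minimal Python sketch of the numbers above. It only uses the figures quoted in this post. The variable names are made up for illustration, and the last two ratios are not part of my reasoning: they are just the 600 shared followers divided by each sample size, printed for comparison, on the assumption that each repeated ID shows up exactly twice.

```python
import math

# Figures quoted in the post
n_x = 47_000      # followers sampled from X
n_y = 61_000      # followers sampled from Y
repeats = 1_200   # entries in the combined vector that belong to a repeated ID
shared = repeats // 2  # assumption: each shared follower appears exactly twice

# The calculation from the post: treat repeats / combined size as p(A and B),
# assume p(A) = p(B), and take the square root.
p_joint = repeats / (n_x + n_y)
p_equal = math.sqrt(p_joint)

# For comparison only (not the post's method): the share of each sample
# that also appears in the other sample.
share_of_x = shared / n_x
share_of_y = shared / n_y

print(f"p(A and B)        = {p_joint:.4f}")
print(f"sqrt(p(A and B))  = {p_equal:.4f}")
print(f"shared / X sample = {share_of_x:.4f}")
print(f"shared / Y sample = {share_of_y:.4f}")
```

The two approaches give very different numbers (about 0.105 versus roughly 0.01), which is part of why I am unsure about my reasoning.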