I am trying to understand Bayesian networks, and in particular causal (serial) chains. A serial chain is defined by P(C|A&B) = P(C|B). This means that the probability of C given B is exactly the same as the probability of C given both B and A: knowing that A has occurred makes no difference to our beliefs about C if we already know that B has occurred.
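To check that I read the definition correctly, here is a minimal sketch that builds a joint distribution for a chain A -> B -> C and compares P(C|A&B) with P(C|B). All the numbers in the tables, and the helper names `prob` and `cond`, are made up for illustration and are not taken from my traces.

```python
from itertools import product

# Hypothetical conditional probability tables for a chain A -> B -> C.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.4, False: 0.6},
               False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.5, False: 0.5},
               False: {True: 0.2, False: 0.8}}

# In a chain the joint factorises as P(A) * P(B|A) * P(C|B).
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product([True, False], repeat=3)
}

def prob(event):
    """Total probability of all outcomes (a, b, c) satisfying `event`."""
    return sum(p for (a, b, c), p in joint.items() if event(a, b, c))

def cond(event, given):
    """Conditional probability P(event | given)."""
    both = prob(lambda a, b, c: event(a, b, c) and given(a, b, c))
    return both / prob(given)

p_c_given_ab = cond(lambda a, b, c: c, lambda a, b, c: a and b)
p_c_given_b_only = cond(lambda a, b, c: c, lambda a, b, c: b)

print(p_c_given_ab, p_c_given_b_only)
```

With a joint distribution built this way the two values agree, because C depends on A only through B.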
Let's assume I have the following path: A -> B -> C. I have explored 100 traces and know that A -> B happens in 40% of all cases from A, and B -> C happens in 50% of all cases from B. At the same time, I know that A -> B -> C happens in 99% of all cases from B where A was visited previously. Let's also assume that overall the visit probabilities are P(A)=0.15, P(B)=0.2, P(C)=0.25 (they do not sum to 1 because there are other locations as well). Let's compute the probabilities:
P(B|A) = P(A|B) * P(B) / P(A) = 0.4 * 0.2 / 0.15 = 0.53
P(C|B) = P(B|C) * P(C) / P(B) = 0.5 * 0.25 / 0.2 = 0.62
Now I am trying to predict which location will be next, given that A -> B has occurred. The formula says that I should use P(C|B) = 0.62 (i.e. without taking A into account), but I also know that a person visits C in 99% of cases when they have also visited A!
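To make the mismatch concrete, here is a minimal sketch that estimates both conditional probabilities directly from trace counts. The counts are hypothetical, chosen only so that they reproduce the 50% and 99% figures from my description, and the 0.05 tolerance is an arbitrary assumption.

```python
# Hypothetical counts of traces that pass through B (made up for illustration).
via_a_to_c = 198    # visited A, then B, then C
via_a_other = 2     # visited A, then B, then some other location
no_a_to_c = 302     # reached B without A, then C
no_a_other = 498    # reached B without A, then some other location

total_through_b = via_a_to_c + via_a_other + no_a_to_c + no_a_other

# Frequency estimate of P(C|B): ignore whether A was visited.
p_c_given_b = (via_a_to_c + no_a_to_c) / total_through_b

# Frequency estimate of P(C|A&B): only traces that came through A.
p_c_given_a_b = via_a_to_c / (via_a_to_c + via_a_other)

# The serial-chain definition requires these two to be (roughly) equal.
tolerance = 0.05  # arbitrary threshold, an assumption of this sketch
chain_assumption_holds = abs(p_c_given_a_b - p_c_given_b) < tolerance

print(f"P(C|B)   = {p_c_given_b:.2f}")
print(f"P(C|A&B) = {p_c_given_a_b:.2f}")
print("chain assumption holds:", chain_assumption_holds)
```

If the real counts look anything like this, the two estimates differ a lot, which suggests the data itself may not satisfy P(C|A&B) = P(C|B).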
Did I understand the formula correctly? Why is A not taken into account? And how can I fix it?