How do I solve a DAG from a given joint distribution function?


New Member
Hello all,

I have a directed graph represented as a joint distribution table between three binary variables;

a b c p(a,b,c)
0 0 0 0.192
0 0 1 0.144
0 1 0 0.048
0 1 1 0.216
1 0 0 0.192
1 0 1 0.064
1 1 0 0.048
1 1 1 0.096

How do I prove the of ordered numbering of the nodes and that there are no such links going to a low-numbered node which also explains that there is no directed cycle?

My understanding - A topological sort of a DAG G is a linear ordering of all its vertices such that if G contains an edge (u, v), then u appears before v in the ordering. [Hence, it cannot have any cycles, because the edge into the earliest vertex of a cycle would have to be oriented the wrong way.]

But how do the individual probabilities prove the ordering? Is it because some of the probabilities are equal (eg P{a=0,b=0,c=0} = P{a=1,b=0,c=0} = 0.192)

ii.) Also, show that when the variable a and b will be marginally dependent and when they will be
iii.) Evaluate the distribution p(a), p(c|a), and, p(b|c) corresponding to the table. Plot a corresponding
directed graph. Hence show p(a|b|c) = p(a)p(c|a)p(b|c).



Less is more. Stay pure. Stay poor.
Is it implied any non-cyclical structure could exist? So, chain, fork, or inverted fork - but all variables have to be connected some how?

So without assumptions - I thought figuring out order was an impossible problem because chance could be at play as well. Of note, 3 nodes could have 25 possible relations, see below. Not theoretical, but the probs sum to unity, so you could simulate 10K obs and play around with that dataset and look for conditional independences.



Less is more. Stay pure. Stay poor.
It is fairly common just to simulate around 10K or more to ensure the models will converge and not run into sparsity issues or sampling error issues (latter not an issue here if you assume the probabilities to be the truth). If you did say 100, then 1,1,0 group would just have ~5 people.


New Member
@hlsmith can you kindly guide how to simulate around 10K? Also, what do you mean by "if you assume probabilities to be the truth"? Sorry again but "If you did say 100, then 1,1,0 group would just have ~5 people." is unclear too to me.


Less is more. Stay pure. Stay poor.
I have never done any of this before (causal discovery), but may play around with it tomorrow based on simulating the following data:


I am guessing it will be like creating a Bayesian network


Less is more. Stay pure. Stay poor.
@Kunal - I would be interested in what you found the answer to be. I am starting off by playing around with these data, SAS code at bottom ( I used SAS since I am just faster with it).

x1 | x2 = OR: 0.72 (95% CI: 0.66, 0.78)

x1 | x3 = OR: 0.44 (95% CI: 0.41, 0.48)

x2 | x3 = OR: 6.00 (95% CI: 5.48, 6.56)

So keep all possible edges for now


x2 | x3 controlling for x1; OR: 6.0 and 6.0

x3 | x1 controlling for x2; OR: 0.44 and 0.44

x1 | x2 controlling for x3; OR: 1.0 and 1.0

So remove x1 to x2 edge

So now we must figure out the edge directions and structure. Either a fork or inverted fork, but not a chain. I am guessing there is some type of test for Berkson's paradox to inform this decision. When using continuous variables, I think you can look at residuals or maybe the reset test to inform directionality. In the above, since x1| x2 are not independent this removes the inverted fork option

x1 --> x3 <-- x2, and then this leaves us with the fork below.

x1 <-- x3 --> x2
Basing my decision on the conditional independence between x1 and x2 given x3 and that x1 and x2 appear dependent when ignoring x3. I haven't done this before, so I could be wrong. Also, I am guessing there are assumption for this to hold true, like no miss variables (nodes) in data.

data df;
    input x1 x2 x3 count;
    0    0    0    1920
    0    0    1    1440
    0    1    0    480
    0    1    1    2160
    1    0    0    1920
    1    0    1    640
    1    1    0    480
    1    1    1    960
proc freq data=df;
    table x1*(x2 x3) / oddsratio;
    weight count;
proc freq data=df;
    table x2*x3 / oddsratio;
    weight count;

/*conditional independences*/
proc freq data=df;
    table x1*x2*x3 / oddsratio cmh;
    weight count;
proc freq data=df;
    table x2*x3*x1 / oddsratio cmh;
    weight count;
proc freq data=df;
    table x3*x1*x2 / oddsratio cmh;
    weight count;
Last edited: