
What if we want to compute all marginals, not just one?

tarix  07.11.2018  ölçüsü  397,5 Kb.   #78794 

What if we want to compute all marginals, not just one? Doing variable elimination for each one in turn is inefficient Solution: Junction trees (a.k.a. join trees, clique trees)
In HMMs, we efficiently computed all marginals using dynamic programming In HMMs, we efficiently computed all marginals using dynamic programming If the graph is not a tree, reduce it to one by clustering variables
Moralize graph (if Bayes net) Moralize graph (if Bayes net) Remove arrows (if Bayes net) Triangulate graph Build clique graph Build junction tree Choose root Populate cliques Do belief propagation
A junction tree is a subgraph of the clique graph that: 1. Is a tree 2. Contains all the nodes of the clique graph 3. Satisfies the running intersection property. A junction tree is a subgraph of the clique graph that: 1. Is a tree 2. Contains all the nodes of the clique graph 3. Satisfies the running intersection property. Running intersection property: For each pair U, V of cliques with intersection S, all cliques on the path between U and V contain S.
Place each potential from the original network in a clique containing all the variables it references Place each potential from the original network in a clique containing all the variables it references For each clique node, form the product of the distributions in it (as in variable elimination).
Incorporate evidence Incorporate evidence Upward pass: Send messages toward root Downward pass: Send messages toward leaves
For each evidence variable, go to one table that includes that variable. For each evidence variable, go to one table that includes that variable. Set to 0 all entries in that table that disagree with the evidence.
For each leaf in the junction tree, send a message to its parent. The message is the marginal of its table, summing out any variable not in the separator. For each leaf in the junction tree, send a message to its parent. The message is the marginal of its table, summing out any variable not in the separator. When a parent receives a message from a child, it multiplies its table by the message table to obtain its new table. When a parent receives messages from all its children, it repeats the process (acts as a leaf). This process continues until the root receives messages from all its children.
Reverses upward pass, starting at the root. Reverses upward pass, starting at the root. The root sends a message to each of its children. More specifically, the root divides its current table by the message received from the child, marginalizes the resulting table to the separator, and sends the result to the child. Each child multiplies its table by its parent’s table and repeats the process (acts as a root) until leaves are reached. Table at each clique is joint marginal of its variables; sum out as needed. We’re done!
The junction tree algorithm is just a way to do variable elimination in all directions at once, storing intermediate results at each step. The junction tree algorithm is just a way to do variable elimination in all directions at once, storing intermediate results at each step.
To eliminate a variable at any step, we combine all remaining tables involving that variable. To eliminate a variable at any step, we combine all remaining tables involving that variable. A node in the junction tree corresponds to the variables in one of the tables created during variable elimination (the other variables required to remove a variable). An arc in the junction tree shows the flow of data in the elimination computation.
Avoids redundancy in repeated variable elimination Avoids redundancy in repeated variable elimination Need to build junction tree only once ever
Exponential in the treewidth of the graph Exponential in the treewidth of the graph  Treewidth can be O(number of nodes) in the worst case…
 These algorithms can be exponential in the problem size
 Could there be a better algorithm?
Can encode any 3SAT problem as a DGM Can encode any 3SAT problem as a DGM Use deterministic CPTs
Q’s are binary random variables C’s are (deterministic) clauses A’s are a chain of AND gates
#P complete #P complete To compute the normalizing constant we have to count the # of satisfying clauses.
Dostları ilə paylaş: 

