Using BN Software for Probability Predictions and Inferences BN software products and methods allow the following standard approach to formulating and solving probabilistic inference problems for BNs with any number of nodes.
Create a BN consisting of a node for each random variable and a DAG (directed acyclic graph) in which arrows between variables represent dependencies between them.
Specify a marginal probability distribution for each input node.
Specify a CPT for each node with an arrow pointing into it.
Enter observations or assumptions (sometimes referred to generically as “findings”) about the values of some of the variables.
Obtain the conditional (posterior) distributions of all other variables, conditioned on the findings entered by the user. BN solver software packages automatically calculate these updated distributions.
The following example illustrates this process for the disease diagnosis example previously solved via Bayes’ rule, using the BN software package Netica (downloaded from www.norsys.com/netica.html) to perform the calculations.
Example: A Two-Node BN for Disease Diagnosis using Netica Setting: Suppose again that 1% of the population using a medical clinic have a disease and that a diagnostic test for the disease has these statistical performance characteristics:
P(test is positive | disease is present) = test sensitivity = 0.99.
P(test is negative | disease is not present) = test specificity = 0.98.
Problem: What is the probability that an individual has the disease, given a positive test result?
Solution via Netica Bayesian Network Solver: The Netica software can be downloaded for free from www.norsys.com/download.html. After running the installation package, double-click on Netica.exe in the folder to which it has been downloaded to open the software. When prompted, click on “Limited Mode” to run the free version for small problems. Under File, select New and then Network to open a new network:
Fig. 2.1a Creating a new Netica network
Starting with a blank network, select a “Nature Node” (yellow oval) from the Netica network drawing toolbar and then click on the blank area to create a node: it will appear as a rectangular box with a title at the top. By default, Netica will label this “Node A.” (“Nature Node” is Netica’s term for a random variable.) Click again to create a second node, which will automatically be labeled “Node B.” Join the nodes by an arrow using the “Add Link” arrow in the Netica toolbar. Double-clicking on each of these newly created nodes will let its name be edited and values or names for its possible states be entered.
Fig. 2.1b Adding nodes to the newly created network and linking them by an arrow
Dialogue boxes will open that let the user click on “New” to create new values for the “States” (possible values) of a node, and that also allow a new name for the node to be entered. Rename nodes A and B as “Disease_state” and “Test_result” and create and name two possible states for each: “Yes” for disease present and “No” for disease not present; and “Positive” for positive test result and “Negative” for negative test result, respectively. Clicking on “OK” in the dialogue box makes these changes, which henceforth will be displayed in the rectangle for each node. This completes the construction of the two-node DAG. The whole model can now be named and saved (using Save As under File); we will call the model “Disease.” To complete the model, it is necessary to fill in a probability table for each node. Double-clicking on a node opens a dialogue box that contains “Table” as a button, and clicking on this button opens another dialogue box into which the probability table for the node can be entered. For an input node such as Disease_state, this probability table is the marginal probability distribution of the states (or values) of the node. For a node with incoming arrows, such as Test_result, the probability table is the node’s CPT.
Fig. 2.1c DAG model drawn, with the node probability tables filled in, using Netica
When the probability tables have been specified, click on “OK” in each open dialogue box to make these changes and close the dialogue boxes. The data have now been entered in the DAG, but the display will still show default values: the marginal distribution for Disease_state (1.0% for Yes and 99.0% for No) and a uniform distribution for Test_result (50.0% for Positive and 50.0% for Negative), assuming that the “% Probability” display is being used. (This is the default option in the dialogue box for the node; a decimal probability can be displayed instead if desired.) To start using the network to make calculations, select “Compile” under the “Network” drop-down menu on the main Netica toolbar. The nodes of the model will now start displaying correct probability values and corresponding bar charts. For example, if no findings are entered, then the probability for a Positive test result will be displayed as 2.97 percent.
Fig. 2.1d Complete Netica model, ready to use to draw inferences
This agrees with the manual calculation using predictive probability equation (2.4),
P(positive test) = P(positive test | disease)P(disease) + P(positive test | no disease)P(no disease) = 0.99*0.01 + 0.02*0.99 = 0.0297 = 2.97%. On the other hand, entering a finding that the patient has the disease will increase the probability of a positive test to 99%. The easiest way to do this in Netica is simply to click on “Yes” in the Disease_state node.
Fig. 2.1e Netica model with the finding “Disease_state = Yes” entered by clicking on “Yes” in the node for Disease_state.
Retracting this finding (by clicking on the blank space in the Disease_state node or by right-clicking on the node and selecting “Unknown” under the “Finding” option that then appears on a menu) restores the model to its original state, ready to process another set of findings. Clicking on “Positive” in the Test_result node will then result in the posterior probability distribution for the Disease_state node being displayed. There is a 1/3 probability, displayed as 33.3 percent, that the patient has the disease, given a positive test result, as calculated previously via Bayes’ rule.
Fig. 2.1f Netica model with the finding “Test_result = Positive” entered by clicking on “Positive” in the node for Test_result.
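The 2.97% and 33.3% figures displayed by Netica can also be checked by hand. The following minimal sketch (plain Python, independent of Netica; the function and constant names are our own) computes the predictive probability of a positive test via the law of total probability and the posterior probability of disease via Bayes’ rule:

```python
# Illustrative check of the two-node disease/test calculations, using plain
# Python rather than Netica; function and constant names are our own.

P_DISEASE = 0.01     # prevalence: P(disease present)
SENSITIVITY = 0.99   # P(test positive | disease present)
SPECIFICITY = 0.98   # P(test negative | disease absent)

def p_positive():
    """Predictive probability of a positive test (law of total probability)."""
    false_positive_rate = 1.0 - SPECIFICITY
    return SENSITIVITY * P_DISEASE + false_positive_rate * (1.0 - P_DISEASE)

def p_disease_given_positive():
    """Posterior probability of disease given a positive test (Bayes' rule)."""
    return SENSITIVITY * P_DISEASE / p_positive()

print(p_positive())                # approximately 0.0297
print(p_disease_given_positive())  # approximately 0.3333, i.e., 1/3
```

The two printed values match the 2.97% and 33.3% that Netica displays for this network.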
This example has illustrated the steps of building a DAG model, populating its probability tables, and using it to draw probabilistic inferences in the simplest case of only two nodes (random variables), each with only two values. The same steps can be followed to build and use much larger BNs.
Example: Bayesian inference in a small BN – The family out problem Setting: The following example is based on Charniak (1991), which provides a nice introductory tutorial on BNs. Suppose that I am driving home and, as I approach my house, I see that the outdoor light is on, and I wonder whether my family is at home. Figure 2.2a shows a BN for what I know about the relationships among five variables, including the light being on. In the absence of other information, there is a prior probability of 15% that the family is out. If the family is out, then there is a 60% probability that the outdoor light will be on; otherwise, there is a 5% probability that the light is on. (Thus, the prior probability of the light being on is P(family out)*P(light on | family out) + P(family in)*P(light on | family in) = 0.15*0.60 + 0.85*0.05 = 0.1325.) The probability that I hear the dog bark is 70% if the dog is out and 1% if it is in. The dog may be out because it has a bowel problem, but this has a prior probability of only 1%. The probability that the dog is out is 99% if it has a bowel problem and the family is out; 90% if there is no bowel problem but the family is out; 97% if there is a bowel problem but the family is in; and 30% if there is no bowel problem and the family is in.
Fig. 2.2a A 5-node Bayesian network for the family out problem
Problem: If I observe that the light is on but I do not hear the dog bark, then what is the probability that the family is out?
Solution using Netica: Figure 2.2b shows the solution using Netica. The evidence of the light being on and no bark being heard has been entered as two findings (shaded nodes). The posterior probability that the family is out is 50.1%. Note that the probability that the dog is out has increased from 39.6% in the absence of evidence to 42.6% given the evidence of the light being on but no bark being heard. This shows that the evidence of the light being on increases the probability of the dog being out more than the evidence of no bark being heard reduces it.
Fig. 2.2b Propagation of evidence through the 5-node Bayesian network
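Readers without Netica can reproduce these numbers by brute-force enumeration, since the network has only five binary variables. The sketch below (plain Python; the abbreviations fo, bp, lo, do, hb for the five variables are our own, as is the code structure) sums the factored joint probability over the unobserved variables:

```python
from itertools import product

# Brute-force enumeration for the family-out network; an illustrative sketch,
# not Netica's algorithm. Variables: fo = family out, bp = bowel problem,
# lo = light on, do = dog out, hb = hear bark (1 = true, 0 = false).

def joint(fo, bp, lo, do, hb):
    """Joint probability as the product of the factors given in the text."""
    p = 0.15 if fo else 0.85                   # P(family out)
    p *= 0.01 if bp else 0.99                  # P(bowel problem)
    p_lo = 0.60 if fo else 0.05                # P(light on | fo)
    p *= p_lo if lo else 1.0 - p_lo
    p_do = {(1, 1): 0.99, (1, 0): 0.90,        # P(dog out | fo, bp)
            (0, 1): 0.97, (0, 0): 0.30}[(fo, bp)]
    p *= p_do if do else 1.0 - p_do
    p_hb = 0.70 if do else 0.01                # P(hear bark | do)
    p *= p_hb if hb else 1.0 - p_hb
    return p

def posterior_family_out(light_on=1, hear_bark=0):
    """P(family out | findings), summing the joint over hidden variables."""
    num = sum(joint(1, bp, light_on, do, hear_bark)
              for bp, do in product((0, 1), repeat=2))
    den = sum(joint(fo, bp, light_on, do, hear_bark)
              for fo, bp, do in product((0, 1), repeat=3))
    return num / den

print(round(posterior_family_out(), 3))  # 0.501, matching the Netica display
```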
These examples illustrate that BN software packages such as Netica make it relatively easy to assemble DAG models representing factored joint distributions of variables. Several popular packages are available for this purpose (e.g., Bayesia, BayesiaLab, Bayes Server™, HUGIN, GeNIe and SMILE; see www.kdnuggets.com/software/bayesian.html). To support quantitative inference, each node in a BN model has a corresponding probability table. For an input node (i.e., a node with no arrows directed into it), the probability table is the prior marginal probability distribution; for any other node, the probability table is its CPT. Once a BN model has been created by specifying a DAG structure and a probability table for each node, it can be used to draw inferences and to answer questions for which the answers are expressed as posterior probabilities for the values of variables. Given a user-specified set of observed or assumed values (“findings”) for some variables, special algorithms calculate the posterior marginal distributions for the other variables, conditioned on the findings. These are the same distributions that would be obtained by explicitly storing the full joint probability table, as in Table 2.2, and then using it first to condition on the findings, by discarding rows that do not match them, and then to calculate the marginal distributions of each remaining variable by summing the probabilities for each of its values over all combinations of the other variables. However, the BN representation and calculations are far more efficient than this brute-force procedure.
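The brute-force procedure just described is easy to demonstrate on the two-node disease/test network. The fragment below (an illustrative plain-Python sketch; the table layout and names are our own) stores the full joint table, discards the rows that do not match the finding Test_result = Positive, and renormalizes to obtain the posterior marginal for Disease_state:

```python
from itertools import product

# Brute-force inference from an explicit joint table, as described above.
# Rows are keyed by (disease_present, test_positive); names are our own.

P_DISEASE = 0.01
P_POSITIVE_GIVEN = {True: 0.99, False: 0.02}  # sensitivity; 1 - specificity

# Step 1: store the full joint probability table (one row per combination).
joint_table = {
    (d, t): (P_DISEASE if d else 1.0 - P_DISEASE)
            * (P_POSITIVE_GIVEN[d] if t else 1.0 - P_POSITIVE_GIVEN[d])
    for d, t in product((True, False), repeat=2)
}

# Step 2: condition on the finding Test_result = Positive by discarding
# the rows that do not match it.
matching = {row: p for row, p in joint_table.items() if row[1]}

# Step 3: renormalize and marginalize to get P(disease | positive test).
total = sum(matching.values())
p_disease_given_positive = sum(p for (d, _), p in matching.items() if d) / total
print(p_disease_given_positive)  # approximately 1/3
```

With only two binary variables the full table has four rows; for a BN with dozens of variables this table becomes astronomically large, which is why the factored BN calculations are preferred.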
Inference algorithms used by BN software to calculate posterior distributions conditioned on findings are sophisticated and mature (Koller and Friedman, 2009; Peyrard et al., 2015). They include both exact methods and Monte Carlo simulation-based methods for obtaining approximate solutions with limited computational effort. Exact inference algorithms (such as variable elimination, a type of dynamic programming) exploit the factorization of the joint distribution revealed by the DAG structure to calculate and store factors that are then used multiple times in calculating posterior probabilities for variables connected by paths in the DAG. Chapter 9 of Koller and Friedman (2009) presents variable elimination in detail. A simple class of algorithms for approximate inference is based on Gibbs sampling, a special case of Markov Chain Monte Carlo sampling. The main idea is that values of variables specified by “findings” (user-input assumptions or observations) are held fixed at the user-specified values. Values for other input variables are then randomly sampled from their marginal distributions. Values for other variables are successively sampled from the appropriate conditional distributions specified in their CPTs, given the values of their parents. Repeating many times yields samples drawn from the joint distribution of all the variables. As discussed in the example for Table 2.3, a sample size from the joint distribution that is very manageable with modern software (e.g., 100,000 samples) suffices to calculate approximate probabilities for answering queries to about two significant digits. Specialized sampling methods (such as likelihood-weighted sampling and related importance sampling methods) can be used to improve sampling efficiency for rare events. Thus, BNs provide effective methods for representing joint distributions and for using them to calculate posterior distributions conditioned on findings for a wide range of practical applications.
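As a minimal illustration of the likelihood-weighting idea mentioned above (a sketch of the general technique, not the implementation used in any particular BN package; all names are our own), the fragment below estimates P(disease | positive test) for the two-node network by sampling the disease state from its prior and weighting each sample by the likelihood of the finding:

```python
import random

# Likelihood-weighted sampling sketch for the two-node disease/test network.
# The finding Test_result = Positive is held fixed; each sampled disease
# state is weighted by the probability of that finding given the sample.

def likelihood_weighting(n_samples, seed=0):
    rng = random.Random(seed)
    weighted_disease = total_weight = 0.0
    for _ in range(n_samples):
        disease = rng.random() < 0.01        # sample from the 1% prior
        weight = 0.99 if disease else 0.02   # P(positive | sampled state)
        total_weight += weight
        if disease:
            weighted_disease += weight
    return weighted_disease / total_weight

print(likelihood_weighting(100_000))  # close to the exact answer, 1/3
```

Because disease is rare, plain rejection sampling would discard about 97% of the samples; weighting by the likelihood of the finding uses every sample, illustrating why such methods improve sampling efficiency for rare events.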
Both exact and approximate BN inference methods can also find the most probable values for unobserved variables (called their “maximum a posteriori” or MAP values) given the findings for observed (or assumed) values of evidence variables, with about the same computational effort needed to calculate posterior probabilities. This enables BN packages such as Netica to generate most probable explanations (MPEs) for findings. These are defined as values for the unobserved (non-evidence, non-finding) variables that maximize the probability of the findings. For example, in the family out example, the MPE for the findings in Figure 2.2b (light on, no bark heard) can be found by selecting the “Most Probable Expl” option under “Network” on the Netica toolbar. Netica displays the MPE by setting to 100% the probability bars for the most probable values of the non-evidence nodes (those with no findings entered). For the evidence in Figure 2.2b, the MPE is that the family is out, the dog has a bowel problem, and the dog is out.
Part 2 of Koller and Friedman (2009) presents details on inference algorithms for BNs, including computation of MAP/MPE explanations. For many practitioners, it suffices that available BN software includes well-engineered algorithms to quickly produce accurate answers for networks with dozens to hundreds of nodes. Examples of deployed applications include a BN with 671 nodes and 790 edges for automatic fault diagnosis in electrical power networks (Mengshoel et al. 2010) and a BN with 413 nodes and 602 edges for inference about individual metabolism and needed customized adjustments of insulin doses for individual diabetes patients (Andreassen et al., 1991; Tudor et al., 1998). Cossalter et al. (2011) discuss the challenges of visualizing and understanding networks with hundreds of nodes.
Practical Applications of Bayesian Networks The Bayesian network technology explained so far is mature, and BNs have been successfully applied in hundreds of practical applications in recent decades. The following examples illustrate their wide range of applications:
Dispute resolution and facilitation of collaborative risk management. BNs have been proposed to help importers and exporters agree on how to jointly manage the risks of regulated agricultural pests (e.g., fruit flies) along supply chains that cross national borders (Holt et al., 2017). A BN provides a shared modeling framework for discussing quantitative risks and effects of combinations of interventions at different points along the supply chain. Similarly, Wintle and Nicholson (2014) discuss the use of BNs in trade dispute resolution, where they can help to identify specific areas of disagreement about technical uncertainties.
Microbial risk assessment for food manufacturers (Rigaux et al., 2013). In this application, prior beliefs about microbial dynamics in a food production chain are summarized as a BN, and measurements of microbial loads at different points in the production chain are used to update beliefs, revealing where unexpected values are observed and hence where prior beliefs may not accurately describe the real world. Such comparisons of model-predicted to observed values, followed by Bayesian updating of beliefs and, if necessary, revision of the initial BN model, are keys to model validation and improvement in light of experience.
Land use planning for hazardous facilities such as chemical plants, taking into account the potential for domino effects, or cascading failures, during major industrial accidents (Khakzad and Reniers, 2015).
Monitoring of vital signs in homecare telemedicine (Maglogiannis et al., 2006).
Safety analysis of hazards, such as underground buried pipelines, during river tunnel excavation (Zhang et al., 2016). In this and many other applications to risk, reliability, and safety analysis, practitioners often find it convenient to condition the variables in a BN on fuzzy descriptions of situations (e.g., “not very deeply buried”) when precise quantitative data are not available. BNs have also been developed for risk assessment and risk management of underground mining operations, bridge deterioration and maintenance, fire prevention and fire-fighting (for both wildfires and building fires), collision avoidance between ships in crowded ports and waterways, and safe operation of drones and autonomous vehicles.
Occupational safety in construction projects (Leu and Chang, 2013). BNs can identify site-specific safety risks and their underlying causes and probabilities, thus helping to prioritize costly interventions to reduce accident frequencies.
Reliability, fault diagnosis, and operations management. BNs have been applied to reliability analysis and related areas such as real-time fault diagnosis, local load optimization, and predictive maintenance of complex engineering infrastructures from electric power networks to high-speed trains to dams.
Cyber security, fraud detection, and counterterrorism. Detecting attacks, managing accounts with suspected but not proven fraudulent activity, and understanding contagion and systemic financial risks are increasingly popular areas for BNs.
Non-Causal Probabilities: Confounding and Selection Biases
Armed with basic probability theory and Bayesian networks (BNs) for manipulating joint, marginal, and conditional probabilities, it is easy to recognize both strengths and weaknesses of the idea that occurrence of a cause makes its effects more probable. Figure 2.3 shows the BN from Figure 2.1, about a disease and a test result, and illustrates the effects of two different findings.
Fig. 2.3 Changes in probability may reflect inference or causation (or both)
The top BN in Figure 2.3 shows the prior BN, before any findings are entered. The predictive probability of a positive test result is 2.97%. The middle BN shows that the presence of disease in an individual (indicated by Disease_state = Yes) increases the probability of a positive test result from 2.97% to 99%. This illustrates the intuition that a cause (presence of disease) makes its effect (positive test result) more probable. However, the bottom BN in Figure 2.3 illustrates a reason that “X causes Y” and “X increases the probability of Y” are not synonymous (where random variables X and Y are binary event indicators). In this BN, the finding of a positive test result increases the probability of disease from its unconditional prior of 1% to a posterior value of 33.3% based on (i.e., conditioned on) the finding of a positive test result. Such examples make clear that attempted definitions of cause in terms of probabilities, such as “X is a cause of Y if P(Y | X) > P(Y), i.e., if the conditional probability of Y given that X has occurred is greater than the unconditional prior probability of Y,” are not adequate. Indeed, dividing both sides of equation (2.6) by P(x) yields the identity in equation (2.13), which implies that if finding that Y = y increases the probability that X = x (meaning that P(x | y)/P(x) > 1, or P(x | y) > P(x)), then, conversely, finding that X = x increases the probability that Y = y.
P(x | y)/P(x) = P(y | x)/P(y) (2.13)
The relationship “increases the probability of” between two events is seen to be symmetric, but the relationship “causes” between two events is generally taken to be asymmetric. Hence, one is not a good model for the other.
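Identity (2.13) is easy to check numerically with the disease/test example from earlier in this chapter. The short sketch below (plain Python; the variable names are our own) shows that the factor by which a positive test raises the probability of disease equals the factor by which disease raises the probability of a positive test:

```python
# Numerical check of identity (2.13), P(x | y)/P(x) = P(y | x)/P(y),
# with x = disease present and y = positive test result.

p_x = 0.01                              # P(disease)
p_y_given_x = 0.99                      # sensitivity, P(positive | disease)
p_y = p_y_given_x * p_x + 0.02 * 0.99   # predictive P(positive) = 0.0297
p_x_given_y = p_y_given_x * p_x / p_y   # Bayes' rule, = 1/3

lift_of_x = p_x_given_y / p_x           # how much finding y raises P(x)
lift_of_y = p_y_given_x / p_y           # how much finding x raises P(y)
print(lift_of_x, lift_of_y)             # both approximately 33.33
```

Both “lifts” are equal, as identity (2.13) requires: observing either event multiplies the probability of the other by the same factor.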
Other ways in which “increases the probability of” can fail to coincide with “causes” are important in applied sciences such as epidemiology. For example, the diverging DAG in (2.14) illustrates the concept of confounding.
X ← Z → Y (2.14)
In such a model, seeing a high value of X may make a high value of Y more likely, not because X causes Y, but because Z causes both.
Example: Confounding – Effects of Common Causes In DAG model (2.14), suppose that high values of Z are associated with high values of both X and Y, and that low values of Z are associated with low values of both X and Y. Specifically, suppose that all three variables are binary (0-1) random variables representing the status of three different conditions in an individual, with 1 indicating presence and 0 indicating absence. Suppose that the marginal distribution for input Z is that it is equally likely to be 0 or 1 and that the CPTs for X and Y are specified by P(X = 1 | Z = 1) = 0.8; P(Y = 1 | Z = 1) = 0.7; and P(X = 0 | Z = 0) = P(Y = 0 | Z = 0) = 1. (The remaining CPT probabilities can be found by subtracting these from 1 where needed, e.g., P(Y = 0 | Z = 1) = 1 - P(Y = 1 | Z = 1) = 1 - 0.7 = 0.3.) Then the prior probability that Y = 1 is 35%:
P(Y = 1) = P(Z = 1)*P(Y = 1 | Z = 1) + P(Z = 0)*P(Y = 1 | Z = 0) = 0.5*0.7 + 0.5*0 = 0.35.
On the other hand, the conditional probability that Y = 1, given that X = 1, can be calculated with the help of Bayes’ rule (or by building the DAG in Netica, compiling it, and entering the finding X = 1):
P(Y = 1 | X = 1) = P(Z = 1 | X = 1)*P(Y = 1 | Z = 1) = [P(X = 1 | Z = 1)*P(Z = 1)/(P(X = 1 | Z = 1)*P(Z = 1) + P(X = 1 | Z = 0)*P(Z = 0))]*P(Y = 1 | Z = 1) = [0.8*0.5/(0.8*0.5 + 0*0.5)]*0.7 = 1*0.7 = 0.70.
Thus, seeing that X = 1 doubles the probability that Y = 1, from a prior value of 0.35 to a posterior value of 0.70. Yet this is not because X has any causal effect on Y, but because seeing that X = 1 is diagnostic of Z probably being 1, and hence of Y probably being 1. This idea extends to ordered categorical and continuous random variables. For example, if high values of X, Y, and Z represent high exposure to a bad (e.g., contaminated or unclean) environment, poor health (or high mortality and morbidity rates), and low income (or high poverty), respectively, then one might expect to observe a positive association between bad environment and poor health even if neither causes the other, but both are caused by the confounder low_income. If only X and Y are observed variables, and the confounder Z is an unmeasured (also called a hidden or latent) variable, then there might be no easy way to detect whether X → Y or DAG model (2.14) is correct.
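The numbers in this confounding example can be verified by direct enumeration over the confounder Z. A minimal sketch for DAG model (2.14) follows (plain Python; the table and function names are our own):

```python
# Enumeration check for the confounding example: X <- Z -> Y, where Z is a
# common cause of X and Y. Probabilities are those given in the text.

P_Z1 = 0.5                          # P(Z = 1)
P_X1_GIVEN_Z = {1: 0.8, 0: 0.0}     # P(X = 1 | Z)
P_Y1_GIVEN_Z = {1: 0.7, 0: 0.0}     # P(Y = 1 | Z)

def p_z(z):
    return P_Z1 if z else 1.0 - P_Z1

def p_y1():
    """Prior P(Y = 1), marginalizing over the confounder Z."""
    return sum(p_z(z) * P_Y1_GIVEN_Z[z] for z in (0, 1))

def p_y1_given_x1():
    """P(Y = 1 | X = 1): condition on X = 1, then marginalize over Z."""
    num = sum(p_z(z) * P_X1_GIVEN_Z[z] * P_Y1_GIVEN_Z[z] for z in (0, 1))
    den = sum(p_z(z) * P_X1_GIVEN_Z[z] for z in (0, 1))
    return num / den

print(p_y1())            # 0.35
print(p_y1_given_x1())   # approximately 0.70
```

The doubling from 0.35 to 0.70 appears even though the code contains no mechanism by which X influences Y: conditioning on X = 1 simply makes Z = 1 certain, which in turn raises the probability of Y = 1.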