The marriage of a fault tree and an event tree is the bow-tie diagram, illustrated schematically in Figure 2.10. The left side is a fault tree turned sideways, so that its top event, called the “Hazardous event” in Figure 2.10, is shown to the right of the events that cause it. The top event in many applications is an accident, loss of control of a system, or a system failure. Events within the fault tree are often described as failures of different preventive barriers, safeguards, or controls that could prevent the hazardous event if they functioned as intended. The basic events are referred to as triggers because they can trigger barrier failures. The right side of the bow-tie diagram is an event tree describing the event sequences that can be caused or triggered by the hazardous event. Again, these are often conceptualized as failures of barriers intended to mitigate the adverse consequences of the hazardous event. Depending on which barriers work and which fail, various consequences can result from the occurrence of the hazardous event.
Fig. 2.10. A schematic picture of a bow-tie diagram
Bow-tie diagrams can be used qualitatively to identify and review engineering designs and operations and maintenance policies and practices that create barriers to prevent occurrence or to mitigate consequences of a hazardous event. They can also be used quantitatively to calculate probabilities of different outcomes (the “consequences” on the right edge of Figure 2.10) and to study how they change if different barriers are added or removed. To these useful capabilities, reformulation of the bow-tie model as a BN adds several others (Khakzad et al., 2013), especially the ability to consider the causes and consequences of multiple hazardous events simultaneously. The same barriers may help to prevent or mitigate multiple types of accidents. Understanding the risk-reducing benefits of expensive investments in defensive barriers often requires an event network with a topology more complex than a simple bow-tie, with some trigger events able to cause many hazardous events (e.g., an explosion leading to both fire and exposures of workers to chemical or radiological hazards). Bow-tie diagrams can be mapped to BNs, allowing more flexible inferences, such as reasoning from observed occurrences of precursor events midway through the network to updated conditional probabilities of causes (trigger events) and of outcomes or consequences.
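The quantitative use of a bow-tie diagram can be sketched in a few lines of Python. This is a minimal illustration, not a general tool: it assumes independent trigger events feeding a single OR gate on the fault-tree side and two independent mitigation barriers on the event-tree side. All names and probabilities are hypothetical and are not taken from Figure 2.10.

```python
from itertools import product

# Hypothetical trigger probabilities (assumed independent) feeding an OR gate.
trigger_probs = {"pump_failure": 0.02, "operator_error": 0.05}

# Hypothetical mitigation barriers: probability that each barrier FAILS on demand.
barrier_fail_probs = {"alarm": 0.1, "sprinkler": 0.2}


def hazardous_event_prob(triggers):
    """OR gate: the hazardous event occurs if at least one trigger occurs."""
    p_none = 1.0
    for p in triggers.values():
        p_none *= 1.0 - p
    return 1.0 - p_none


def consequence_probs(p_hazard, barriers):
    """Event tree: enumerate every works/fails pattern of the barriers."""
    names = list(barriers)
    outcomes = {}
    for pattern in product([False, True], repeat=len(names)):  # True = barrier fails
        p = p_hazard
        for name, failed in zip(names, pattern):
            p *= barriers[name] if failed else 1.0 - barriers[name]
        # Label each consequence by the tuple of barriers that failed.
        outcomes[tuple(n for n, f in zip(names, pattern) if f)] = p
    return outcomes


p_hazard = hazardous_event_prob(trigger_probs)
outcomes = consequence_probs(p_hazard, barrier_fail_probs)

# Effect of removing a barrier: rerun the event tree without the sprinkler.
outcomes_no_sprinkler = consequence_probs(p_hazard, {"alarm": 0.1})
```

Comparing `outcomes` with `outcomes_no_sprinkler` shows how the probability of the worst consequence (all barriers failed) changes when a barrier is removed.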
Markov Chains and Hidden Markov Models
When repair activities are included in a model of system reliability, fault trees and event trees are replaced by probabilistic dynamic models in which components undergo stochastic transitions from working to failed and back. If the transition intensities (expressed in units of expected transitions per unit time) are constant, then such failure-and-repair processes can be represented as Markov chains. Markov chain models can readily be represented by DBNs. A DBN quantifies CPTs for the conditional probabilities of values of variables in the current time slice, given their values in the previous time slice; this is precisely the information required for a Markov chain model. In notation, a Markov chain specifies P(X_{t+1} = x_{t+1} | X_t = x_t), where x denotes a vector of values (e.g., 1 = working or 0 = failed) for the random variable vector X, and t indexes the time period. A DBN provides this same information, albeit through multiple CPTs instead of one large transition matrix. For a system with n binary components, the square transition matrix has the 2^n possible configurations of component states as its rows and columns, and hence on the order of 2^{2n} transition rates, or probabilities per unit time. Fortunately, in most applications, not all variables depend on all other variables. This allows the conditional probability distribution of all variables in one period, given the previous period, to be factored as a product of the probabilities of each variable conditioned only on the values of its parents, which are revealed by the DBN DAG structure. The CPTs for each node suffice to compute the state transition probabilities P(X_{t+1} = x_{t+1} | X_t = x_t) without having to explicitly list the full transition probability matrix.
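The following sketch illustrates how per-node CPTs determine the full transition matrix. It assumes a hypothetical two-component system in which each component's next state depends only on previous-slice values (no arrows within a time slice); the component names and all probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical per-component CPTs for a two-component system (1 = working, 0 = failed).
# Each function returns P(component works at t+1 | values of its parents at t).
def p_a_works(prev):
    # A depends only on its own previous state; 0.30 is an assumed repair probability.
    return 0.95 if prev["A"] == 1 else 0.30


def p_b_works(prev):
    # B depends on the previous states of both A and B.
    if prev["B"] == 0:
        return 0.40                              # assumed repair probability
    return 0.90 if prev["A"] == 1 else 0.70      # B is stressed when A is down


cpts = {"A": p_a_works, "B": p_b_works}
names = list(cpts)
states = list(product([0, 1], repeat=len(names)))   # 2**n joint configurations


def transition_prob(prev_state, next_state):
    """P(X_{t+1} = next | X_t = prev) as a product of per-node CPT entries."""
    prev = dict(zip(names, prev_state))
    p = 1.0
    for name, value in zip(names, next_state):
        p_works = cpts[name](prev)
        p *= p_works if value == 1 else 1.0 - p_works
    return p


# The full 2**n-by-2**n matrix can be generated on demand from the CPTs.
matrix = [[transition_prob(s, s2) for s2 in states] for s in states]
```

Here two small CPTs (with 2 and 4 parent configurations) replace a 4-by-4 matrix; the saving grows quickly with the number of components when each node has few parents.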
DBNs can also represent hidden Markov models (HMMs), in which the states are not directly observed but observations or signals, y, that depend probabilistically on the underlying states are available, with conditional probabilities P(y_t | x_t). DBN inference algorithms can then be used to estimate the current unobserved state from the sequence of observations to date and prior probabilities for states (known as filtering in systems dynamics and control engineering) and to predict the probabilities of future states and observations from past and present observations (Ghahramani, 2001). Applications of HMMs include estimating current disease states from observed patient symptoms and histories and predicting probabilities of future failures or degraded performance of systems from performance measurement logs (Vrignat et al., 2015).
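Filtering in an HMM can be illustrated with the standard forward recursion. The sketch below assumes a hypothetical two-state system ("ok" or "degraded") monitored through a noisy binary signal; the state names, observation names, and all probabilities are illustrative assumptions, and the prior is taken to be the distribution of the first hidden state.

```python
# Hypothetical two-state HMM; all numbers are illustrative.
states = ["ok", "degraded"]
prior = {"ok": 0.9, "degraded": 0.1}          # P(x_1)
transition = {                                 # P(x_{t+1} | x_t)
    "ok": {"ok": 0.95, "degraded": 0.05},
    "degraded": {"ok": 0.10, "degraded": 0.90},
}
emission = {                                   # P(y_t | x_t)
    "ok": {"normal": 0.9, "alarm": 0.1},
    "degraded": {"normal": 0.3, "alarm": 0.7},
}


def predict_next(belief):
    """One-step-ahead state probabilities P(x_{t+1} | y_1..y_t)."""
    return {s2: sum(belief[s1] * transition[s1][s2] for s1 in states)
            for s2 in states}


def forward_filter(observations):
    """Return P(x_t | y_1..y_t) for each t using the forward recursion."""
    beliefs = []
    belief = dict(prior)
    for t, y in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition model.
            belief = predict_next(belief)
        # Update: weight by the likelihood of the new observation, then normalize.
        unnorm = {s: belief[s] * emission[s][y] for s in states}
        total = sum(unnorm.values())
        belief = {s: unnorm[s] / total for s in states}
        beliefs.append(belief)
    return beliefs


beliefs = forward_filter(["normal", "alarm", "alarm"])
forecast = predict_next(beliefs[-1])
```

Each entry of `beliefs` is the filtered state distribution after one more observation, and `forecast` is the predicted state distribution for the next period.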
Probabilistic Boolean Networks
A generalization of dynamic fault trees and event trees is the probabilistic Boolean network (PBN), in which each node represents a binary variable with possible values of 1 for on and 0 for off. The probability of each of these two possible states for a node in each period or time slice depends on its own value and on the values of its parents in the previous period. Node values make stochastic transitions between the states over time, creating a Markov chain. PBNs have been used to model gene regulatory networks. They can be represented as special cases of dynamic Bayesian networks (DBNs) (Lähdesmäki et al., 2006).
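One common way to specify a PBN, assumed here for illustration, is to give each node several candidate Boolean update rules, one of which is selected at random at each time step. The sketch below uses a hypothetical two-node network; the node names, rules, and selection probabilities are invented.

```python
import random

# Hypothetical PBN with two binary nodes. Each node has candidate Boolean
# update rules, each selected with the stated probability at every time step.
rules = {
    "g1": [(0.7, lambda s: s["g1"] and not s["g2"]),
           (0.3, lambda s: s["g2"])],
    "g2": [(0.6, lambda s: s["g1"] or s["g2"]),
           (0.4, lambda s: not s["g1"])],
}


def prob_on(node, state):
    """P(node = 1 at t+1 | state at t): total weight of the rules that output 1."""
    return sum(p for p, rule in rules[node] if rule(state))


def step(state, rng):
    """Sample the next state; nodes update independently given the previous state."""
    return {node: int(rng.random() < prob_on(node, state)) for node in rules}


def simulate(initial, n_steps, seed=0):
    """Simulate the Markov chain on joint node values for n_steps transitions."""
    rng = random.Random(seed)
    path = [dict(initial)]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path


path = simulate({"g1": 1, "g2": 0}, n_steps=10)
```

The function `prob_on` plays the role of a CPT entry in the equivalent DBN: it gives the probability that a node is on in the next time slice, given the previous values of its parents.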
Time Series Forecasting Models and Predictive Causation
A traditional (discrete-time univariate) time series model in which the probability distribution for a variable in each time period depends on a finite number of its own past values can be represented by a BN with nodes representing the values of the variable at different times and with arrows directed from earlier to later values of the variable. This idea can be extended to multiple time series: the arrows in a DBN show how the current probability distribution for each variable depends on past (and, if appropriate, current) values of other variables. CPTs quantify these probabilistic dependencies. Popular models for analysis of multiple time series, including vector autoregression (VAR) models, can thus be represented as special cases of DBNs, with CPTs specified via regression models and error distributions. Forecasting future values on the basis of what has been observed so far can then be accomplished by applying BN inference algorithms to compute the conditional probability distributions of unobserved future values given the observed values seen to date. Missing data are handled naturally in this framework: as in other BN inference problems, one simply enters findings for observed values, and posterior probabilities are then computed for values of all unobserved ones, including any unobserved (i.e., missing) past and present values, as well as future values.
In a DBN for multiple time series variables, one time series variable can be defined as a predictive cause of another if and only if arrows run from past or present values of the former to present or future values of the latter. That is, X is a predictive cause of Y if and only if the value of Y in a time slice has a probability distribution that depends not only on past values of Y (and perhaps other variables), but also on past values of X (and perhaps the current value of X, if causation within time slices is allowed). If there are no such arrows, meaning that the value of Y at any time is conditionally independent of the past and present values of X, given the past and present values of Y itself (and perhaps other variables), then X is not identified as a predictive cause of Y. Intuitively, X is a predictive cause of Y if and only if the history of X up to the current moment can be used to predict future values of Y better than they can be predicted just using the history of Y (or, more generally, just using the histories of variables other than X) up to the current moment. This concept of predictive causation is often called Granger causation and was originally implemented by using statistical F tests to test whether data allowed rejection of the null hypothesis that mean squared prediction error for one time series, Y, is not significantly reduced by conditioning on the history of X as well as the history of Y itself (Granger, 1969). If such Granger causation held in both directions, so that X and Y each help to predict the other, this was usually taken as evidence that some third variable affected both. 
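The F test underlying bivariate Granger causality compares a restricted autoregression (lagged Y only) with an unrestricted one (lagged Y and lagged X). The sketch below is a bare-bones illustration in pure Python on simulated data in which X drives Y by construction; the function names, coefficients, and sample size are assumptions made for the example. It computes only the F statistic; obtaining a p-value additionally requires the F distribution, which a statistics library would supply.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares from least squares, solved via the normal equations."""
    k = len(rows[0])
    # Build the augmented system [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(rows, targets))


def granger_f(cause, effect, lag=1):
    """F statistic for H0: lagged `cause` adds nothing beyond lagged `effect`."""
    times = range(lag, len(effect))
    targets = [effect[t] for t in times]
    restricted = [[1.0] + [effect[t - j] for j in range(1, lag + 1)] for t in times]
    unrestricted = [row + [cause[t - j] for j in range(1, lag + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    n, k = len(targets), len(unrestricted[0])
    return ((rss_r - rss_u) / lag) / (rss_u / (n - k))


# Simulated data (assumption): x is white noise and y depends on lagged x.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

f_xy = granger_f(x, y)   # tests whether x helps to predict y
f_yx = granger_f(y, x)   # tests whether y helps to predict x
```

With data generated this way, `f_xy` should be large while `f_yx` should be comparable to a typical draw from an F distribution, consistent with X being a predictive cause of Y but not the reverse.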
The basic idea was subsequently greatly generalized and given a non-parametric foundation in information theory by testing whether information flows from one time series variable to another over time, so that conditioning on the history of X reduces the expected conditional entropy (uncertainty) for future values of Y, even after conditioning on other observed variables. This information flow between time series variables is called transfer entropy (Schreiber, 2000). For the special case of traditional parametric time series with linear dynamics and Gaussian errors originally analyzed by Granger, transfer entropy specializes to Granger causality, i.e., information flows from X to Y if and only if X is a (Granger) predictive cause of Y. More generally, since information theory proves that any two random variables have positive mutual information if and only if they are not statistically independent of each other (Cover and Thomas, 2006), the arrows in a DBN representing the data-generating process for multiple time series provide a simple visual way to identify predictive causation: it is indicated by arrows in the DBN directed from past and present values of one time series variable to future values (and possibly present ones, if instantaneous causation within time slices is allowed) of another. Thus, DBNs provide a natural framework not only for time series modeling and forecasting, but also for analysis of predictive causation.
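For discrete-valued series, transfer entropy can be estimated by plugging observed frequencies into its definition. The sketch below is a simplified illustration that assumes a history length of one time step and no conditioning on additional variables; the function name and the simulated data (in which Y copies X with a one-step delay) are assumptions made for the example.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of lag-1 transfer entropy from source to target."""
    triples = Counter()      # counts of (target_next, target_now, source_now)
    for t in range(len(target) - 1):
        triples[(target[t + 1], target[t], source[t])] += 1
    n = sum(triples.values())
    pairs_ts = Counter()     # counts of (target_now, source_now)
    pairs_tt = Counter()     # counts of (target_next, target_now)
    singles = Counter()      # counts of target_now
    for (y_next, y_now, x_now), c in triples.items():
        pairs_ts[(y_now, x_now)] += c
        pairs_tt[(y_next, y_now)] += c
        singles[y_now] += c
    te = 0.0
    for (y_next, y_now, x_now), c in triples.items():
        p_full = c / pairs_ts[(y_now, x_now)]                # P(y' | y, x)
        p_own = pairs_tt[(y_next, y_now)] / singles[y_now]   # P(y' | y)
        te += (c / n) * math.log2(p_full / p_own)
    return te


# Simulated data (assumption): x is a fair coin and y copies x one step later.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]

te_xy = transfer_entropy(x, y)   # information flowing from x to y
te_yx = transfer_entropy(y, x)   # information flowing from y to x
```

Because each value of Y is fully determined by the previous value of X, `te_xy` should be close to 1 bit, while `te_yx` should be close to 0. Plug-in estimates are biased upward in small samples, so small positive values should not be over-interpreted.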
Example: Bivariate Granger Causality Testing Using the Causal Analytics Toolkit (CAT)
Several on-line calculators are available to carry out Granger causality testing between two time series (e.g., www.wessa.net/rwasp_grangercausality.wasp), and it is also implemented in free software such as the granger.test function in the MSBVAR package in R. Rather than delving into the details of these software products, we will illustrate Granger causality testing using the cloud-based version of the Causal Analytics Toolkit (CAT), a free software package providing algorithms and reports for causal analysis, model building, and statistics. CAT integrates many R packages and creates reports that do not require familiarity with R to generate or interpret; hence, it is useful for illustrating what can be done with R packages without requiring readers to learn R. The CAT software can be run from the cloud via the link http://cox-associates.com/CloudCAT. The calculations performed using CAT can also be performed with appropriate R packages by readers adept at R programming and the R package ecosystem.
Figure 2.11 shows the Data screen for CAT as of 4Q-2017. (Updates may be made.) A few data sets are bundled with the CAT software, under the “Samples” Data drop-down menu on the upper right. This example will use the LA air pollution, weather variable, and elderly mortality data set introduced in Chapter 1. Recall that, in this data set, AllCause75 gives daily mortality counts for people 75 years old or older in California’s South Coastal Air Quality Management District (SCAQMD), near Los Angeles.
Fig. 2.11 Loading data into the Causal Analytics Toolkit (CAT)
Source: CAT software at http://cox-associates.com/CloudCAT
(User data files can be uploaded using the “Upload File” browser bar at the upper left. Subsets of columns can be selected for analysis if desired, but the default is to use all of them, and we will do so in this example.) We specify that the month and year variables, which are additional columns to the right of those shown in Figure 2.11, should be modeled as discrete variables, rather than as continuous, by checking them in the optional row near the middle of the Data screen. With these data-loading and preparation preliminaries accomplished, Granger causality testing can now be performed by clicking on “Granger” in the list of commands in the left margin. (Clicking on a menu icon causes this list of commands to appear when it is not showing.) Doing so generates the output in Figure 2.12. Like most CAT reports, this one is formatted as a sequence of increasingly detailed summaries and supporting analyses that can be scrolled through from top to bottom. The top part lists significant Granger causes with 1-day lags, including the calculated p-values (based on F tests) for rejecting the null hypothesis that one variable does not help to predict another. The bottom table (most of which is not shown in Figure 2.12) provides supporting details by listing F statistics with p-values for each bivariate Granger test performed. In this data set, most pairs of variables help to predict each other (e.g., with a 1-day lag, AllCause75 is a predictor of month as well as month being a predictor of AllCause75), suggesting that most have common causes or other dependencies that are not well revealed by bivariate Granger causality testing. The slider at the top of the output allows similar results to be displayed for longer lags.
Overall, these findings indicate a need for multivariate analysis to better distinguish which variables might directly cause which others: the Granger analyses show little more than that most variables are associated with each other over time, and hence are useful for predicting each other over time.
Fig. 2.12 Results of Granger causality testing in CAT
To overcome limitations such as the ones illustrated in this example, where most variables are correlated with time and hence with each other and appear to be Granger causes (significant independent predictors) of each other in bivariate tests, several groups have developed multivariate Granger causality tests and software (e.g., the FIAR package in R for stationary time series variables and the MVGC toolkit in MATLAB (Barnett and Seth, 2014)). However, the framework of Granger causality testing is somewhat fragile, insofar as it assumes specific parametric families of time series models (e.g., vector autoregression (VAR) models) and stationarity of the tested time series. The basic idea can be applied using more robust non-parametric methods, discussed later (e.g., Random Forest ensembles) or dynamic Bayesian networks to determine whether future values of a dependent variable are conditionally independent of the history of a hypothesized cause up to the present, given the histories of itself and other variables. If so, then the hypothesis of predictive causality between them is not supported.
The concept of predictive causation is attractive for several reasons. One is that it includes and refines the intuitive requirement that causes must precede their effects. Predictive causation requires not only this, but also that the history of causes up to a given moment must provide information about the future values of the effect. According to information theory, this implies that the causes help to predict the effect values in future periods, in the sense that conditioning on information about present and past values of causes reduces the expected conditional entropy of the probability distributions for at least some future values of the effect; more colloquially, knowing the causes is expected to reduce uncertainty about their future effects. Another attractive feature of predictive causation is that it is relatively objective, in that statistical tests are available for discovering whether data allow confident rejection, at a specified level of statistical confidence, of the null hypothesis that future values of one variable (the hypothesized effect) are conditionally independent of past and present values of another variable (a hypothesized possible direct cause) given the values of other variables. (CART trees, discussed in the following example, provide one way to test this hypothesis, within the limitations imposed by the CART tree algorithm and the sample size and design.) Thus, different statisticians should be able to analyze the same data using the same software and reach the same conclusions about whether one variable can be confidently identified as a predictive cause of another.
Despite these advantages, however, predictive causation does not imply either manipulative causation or explanatory/mechanistic causation. A standard counter-example is that having nicotine-stained fingers might be a predictive cause of lung cancer in a data set that records both but that does not include smoking behavior: seeing nicotine-stained fingers might be a reliable indicator of increased future risks of lung cancer, providing predictively useful information not available from other variables in the data set. But it would not necessarily be a manipulative cause: keeping one’s fingers unstained would not necessarily decrease future risk of lung cancer unless the only way to have unstained fingers is not to smoke. Nor is it a mechanistic cause: changes in nicotine staining of fingers do not propagate through a set of mechanisms to alter risk of lung cancer. In this example, the data set violates the Causal Markov condition (CMC), making it impossible for an algorithm to determine whether nicotine-stained fingers are only a predictive cause, or also a manipulative cause or a mechanistic cause, of lung cancer risk.
Structural Equation Models (SEMs), Structural Causation, and Path Analysis Models
Long before Bayesian network models and other probabilistic graphical models were introduced, econometricians and artificial intelligence researchers were already using structural equation models (SEMs) to model causal relationships and statistical dependencies among variables (Simon, 1953; Simon and Iwasaki, 1988). A structural equation shows how a dependent variable depends on other variables (namely, the variables that determine its value, represented by its parents in a directed graph model) via an equation such as
output = f(parent1, parent2, …, parent_n, error) (2.18)
Equation 2.18 signifies that the value of the dependent variable, here called output, depends via some (possibly unknown) function f on the values of other variables, namely its parents, and on a random variable, the error term. This error term represents the effects of all other determinants of the value of the dependent variable that are not otherwise represented in the model. It is usual to interpret each such structural equation as representing a causal mechanism, law, or constraint and to assume that its error term is independent of the error terms for other structural equations. It is important to test this assumption in practice, as correlated error terms may betray the presence of an unobserved common cause that should be included in the model as a latent variable, or of a selection bias that must be corrected for, before estimates of the functional relationship between its parents and the dependent variable can be interpreted causally.
The desired causal interpretation of a structural equation is usually the manipulative causal interpretation: that exogenously changing the value of a variable on the right side (i.e., a parent of the dependent variable) will cause the value of the dependent variable to change to restore equality between the left and right sides. In this case, the structural equation represents a causal model describing how the value of the dependent variable on the left is determined by the values of its parents on the right (Druzdzel and Simon, 1993). A system of such equations is called a structural equation model (SEM) if each function determining the value of a variable is invariant to changes in the forms of functions determining the values of other variables (Pearl, 2009). The intuition is that each equation describes how the values of its parents (including the random error term) determine the value of a variable, and that this should not depend on how other variables are determined if the equation describes a causal law or mechanism. Non-parents are considered to be ineligible to be direct causes of a variable.
The structure of an SEM model is given by the DAG showing the parents of each variable. If each equation of the form (2.18) in an SEM can be expressed by an equivalent CPT, as
P(output = y | parents = x)
where y is a value of the dependent variable (output) and x is a vector of values for its parents, and if the resulting directed graph is acyclic, then the SEM is simply an alternative notation for a BN. Conversely, any BN with discrete random variables can be expressed as an equivalent SEM in which the value of each node is a deterministic function of the values of its parents and of a single independently uniformly distributed error term, which may be thought of as an unobserved (latent) variable that also affects its value (Druzdzel and Simon, 1993). The BN and SEM are equivalent in the sense that they represent the same joint probability distribution of the variables and have the same DAG. In an SEM with a known DAG showing which variables are derived from which others, X can be defined as a direct structural cause of Y if X appears in the equation determining Y, i.e., X is a parent of Y in the DAG. Similarly, X can be defined as an indirect structural cause of Y if X is an ancestor of Y, but not a direct parent. The intuition behind structural causality is that effects are derived from their causes, but causes are not derived from their effects; thus, a parent or ancestor of a variable can be a structural cause of it, but a descendant cannot. This interpretation is slightly different from the usual one in causal BN models, which states that probability distributions for the values of effects depend on the values of their direct causes. The concept of deriving the value for one variable (the effect) from the values of others (its direct causes, including the unobserved value of the error term) via a structural equation is consistent with, but distinct from, the idea that the value of the effect variable depends on (i.e., is not statistically independent of) the values of its direct causes.
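The conversion from a discrete CPT to a structural equation with a uniform error term can be illustrated by inverting the cumulative distribution implied by the CPT. The sketch below assumes a hypothetical binary node with a single binary parent; the CPT values and function names are invented for illustration, and this is only one of several possible constructions.

```python
import random

# Hypothetical CPT for a binary node with one binary parent:
# P(output = y | parent = x), listed as [(y, probability), ...] for each x.
cpt = {
    0: [(0, 0.9), (1, 0.1)],
    1: [(0, 0.3), (1, 0.7)],
}


def structural_equation(parent, error):
    """Deterministic f(parent, error): invert the CPT's cumulative distribution."""
    cumulative = 0.0
    for value, prob in cpt[parent]:
        cumulative += prob
        if error < cumulative:
            return value
    return cpt[parent][-1][0]   # guard against floating-point round-off


def sample_output(parent, rng):
    """Draw the latent error uniformly on [0, 1) and apply the structural equation."""
    return structural_equation(parent, rng.random())


# Check by simulation: the frequency of output = 1 given parent = 1
# should be close to the CPT entry 0.7.
rng = random.Random(42)
draws = [sample_output(1, rng) for _ in range(20000)]
freq_one = sum(draws) / len(draws)
```

All randomness enters through the latent error term; given the parent value and the error value, the output is fully determined, as required of a structural equation of the form (2.18).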
SEMs with all linear equations and acyclic DAGs are called path analysis models. These were the first types of causal DAG models studied, starting with work by geneticist Sewall Wright around 1920, and they occupied much of causal modeling in genetics, biology, and social sciences in the twentieth century (Pearl, 2009). Path analysis models generalize multiple linear regression by allowing for linear dependencies among explanatory variables. In a path analysis DAG, the arrows between standardized variables have weights called path coefficients indicating the effect of a unit change in the variable at an arrow’s tail on the variable into which it points, i.e., at its head. The total effect of a change in a variable X on a variable Y is calculated by multiplying coefficients along paths and summing over all paths from X to Y. BNs allow much greater flexibility in quantifying causal relationships between variables and their parents than path diagrams by replacing path coefficients with conditional probability tables (CPTs) that can represent arbitrary, possibly very nonlinear, effects and interactions among variables.
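The rule for total effects in a path analysis model (multiply coefficients along each directed path, then sum over paths) can be sketched with a short recursion. The path diagram below is hypothetical; the variable names and coefficients are invented for illustration, and the recursion relies on the graph being acyclic.

```python
# Hypothetical path diagram on standardized variables: (tail, head) -> path coefficient.
path_coeffs = {
    ("X", "M"): 0.5,
    ("M", "Y"): 0.4,
    ("X", "Y"): 0.2,
    ("X", "W"): 0.3,
    ("W", "Y"): -0.1,
}


def total_effect(source, target, coeffs):
    """Sum over all directed paths of the product of coefficients along each path."""
    if source == target:
        return 1.0
    # Recurse through each child of `source`; terminates because the DAG is acyclic.
    return sum(weight * total_effect(child, target, coeffs)
               for (parent, child), weight in coeffs.items()
               if parent == source)


effect_xy = total_effect("X", "Y", path_coeffs)
```

In this example there are three directed paths from X to Y, contributing 0.5 × 0.4 = 0.2 (through M), 0.2 (direct), and 0.3 × (−0.1) = −0.03 (through W), for a total effect of approximately 0.37.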