Historical Milestones in Development of Computationally Useful Causal Concepts

Our review and application of causal inference algorithms has deliberately emphasized principles and algorithms that have succeeded in competitive benchmarking tests, while skipping over centuries of previous work. As noted by Pearl (2014), “Traditional statisticians fear that, without extensive reading of Aristotle, Kant and Hume, they are not well equipped to tackle the subject of causation, especially when it involves claims based on untested assumptions.” Even the relatively short history of computational approaches to causal analysis of data, which is only about a century old, can be intimidating. Some of its key milestones are as follows:
1920s: Path analysis was introduced and developed by geneticist Sewall Wright (1921). This was the first approach to use directed acyclic graph (DAG) models in conjunction with quantitative analysis of statistical dependencies and independencies to clarify the distinction between correlation and causality, and DAG models have been used for that purpose ever since. Although Wright’s path analysis was restricted to linear models, it can be seen as a forerunner of the Bayesian networks introduced some 70 years later, which generalize path coefficients to conditional probability tables (CPTs). These allow for non-parametric estimation of arbitrary (possibly non-linear) probabilistic dependencies among variables by specifying the conditional probabilities for the possible values of a variable, given each combination of values for the variables that point into it in a DAG model. In practice, this conditional probability distribution or table at a node of a DAG model can be represented relatively efficiently as a classification tree for the node’s value, given the values of its parents (inputs) in the DAG, rather than by explicitly listing all possible combinations of input values (Frey et al., 2003). Path analysis and closely related linear structural equations models (SEMs) were extensively developed by social scientists and statisticians in the 1960s and 1970s and became primary tools of causal analysis in the social sciences in those decades (Blalock, 1964; Kenny, 1979).
1950s: Structural equations models (SEMs) were developed as tools for causal analysis. For example, polymath and Nobel Laureate Herbert Simon defined causal ordering of variables in systems of structural equations (Simon, 1953) and applied conditional independence and exogeneity criteria for distinguishing between direct and indirect effects and between causal and spurious correlations in econometrics and other fields (Simon, 1954).
1960s: Quasi-experiments were introduced, standard threats to valid causal inference in observational studies were identified and listed, and statistical designs and tests for overcoming them in observational studies were devised, most notably by social statisticians Donald T. Campbell and Julian C. Stanley (1963). These methods were extended and applied to evaluation of the success or failure of many social and educational interventions in the 1960s and 1970s, leading to a large body of techniques for program evaluation. The methods of data analysis and causal analysis developed for quasi-experiments, which consist largely of enumerating and refuting potential non-causal explanations for observed associations, have subsequently been extensively applied to “natural experiments” in which changes affect a subset of a population, allowing a quasi-experimental comparison of changes in responses in the affected subpopulation to contemporaneous changes in responses in the unaffected (control) subpopulation.
1965: Hill considerations for causality introduced. In 1965, Sir Austin Bradford Hill, doubting that any valid algorithmic approach for causal discovery could exist, introduced his “considerations” to help humans make judgments about causality based on associations (Hill, 1965). These considerations stand apart from much of the rest of the history of causal analysis methods, being neither greatly influenced by nor greatly influencing the technical developments that have led to successful current algorithms for causal discovery and inference. They have been enormously influential, however, in encouraging efforts to use judgment to interpret associations causally in epidemiology and public health risk assessment. Some attempts have been made to link Hill’s considerations to counterfactual causality (Höfler, 2005), but they play no role in current causal analysis algorithms, and the rates of false positive and false negative causal conclusions reached with their help have not been quantified. As a psychological aid to help epidemiologists, risk assessors, and regulators make up their minds, Hill’s considerations have proved effective, but their performance as a guide for drawing factually correct conclusions about causality – especially manipulative causality – from observational data is less clear.
1970s: Conditional independence tests and predictive causality tests for time series were developed to identify predictive causal relationships between time series, most notably by Nobel Laureate econometrician Clive Granger and colleague Christopher Sims, building on earlier ideas by mathematician and electrical engineer Norbert Wiener (1956). Granger (or Granger-Sims) tests for predictive causality have been extended to multiple time series and applied and generalized by neuroscientists analyzing observations of neural firing patterns in the brain (Friston et al., 2013; Furqan and Siyal, 2016; Wibral et al., 2013).
1980s: Counterfactual and potential outcomes techniques were proposed for estimating average causal effects of treatments in populations, largely by statistician Donald B. Rubin and colleagues, building on work by statistician Jerzy Neyman in 1923. Over the course of four decades, specific computational methods put forward in this framework to quantify average causal effects in populations, usually by trying to use observations and assumptions to estimate what would have happened if treatments or exposures had been randomly assigned, have included matching on observed covariates (Rubin, 1974), Bayesian inference (Rubin, 1978), matching with propensity scores (Rosenbaum and Rubin, 1983), potential outcomes models with instrumental variables (Angrist et al., 1996), principal stratification (Zhang and Rubin, 2003), and mediation analysis (Rubin, 2004). These methods have been influential in epidemiology, where they have been presented as suitable for estimating average effects caused by treatments or interventions. But they have also been criticized within the causal analysis community as being needlessly obscure, reliant on untestable assumptions, and prone to give biased, misleading, and paradoxical results in practice, in part because they do not necessarily estimate genuine (manipulative) causal effects (e.g., Pearl, 2009). From this perspective, the useful contributions of the potential outcomes framework can be subsumed into and clarified by methods of structural equations modeling (ibid).
The 1980s also saw the introduction of classification and regression trees (CART) methods (Breiman et al., 1984). These would eventually provide nonparametric tests for conditional independence, useful for learning Bayesian network structures from data (Frey et al., 2003). They also provided the base nonparametric models for Random Forest ensembles and related non-parametric ensemble algorithms now widely used in machine learning (Furqan and Siyal, 2016).
1990s: Probabilistic graphical models were developed in great detail and given clear mathematical and conceptual foundations (Pearl, 1993). These included Bayesian networks and causal graph models, together with inference algorithms for learning them from data and for using them to draw causal inferences and to estimate the sizes of effects caused by interventions. These methods are most prominently associated with the Turing Award-winning work of computer scientist Judea Pearl and his coauthors. They grew out of the intersection of artificial intelligence and statistics. They provide a synthesis and generalization of many earlier methods, including structural equations modeling (both linear and nonlinear), probabilistic causation, manipulative causation, predictive (e.g., Granger) causation, counterfactual and potential outcomes models, and directed acyclic graph (DAG) models, including path analysis. Conditional independence tests and quantification of conditional probabilistic dependencies play key roles in this synthesis, as set forth in landmark books by Pearl (2000) and Koller and Friedman (2009). The full, careful development of probabilistic graphical models and algorithms created what appears to be a lasting revolution in representing, understanding, and reasoning about causality in a realistically uncertain world.
2000-Present: Causal discovery and inference algorithms for learning causal DAG models from data and for using them to draw causal inferences and to quantify or place bounds on the sizes of impacts caused by different interventions have been extensively developed, refined, tested, and compared over the past two decades. Important advances included clarifying which variables in a DAG model must and must not be conditioned on to obtain unbiased estimates of causal impacts in known DAG models (Textor, 2015; Shpitser and Pearl, 2008), as well as transport formulas for applying causal relationships discovered and quantified in one or more learning settings to a different target setting (Hernan and Vanderweele, 2011; Lee and Honavar, 2013; Bareinboim and Pearl, 2013). Recent years have also seen substantial generalizations of earlier methods. For example, transfer entropy, a nonparametric generalization of Granger causality, quantifies the rates of directed information flows among time series variables. Introduced by physicist Thomas Schreiber (2000) and subsequently refined and extended by workers in computational finance and neuroscience (Wibral et al., 2013), transfer entropy and closely related methods appear to be promising for creating algorithms to discover causal DAG structures and quantitative dependency relationships and time lag characteristics from observations of multiple time series.
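The transfer entropy just described can be sketched directly from its definition. The following is a minimal illustrative plug-in estimator for binary series with history length 1 (not the toolbox implementations cited above): it estimates directed information flow between two series, one of which drives the other with a one-step lag.

```python
import random
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Estimate T_{X->Y} in bits for binary series, history length 1:
    T = sum over (y_{t+1}, y_t, x_t) of
        p(y1, y0, x0) * log2[ p(y1 | y0, x0) / p(y1 | y0) ]."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_t, x_t)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_{t+1}, y_t)
    singles_y = Counter(y[:-1])                     # y_t
    n = len(y) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n
        p_cond_full = c / pairs_yx[(y0, x0)]        # p(y1 | y0, x0)
        p_cond_hist = pairs_yy[(y1, y0)] / singles_y[y0]  # p(y1 | y0)
        te += p_joint * log2(p_cond_full / p_cond_hist)
    return te

random.seed(0)
x = [random.randint(0, 1) for _ in range(5000)]
# y copies x with a one-step lag plus 10% noise, so information flows X -> Y
y = [0] + [xi if random.random() < 0.9 else 1 - xi for xi in x[:-1]]

print(transfer_entropy(x, y))  # substantial directed flow in the X -> Y direction
print(transfer_entropy(y, x))  # near zero in the reverse direction
```

The estimate in the driving direction is substantial, while the reverse direction is near zero, illustrating how directed information flow can reveal the orientation of a predictive-causal relationship between time series.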
Even such an abridged list of milestones makes clear that causal analytics is now a large and deep field with a host of interrelated technical concepts and algorithms supported by a confluence of insights and methods from statistics, social statistics and program evaluation, electrical engineering, economics and econometrics, physics, computer science, computational finance, neuroscience, and other fields. Any brief survey must therefore be relatively superficial; full treatments run into thousands of pages (e.g., Koller and Friedman, 2009), and even documentation for R packages implementing the key ideas can be hundreds of pages.
This deep grounding of current information-based causal analytics methods and algorithms in nearly a century of computational methods backed by centuries of philosophizing about causality might well inspire a prudent humility (Pearl, 2014). Yet, for the practitioner with limited time and a need to draw sound causal inferences from data, two relatively recent developments make even superficial understanding of key ideas and software packages highly useful. The first is that many formerly distinct causal analysis methods have now been synthesized and unified within the framework of information-theoretic methods and directed acyclic graphs. This framework brings together ideas from potential outcomes and counterfactual causation, predictive causality, DAG modeling, and manipulative causality (Pearl, 2000; Pearl, 2010). The second is the success of the object-oriented software paradigm in platforms such as R and Python. Modern software enables and encourages encapsulation of technical implementation details so that only key ideas and behaviors of software objects need be understood to use them correctly. This allows users with only a superficial understanding of exactly what a software package does and how it works to use it appropriately to do valuable tasks. For example, a user who understands only that causes must be informative about their effects, and that this can be indicated graphically by arrows between variables showing which ones are identified as being informative about each other and which are conditionally independent of each other, can use this limited understanding to interpret correctly the results of sophisticated algorithms such as those in the CAT package.
As a practical matter, making tools such as Bayesian network learning algorithms, classification trees, and partial dependency plots widely available and easy to apply can complement insights from regression-based and other associational and counterfactual methods to reveal and quantify potential causal relationships in observational data.
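The last of these tools is simple enough to sketch in a few lines: a partial dependence curve is computed by forcing the chosen input to each grid value in every record and averaging the model’s predictions. The sketch below uses a hypothetical fitted risk model (a simple formula standing in for, say, a Random Forest); the variable names and coefficients are illustrative assumptions, not from any real study.

```python
import random

def partial_dependence(model, rows, feature, grid):
    """For each grid value v, average the model's prediction over the data
    with the chosen feature forced to v (the standard PD computation)."""
    pd_values = []
    for v in grid:
        preds = []
        for row in rows:
            modified = dict(row)      # copy so the original record is untouched
            modified[feature] = v
            preds.append(model(modified))
        pd_values.append(sum(preds) / len(preds))
    return pd_values

# Hypothetical stand-in for a fitted model: risk rises with exposure and age.
def risk_model(row):
    return 0.1 + 0.05 * row["exposure"] + 0.002 * row["age"]

random.seed(1)
data = [{"exposure": random.uniform(0, 10), "age": random.uniform(20, 80)}
        for _ in range(200)]

grid = [0, 2, 4, 6, 8, 10]
print(partial_dependence(risk_model, data, "exposure", grid))
```

For this linear stand-in model the curve rises by exactly 0.05 per unit of exposure; for a nonparametric model such as a Random Forest, the same computation traces out whatever nonlinear dependence the model has learned.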
Conclusions

This chapter has introduced several different concepts of causation and has discussed limitations and challenges for applying them in practice to describe how different factors affect risks; predict how alternative actions or changes in the controllable inputs to a system would affect outcome probabilities; optimize decisions to increase the probabilities of desired outcomes; and evaluate how well past actions or policies have succeeded in bringing about their intended goals. Table 2.7 summarizes the major concepts of causation discussed, and challenges for applying each one. Table 2.8 identifies some of the main communities using each concept (middle column) and techniques for implementing each concept using data analysis and modeling methods (right column). Major themes of the chapter are as follows:
Decision-makers need to understand manipulative causation to make well-informed decisions about how the choices they make affect probabilities of outcomes.
Manipulative causation is not implied by associational, attributive, or counterfactual causation. This creates a need and an opportunity for other methods to inform decision-makers about the probable consequences of alternative choices.
Manipulative causation is implied by mechanistic/explanatory causation and by structural causal models that show how the values of some variables are determined from the values of others via structural equations or simulation formulas representing causal mechanisms.
However, causal structures (e.g., causal graph or BN network topologies) and mechanisms for deriving the value or probability distribution of a variable from the values of the factors or variables on which it depends (e.g., via structural equations, CPTs or conditional probability models in a BN, or simulation steps in a system dynamics model or a discrete-event simulation model) are often initially unknown or uncertain for many systems and risks of interest. Algorithms and principles for discovering them from data (Table 2.4, right column) and for designing studies to produce such data are therefore of great practical interest.
Predictive causation can often be inferred from data using Granger causality tests and similar statistical methods and using BN learning tools and other machine-learning methods for causal graphs.
Predictive causation does not necessarily imply manipulative causation, as illustrated by the counter-example of nicotine-stained fingers being a predictive cause but not a manipulative cause of lung cancer.
Knowledge-based constraints, e.g., specifying that sex and age are sources in a causal graph and that death is a sink, can help orient arrows in a causal graph or BN so that they have valid manipulative interpretations. No fully automated procedure exists that is guaranteed to produce valid manipulative causal graph models from observational data. However, multiple causal discovery algorithms assisted by knowledge-based constraints provide a useful practical approximation to this ideal. Fully automated methods may produce some arrows (or, for some algorithms, undirected arcs) between variables that indicate only that they are informative about each other in a data set, and not necessarily that changing one would change the other.
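The distinction drawn in these points can be made concrete with a small simulation of the nicotine-stained-fingers example: staining and cancer are both caused by smoking, so staining predicts cancer marginally, but the association vanishes once smoking is conditioned on. The probabilities used are illustrative assumptions, not epidemiological estimates.

```python
import random

random.seed(2)
n = 200_000
records = []
for _ in range(n):
    smoker = random.random() < 0.3
    stained = random.random() < (0.8 if smoker else 0.05)  # staining caused by smoking
    cancer = random.random() < (0.15 if smoker else 0.01)  # cancer caused by smoking only
    records.append((smoker, stained, cancer))

def p_cancer(subset):
    subset = list(subset)
    return sum(c for _, _, c in subset) / len(subset)

# Marginally, stained fingers predict cancer (predictive causation)...
print(p_cancer(r for r in records if r[1]))               # noticeably higher...
print(p_cancer(r for r in records if not r[1]))           # ...than this
# ...but conditioning on smoking screens this off: among smokers, staining
# carries no further information, so cleaning fingers would not reduce risk.
print(p_cancer(r for r in records if r[0] and r[1]))
print(p_cancer(r for r in records if r[0] and not r[1]))
```

This is exactly the pattern a causal discovery algorithm exploits: the conditional independence of staining and cancer given smoking rules out a direct arrow between them, even though they are strongly associated.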
Table 2.7 Summary of Causal Concepts and Challenges

Causal concept: A cause makes its direct effects more probable.
Limitations and challenges:
- Direction of causation unclear: if P(X | Y) > P(X) then P(Y | X) > P(Y) (since P(X | Y)P(Y) = P(Y | X)P(X)).
- Observing vs. doing: seeing a high value of X can make seeing a high value of Y more likely even if increasing X reduces Y.

Causal concept: Stronger associations are more likely to be causal.
Limitations and challenges:
- Association is often model-dependent.
- Reducing a cause may not reduce its associated effects.
- Collider bias: conditioning on a common effect Z in X → Z ← Y induces a spurious association between X and Y.
- “Causal” is not dichotomous: many paths.

Causal concept: Some fraction of an effect can be attributed to each cause based on relative risk ratios (associations).
Limitations and challenges:
- Accounting: the sum of attributed risks often exceeds 100%.
- Reducing a cause may not reduce the effects attributed to it.

Causal concept: Causes make the probability distributions of their effects different from what they otherwise would have been. Effect size = estimated average difference in responses between real and counterfactual exposed populations.
Limitations and challenges:
- What would have been is unobserved.
- Assumption-dependent: estimates based on modeling assumptions are often wrong (e.g., the Dublin coal burning ban example).
- What effects would have been if exposure had been different depends on why it would have been different, which is seldom specified.

Causal concept: Causes help to predict their effects; effects are not conditionally independent of their direct causes.
Limitations and challenges:
- Confounding: nicotine-stained fingers can be a Granger cause of lung cancer (if smoking is not conditioned on), but cleaning fingers would not necessarily reduce risk of lung cancer.

Causal concept: Changing a cause changes its effect (or the effect’s probability distribution).
Limitations and challenges:
- How changing X would change Y cannot necessarily be predicted uniquely from observational data unless a valid causal model is available.

Causal concept: Changing causes changes their effects via networks of law-like mechanisms; values of effects (or their probability distributions) are derived from the values of their direct causes.
Limitations and challenges:
- Mechanisms may be unknown.
- Pathways linking mechanisms may be unknown.
- Direct causal parents may be unknown.
- Formulas, models, or CPTs for deriving the probabilities of effect variable values from the values of their direct causal parents may be unknown.
Table 2.8 Summary of Key Users and Techniques for Different Causal Concepts
Conditional probability calculations
Regulators (e.g., EPA, OSHA, FDA, etc.)
World Health Organization, IARC, other public health authorities
Relative risk (RR) ratios
Epidemiological association metrics
Activists, regulators, litigators
World Health Organization, IARC, other public health authorities
Other epidemiological measures: Population attributable fraction, probability of causation, etiologic fraction, etc.
Policy analysts, especially those working on program evaluation
Response surface modeling, adaptive learning and optimization
Mechanistic/explanatory causation
Economists and econometricians
Social science researchers
System dynamics modeling, continuous simulation (ODEs, Insight Maker)
System identification methods
Structural equation modeling (SEMs)
Simon-Iwasaki causal ordering
To handle model uncertainty, i.e., uncertainty about the correct description of the data-generating process or system underlying the observed data, machine learning algorithms such as Random Forest combine non-parametric estimation of conditional probability relations with the use of model ensembles that allow for the possibility that any of many models might provide the best description. Averaging predictions from many models in such an ensemble typically leads to better predictions (e.g., with lower false positive and false negative rates for classification tasks and smaller mean squared prediction errors for continuous predicted quantities) than any single model.
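The variance-reduction logic behind such ensembles can be illustrated with a minimal simulation: if each base model’s prediction equals the true signal plus its own independent error (roughly what bootstrap resampling and random feature selection encourage in a Random Forest), averaging many models shrinks the error variance roughly in proportion to the number of models. The signal and noise level below are illustrative assumptions.

```python
import random

random.seed(3)

def noisy_model(x, eps):
    # Hypothetical base learner: the true signal plus this model's own error
    return x * x + eps

xs = [random.uniform(-1, 1) for _ in range(1000)]
truth = [x * x for x in xs]

def mse(preds):
    return sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)

# Each ensemble member makes independent zero-mean errors; averaging the
# members' predictions cancels much of the variance.
k = 50
member_preds = [[noisy_model(x, random.gauss(0, 0.5)) for x in xs]
                for _ in range(k)]

single_mse = mse(member_preds[0])
ensemble_preds = [sum(m[i] for m in member_preds) / k for i in range(len(xs))]
print(single_mse)           # roughly the noise variance, 0.25
print(mse(ensemble_preds))  # roughly 0.25 / k
```

In practice base learners’ errors are correlated rather than fully independent, so real Random Forests gain less than the 1/k factor shown here, but the qualitative effect is the same.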
Bayesian networks (BNs) provide a useful unifying framework for many aspects of descriptive, predictive, prescriptive, and evaluation analytics. They also support learning from data and collaboration by experts in different parts of the BN.
Description: The network topology of a BN reveals multivariate patterns of dependencies and conditional independence among variables that are more informative than descriptive methods such as exploratory data analysis and visualization, clustering, or regression alone.
Prediction: A quantified BN model with all of its CPTs or other conditional probability models specified can be used to predict the values of some variables from observed or assumed values of others (“findings”) via conditional probability calculations, while handling missing data gracefully by only conditioning on what is observed. With stronger assumptions (e.g., linear models, Gaussian errors), BNs and related techniques such as SEM modeling and path analysis can be extended to allow hidden or latent variables; these in turn provide a way to deal with measurement or estimation errors in variables, since the true values can be regarded as latent variables for which the measured or estimated values are observable indicators. Dynamic Bayesian networks (DBNs) provide a way to use multiple observed interdependent time series to help forecast each other’s future values. A variety of other predictive models, such as Hidden Markov Models (HMMs) and Kalman filters for dynamic systems, can be expressed as DBNs.
Prescription and decision optimization: BN inference algorithms can be used to optimize decisions in influence diagrams (IDs).
Evaluation: If knowledge-based constraints are incorporated that allow the arrows in a BN to be interpreted as representing manipulative causation, then the causal BN can be used to answer evaluation questions about how much difference past policies or interventions have made in changing outcome probability distributions from what they otherwise would have been.
Learning: BN learning principles and algorithms such as those on the right side of Table 2.4 can be used to help learn BNs directly from data, although what can be learned is often only predictive causation. Knowledge-based constraints are typically needed to obtain arrows that have valid manipulative-causal interpretations.
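The difference between the predictive and evaluation uses of a BN described above can be shown by enumeration on a three-node causal network with a confounder; the CPT numbers below are illustrative assumptions. Conditioning on an observed exposure also updates beliefs about the confounder, whereas an intervention severs the arrow into the exposure and uses the back-door adjustment.

```python
# Minimal causal BN, C -> X -> Y with confounding arrow C -> Y (all binary).
p_c = {1: 0.4, 0: 0.6}                    # P(C = c)
p_x_given_c = {1: 0.8, 0: 0.2}            # P(X = 1 | C = c)
p_y_given_xc = {(1, 1): 0.9, (1, 0): 0.6, # P(Y = 1 | X = x, C = c)
                (0, 1): 0.5, (0, 0): 0.2}

def joint(c, x, y):
    pc = p_c[c]
    px = p_x_given_c[c] if x == 1 else 1 - p_x_given_c[c]
    py = p_y_given_xc[(x, c)] if y == 1 else 1 - p_y_given_xc[(x, c)]
    return pc * px * py

def p_y1_given_x1():
    """Predictive query: condition on observing X=1 (enumerate over C)."""
    num = sum(joint(c, 1, 1) for c in (0, 1))
    den = sum(joint(c, 1, y) for c in (0, 1) for y in (0, 1))
    return num / den

def p_y1_do_x1():
    """Interventional query: set X=1, cutting the arrow C -> X.
    Back-door adjustment: sum_c P(c) * P(Y=1 | X=1, c)."""
    return sum(p_c[c] * p_y_given_xc[(1, c)] for c in (0, 1))

print(p_y1_given_x1())  # observing X=1 also raises belief that C=1
print(p_y1_do_x1())     # manipulating X leaves the distribution of C unchanged
```

Here the observational probability exceeds the interventional one because seeing X = 1 is evidence for the high-risk value of the confounder; an evaluation analysis that used the observational number would overstate the effect of the intervention.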
Causal graph methods provide transport formulas for generalizing causal relationships discovered in one or more source data sets (e.g., by identifying invariant laws or CPTs that hold across settings) and applying them under novel conditions and to target environments not used to produce the training data. Related techniques allow combination and synthesis of causal modeling information from multiple observational and experimental data sets with overlapping variables (Triantafillou and Tsamardinos, 2015).
Although much remains to be done, and causal discovery and modeling algorithms are being actively developed by vibrant research communities, the very substantial accomplishments to date provide powerful methods that have proved their empirical value in neuroscience, financial economics and econometrics, control engineering, machine learning, and many applied areas.
The different concepts and methods of causal analytics summarized in Table 2.7 have attracted different followings, as suggested in the middle column of Table 2.8. Most modern concepts of causality are probabilistic; all of those in Table 2.8 agree that, in general, causes change the probability distributions of their effects. Deterministic relationships between changes in causes and changes in effects, as in ODE models, are special cases of more general probabilistic formulations in which conditional probabilities are 1 for one response and 0 for others. (However, attempts to understand causality entirely in terms of probability have been largely confined to philosophers (e.g., Suppes, 1970) and are today widely viewed as unsuccessful (Pearl, 2009).)
Associational, attributive, and counterfactual causation are widely used in epidemiology and public health. They provide numbers that can often be computed relatively easily from available data, e.g., using observed exposure prevalence numbers and relative risk ratios, or by using observed differences in response rates between a group with a defined exposure or intervention and a control group. The resulting numbers are commonly offered as answers to causal questions by epidemiologists, regulators, activists, litigants, and public health authorities, although they typically do not address manipulative causation. These numbers underlie many sensational headlines about “links” (usually meaning associations) between various exposures and adverse health effects; about pollution killing millions of people per year; about substances being determined by various authorities to be human carcinogens; or about bans of coal burning in Dublin saving thousands of lives. Such reports are widely used to support calls for action and policy recommendations. They are sometimes cited in court battles as evidence for or against the probability of a plaintiff’s injury and in calculating probabilities of causation, and they have been used in worker compensation programs to attribute harm or shares in causation to specific causes.
However, associational, attributive, and counterfactual causation usually have no necessary real-world implications for how or whether taking different actions would affect (or has affected) outcome probabilities. Theoretical exceptions occur for associational and attributive causal methods if health effects can be shown to be related to exposures via a competing risk model; and for counterfactual causal methods if the counterfactual modeling assumptions can be shown to be correct. Such exceptions are rare in practice. It is usually the case that even the most sensational headlines based on associational, attributive, or counterfactual causation linking an exposure to an adverse health effect do not imply that reducing or eliminating the exposure would reduce the adverse health effect – its frequency, probability, prevalence, incidence rate, or severity – in an exposed population. This is well understood by many specialists in epidemiology and statistics, but deserves to be much more widely understood by scientists, reporters, and the public. Communicating the limitations of associational, attributive, and counterfactual causal calculations is made more challenging by the understandable tendency of expert practitioners to emphasize their desirable features, such as clarifying the meaning of the causal questions being asked and allowing calculations of numbers that can be readily independently reproduced and verified by others starting from the same data and using the same methods. The advantages of rigor and objectivity are often touted without simultaneously emphasizing that the resulting numbers, causal conclusions, and risk and burden estimates do not mean what most non-specialists think they do: that changing the claimed causes of effects would change the claimed effects.
At the other end of the spectrum of causal concepts in Table 2.8 is mechanistic or explanatory causation. Mechanistic causal models describe and explain how initial changes in some variables propagate through network structures of causal mechanisms to bring about subsequent changes in other variables. This is the domain of the scientist and engineer: understanding how causes and effects are connected (structure) and how changes are transduced through complex systems. Quantitative descriptions and dynamic simulations based on networks or systems of equations expressing (theoretical or empirical) causal laws determining how some variables change when others change provide a tremendously powerful paradigm for description, explanation and diagnosis, prediction, what-if counterfactual analysis, and design optimization of systems. Structural causal models consisting of ODEs and algebraic equations (as well as PDEs and stochastic differential equations or stochastic process simulations for some systems) are included with explanatory causation in Table 2.8 because showing how the values of some variables are derived from the values of others – the central concept of structural equations – provides a way to describe the causal mechanisms linking them (Simon and Iwasaki, 2008). However, understanding how a complex system works in enough detail to create a valid simulation model or structural equation model describing its dynamic behavior in response to changes in exogenous inputs may require a great deal of knowledge, without which mechanistic causal modeling becomes impossible.
The two remaining causal concepts in Tables 2.7 and 2.8 are predictive causality and manipulative causality. As already discussed, manipulative causality is, or should be, of primary interest to decision makers and policy analysts. In practice, predictive causation is often a highly useful screen for manipulative causation, insofar as manipulative causation usually implies predictive causation, so that predictive causation is close to being a necessary, although not a sufficient, condition for manipulative causation. Moreover, as we have seen (e.g., Figure 2.27), incorporating mild knowledge-based constraints, such as that cold temperatures might affect mortality and morbidity but mortality and morbidity do not affect daily temperatures, into predictive causal discovery algorithms such as those in bnlearn often suffices to allow them to discover causal graph structures with arrows having manipulative as well as descriptive and predictive interpretations. Both predictive and manipulative causation are less demanding of detailed knowledge about the structure and functioning of a system and its components than mechanistic causation: knowing that changing one variable changes another (or its probability distribution), and by how much, requires less information than understanding how changes propagate from one to the other. However, both predictive and manipulative causation are usually more demanding than associational and attributive causation based on observed prevalences and relative risk ratios or on regression coefficients, and than counterfactual causation based on unvalidated modeling assumptions. Both predictive and manipulative causality require knowledge of the dependence and conditional independence relations among variables (e.g., as revealed by a causal graph structure) and CPTs or some other way to specify conditional probability relations, such as regression models or simulation steps, to quantify dependencies.
This intermediate level of detail is where most practical work falls. Knowing the probability distribution for changes in outcomes caused by changing controllable inputs is all that is needed to support well informed decisions – but this information is needed. Attempts to bypass or simplify it by using more readily available information, such as statistical measures of association or attributable risk, do not provide decision makers with the essential information needed to identify decisions that make preferred outcomes more likely.
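A toy decision problem shows why these interventional outcome probabilities are exactly what optimization needs: given the probability of a desired outcome under each action (which only a manipulative-causal model supplies) and each action’s cost, an expected-utility comparison identifies the best choice. All action names, probabilities, and payoffs below are hypothetical.

```python
# Hypothetical decision problem: each action changes the probability of a
# good outcome (as a manipulative-causal model would supply) and has a cost.
actions = {
    "do_nothing":      {"p_good": 0.50, "cost": 0.0},
    "partial_control": {"p_good": 0.70, "cost": 5.0},
    "full_control":    {"p_good": 0.90, "cost": 20.0},
}
GOOD_PAYOFF = 100.0  # illustrative utility of achieving the desired outcome

def expected_utility(spec):
    return spec["p_good"] * GOOD_PAYOFF - spec["cost"]

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))  # full_control 70.0
```

An association-based number (e.g., an attributable fraction) could not be substituted for p_good here, because it does not say how the outcome probability would change if the action were actually taken.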
This chapter has presented causal concepts and analytics methods with an eye toward practical applications. It has emphasized important distinctions among alternative concepts of causation; surveyed their uses and limitations, especially in risk analysis applications; presented the main principles and ideas of algorithms that have proved useful for causal discovery, inference, and modeling; illustrated modern software implementing them; and discussed how they can be applied to support descriptive, predictive, prescriptive, and evaluation analytics tasks. The following chapters present a variety of applications and extensions of these ideas.
REFERENCES FOR CHAPTER 2

Andreassen S, Hovorka R, Benn J, Olesen KG, Carson ER (1991). A model-based approach to insulin adjustment. In Proc. of AIME’91, 239–248.
Aragam B, Gu J, Zhou, Q. (2017). Learning large-scale Bayesian networks with the sparsebn package. arXiv: 1703.04025. https://arxiv.org/abs/1703.04025. Last accessed 12-19-17.
Asghar N. (2016) Automatic Extraction of Causal Relations from Natural Language Texts: A Comprehensive Survey. https://arxiv.org/pdf/1605.07895.pdf. Last accessed 12-19-17.
Azzimonti L, Corani G, Zaffalon M (2017). Hierarchical Multinomial-Dirichlet model for the estimation of conditional probability tables. https://arxiv.org/abs/1708.06935. Last accessed 11-18-17.
Bareinboim E, Pearl J. Causal transportability with limited experiments. In Proceedings of the 27th AAAI Conference on Artificial Intelligence, pp. 95-101, 2013. ftp://ftp.cs.ucla.edu/pub/stat_ser/r408.pdf
Barnett L, Seth AK. (2014) The MVGC Multivariate Granger Causality Toolbox: A new approach to Granger-causal inference. J. Neurosci. Methods 223: 50-68.
Bearfield G, Marsh W. (2005) Generalising Event Trees Using Bayesian Networks with a Case Study of Train Derailment. In: Winther R, Gran BA, Dahll G (eds) Computer Safety, Reliability, and Security. SAFECOMP 2005. Lecture Notes in Computer Science, vol 3688. Springer, Berlin, Heidelberg.
Blalock HM. (1964) Causal Inferences in Nonexperimental Research. The University of North Carolina Press, Chapel Hill, North Carolina.
Bobbio A, Portinale L, Minichino M, Ciancamerla E. (2001). Improving the analysis of dependable systems by mapping fault trees into Bayesian networks. Reliability Engineering and System Safety 71: 249–260
Bontempi G, Flauder M. From dependency to causality: A machine learning approach. Journal of Machine Learning Research 16 (2015) 2437-2457
Boutilier C, Dearden R, Goldszmidt M. (1995) Exploiting structure in policy construction. Proceedings of the International Joint Conference on Artificial Intelligence, 14: 1104–1113.
Brewer LE, Wright JM, Rice G, Neas L, Teuschler L. Causal inference in cumulative risk assessment: The roles of directed acyclic graphs. Environ Int. 2017 May;102:30-41. doi: 10.1016/j.envint.2016.12.005.
Campbell DT, Stanley JC. 1963. Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin Company. Boston, MA.
Charniak E. (1991) Bayesian networks without tears. AI Magazine 12(1): 50-63. https://www.aaai.org/ojs/index.php/aimagazine/article/download/918/836
Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, Poole C. (2010) Illustrating bias due to conditioning on a collider. Int J Epidemiol 39(2): 417-420.