Ashcroft, M. (2013) Performing decision-theoretic inference in Bayesian network ensemble models. In: Twelfth Scandinavian Conference on Artificial Intelligence. Jaeger M, Nielsen TD, Viappiani P (Eds). 257:2534
Auer P, Cesa-Bianchi N, Fischer P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning. 47 (2/3): 235–256. doi:10.1023/A:1013689704352
Averbeck BB. (2015) Theory of choice in bandit, information sampling and foraging tasks. PLoS Comput Biol. Mar 27;11(3):e1004164. doi: 10.1371/journal.pcbi.1004164.
Bala MV, Mauskopf JA. (2006) Optimal assignment of treatments to health states using a Markov decision model: an introduction to basic concepts. Pharmacoeconomics. 24(4):34554.
Bareinboim E, Pearl J. Causal transportability with limited experiments. In Proceedings of the 27th AAAI Conference on Artificial Intelligence, pp. 95101, 2013. ftp://ftp.cs.ucla.edu/pub/stat_ser/r408.pdf
Beck JL, Zuev KM. (2017) Rare Event Simulation. In R. Ghanem, D. Higdon, H. Owhadi (Eds.) Handbook of Uncertainty Quantification. Springer, New York. https://arxiv.org/pdf/1508.05047.pdf
Bennett CC, Hauser K. Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif Intell Med. 2013 Jan;57(1):919. doi: 10.1016/j.artmed.2012.12.003.
Bertsekas DP and Shreve SE. (1996) Stochastic Optimal Control: The Discrete-Time Case. Athena Scientific. Belmont, MA.
Bier VM and Cox LA Jr. (2017) Coping with uncertainty in adversarial risk models. Chapter in Abbas A, Tambe M, von Winterfeldt D. (Eds.) Improving Homeland Security Decisions. 2017. Cambridge University Press. New York, New York.
Box, GEP. (1957). Evolutionary Operation: A method for increasing industrial productivity. Journal of the Royal Statistical Society. Series C (Applied Statistics). 6 (2): 81 101. doi:10.2307/2985505. JSTOR 2985505.
Cami A, Wallstrom GL, Hogan WR. (2009). Measuring the effect of commuting on the performance of the Bayesian Aerosol Release Detector.BMC Med Inform Decis Mak. Nov 3;9 Suppl 1:S7.
Campbell DT, Stanley JC. 1963. Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin Company. Boston, MA.
Cao Q, Buskens E, Feenstra T, Jaarsma T, Hillege H, Postmus D. Continuoustime semiMarkov Models in health economic decision making: an illustrative example in heart failure disease management.Med Decis Making. 2016 Jan;36(1):5971. doi: 10.1177/0272989X15593080.
Cartwright N. Two theorems on invariance and causality. Philosophy of Science 70 2003 Jan 1: 203224. https://doi.org/10.1086/367876
Chakraborty M, Chua KYP, Das S, Juba B. Coordinated Versus Decentralized Exploration In MultiAgent MultiArmed Bandits (2017). Proceedings of the TwentySixth International Joint Conference on Artificial Intelligence (IJCAI). Melbourne, Australia. 164170. https://doi.org/10.24963/ijcai.2017/24
Choi E, Schuetz A, Stewart WF, Sun J. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inform Assoc. 2017 Mar 1;24(2):361370. doi: 10.1093/jamia/ocw112.
Clancy L, Goodman P, Sinclair H, Dockery DW. (2002) Effect of air-pollution control on death rates in Dublin, Ireland: an intervention study. Lancet. Oct 19;360(9341):12104.
Cox LA Jr.(2008) What's wrong with risk matrices? Risk Analysis. Apr; 28(2):497512.
Dayer MJ, Jones S, Prendergast B, Baddour LM, Lockhart PB, Thornhill MH. Incidence of infective endocarditis in England, 200013: a secular trend, interrupted timeseries analysis.Lancet. 2015 Mar 28;385(9974):121928. doi: 10.1016/S01406736(14)620079
Dockery DW, Rich DQ, Goodman PG, Clancy L, OhmanStrickland P, George P, Kotlov T; HEI Health Review Committee. Effect of air pollution control on mortality and hospital admissions in Ireland. Res Rep Health Eff Inst. 2013 Jul;(176):3109.
Dorfman R. (1969) An economic interpretation of optimal control theory. American Economic Review. 59(5): 817-31.
Doucet A, Johansen AM (2009). A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later. In D. Crisan, B. Rozovsky (eds), The Oxford Handbook of Nonlinear Filtering (2009).
Dupac V. (1965) A Dynamic Stochastic Approximation Method. Ann. Math. Statist. 36(6):16951702.
Feng X, Shekhar A, Yang F, Hebner RE, Bauer P. (2017) Comparison of Hierarchical Control and Distributed Control for Microgrid. Electric Power Components and Systems 45(10). http://www.tandfonline.com/doi/full/10.1080/15325008.2017.1318982
Fu MC (2016). AlphaGo and Monte Carlo tree search: The simulation optimization perspective. Proceedings of the Winter Simulation Conference (WSC), 2016. 1114 Dec. 2016. Washington, DC, USA. IEEE . DOI: 10.1109/WSC.2016.7822130
Fu MC (Ed). 2015. Handbook of Simulation Optimization. Springer. New York. www.springer.com/us/book/9781493913831
Ganger, M., Duryea, E. and Hu, W. (2016) Double Sarsa and double expected Sarsa with shallow and deep learning. Journal of Data Analysis and Information Processing. 4: 159176. http://dx.doi.org/10.4236/jdaip.2016.44014
Gasparrini A, Gorini G, Barchielli A. On the relationship between smoking bans and incidence of acute myocardial infarction. Eur J Epidemiol. 2009; 24(10):597602.
Gilmour S, Degenhardt L, Hall W, Day C. Using intervention time series analyses to assess the effects of imperfectly identifiable natural events: a general method and example.BMC Med Res Methodol. 2006 Apr 3;6:16.PMID:16579864.
Goehler A, Geisler BP, Manne JM, Jahn B, ConradsFrank A, SchnellInderst P, Gazelle GS, Siebert U. Decisionanalytic models to simulate health outcomes and costs in heart failure: a systematic review. Pharmacoeconomics. 2011 Sep;29(9):75369.
Gómez, V, Thijssen, S, Symington, A, Hailes, S, Kappen, HJ (2016) RealTime Stochastic Optimal Control for Multiagent Quadrotor Systems. Proceedings of the 26th International Conference on Automated Planning and Scheduling (ICAPS'16) June 1217. London, UK. AAAI Press. https://arxiv.org/pdf/1502.04548.pdf
Grundmann O. (2014) The current state of bioterrorist attack surveillance and preparedness in the US. Risk Manag Healthc Policy. Oct. 9;7:17787.
HeinzeDeml C, Peters J, Meinshausen N. 2017. Invariant causal prediction for nonlinear models. https://arxiv.org/pdf/1706.08576.pdf
Ho YC, Chu KC. (1972) Team decision theory and information structures in optimal control problemsPart I. IEEE Transactions on Automatic Control Feb17(1): 1522.
Höfler M. Causal inference based on counterfactuals.BMC Med Res Methodol. 2005 Sep 13; 5:28.
Hoover, Kevin D. (2014) : Reductionism in Economics: Causality and Intentionality in the Microfoundations of Macroeconomics, CHOPE Working Paper, No. 201403 https://www.econstor.eu/bitstream/10419/149715/1/chopewp201403.pdf
Ilic MD, Liu S. (1996) Hierarchical Power Systems Control: Its Value in aChanging Industry. Springer Heidelberg.
James NA, Matteson DS. ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data. Journal of Statistical Software. December 2014, Volume 62, Issue 7
Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: A survey. Journal of Artificial Intelligence Research archive 4(1) Jan 1996: 237285. www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96ahtml/rlsurvey.html
Kalai A, Vempala S. (2005) Efficient algorithms for online decision problems. Journal of Computer and System Sciences 71: 291307 www.microsoft.com/enus/research/wpcontent/uploads/2016/11/2005Efficient_Algorithms_for_Online_Decision_Problems.pdf
Kale DC, Che Z, Bahadori MT, Li W, Liu Y, Wetzel R. Causal Phenotype Discovery via Deep Networks. AMIA Annu Symp Proc. 2015; 2015: 677–686.
Kamien KI, Schwartz NL. (2012) Dynamic Optimization, Second Edition: The Calculus of Variations and Optimal Control in Economics and Management. Dover Publications. Mineola, New York.
Katt S, Oliehoek FA, Amato C. (2017). Learning in POMDPs with Monte Carlo Tree Search. Proceedings of the 34 th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. http://proceedings.mlr.press/v70/katt17a/katt17a.pdf
Koller D, Parr R. (1999) Computing factored value functions for policies in structured MDPs. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI99). July 31August 6. Stockholm, Sweden. Morgan Kaufmann, San Francisco, CA.
Koller D, Milch B. (2003) Multiagent influence diagrams for representing and solving games. Games and Economic Behavior 45(1): 181221. https://ai.stanford.edu/~koller/Papers/Koller+Milch:GEB03.pdf
Król A, Saint-Pierre P. (2015) SemiMarkov: An R Package for Parametric Estimation in Multi-State Semi-Markov Models. Journal of Statistical Software Aug 66(5). www.jstatsoft.org/article/view/v066i06
Lange K, Chi EC, Zhou H. (2014) A Brief Survey of Modern Optimization for Statisticians. Int Stat Rev. 2014 Apr 1;82(1):4670.
Lee S, Honavar V. (2013) mTransportability: Transportability of a causal effect from multiple environments. Proceedings of the TwentySeventh AAAI Conference on Artificial Intelligence. www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/viewFile/6303/7210
Liao L and Ahn H. Combining deep learning and survival analysis for asset health management. International Journal of Prognostics and Health Management, ISSN21532648, 2016 020. www.phmsociety.org/sites/phmsociety.org/files/phm_submission/2016/ijphm_16_020.pdf
Luce RD, Raiffa H. Games and Decisions: Introduction and Critical Survey. John Wiley & Sons. New York. 1957.
Marschak J, Radner R (1972). Economic Theory of Teams. Cowles Foundation for Research in Economics at Yale University, Monograph 22. Yale University Press, New Haven and London.
Morgan MG, Henrion M. Chapter 10 of Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge University Press, New York, 1990, reprinted in 1998. www.lumina.com/images/uploads/main_images/Analytica%20A%20Software%20Tool%20for%20Uncertainty%20Analysis%20and%20Model%20Communication.pdf
Myerson RB. (1991) Game Theory: Analysis of Conflict. Harvard University Press. Cambridge, MA.
Ortega PA, Braun DA. (2014) Generalized Thompson sampling for sequential decisionmaking and causal inference. Complex Adaptive Systems Modeling 2:2 https://doi.org/10.1186/2194320622
Osborne MJ. (2004). An Introduction to Game Theory. Oxford University Press.
Patsopoulos NA. A pragmatic view on pragmatic trials. Dialogues Clin Neurosci. 2011;13(2):21724.
Pepels T, Cazenave T, Winands MHM, Lanctot M (2014) Minimizing simple and cumulative regret in MonteCarlo tree search. In: Cazenave T, Winands MHM, Björnsson Y. (eds) Computer Games. CGW 2014. Communications in Computer and Information Science, vol 504. Springer, Cham. www.lamsade.dauphine.fr/~cazenave/papers/PepelsCGW2014.pdf
Peters J, Bühlmann P, Meinshausen N. (2016) Causal inference using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, Series B (with discussion) 78(5): 9471012 https://arxiv.org/abs/1501.01332
Ross GJ. Parametric and nonparametric sequential change detection in R: The cpm package. Journal of Statistical Software August 2015, Volume 66, Issue 3.
Ross S, Pineau J, Brahim C, Kreitmann P. (2011) Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes The Journal of Machine Learning Research. 12(2): 17291770.
Shachter RD, Bhattacharjya D. 2010. Solving influence diagrams: Exact algorithms. In Cochran J. et al. (Eds.) Wiley Encyclopedia of Operations Research and Management Science. John Wiley & Sons. New York. www.it.uu.se/edu/course/homepage/aism/st11/Shachter10.pdf
Shackleton M, Sødal S. (2010), Harvesting and recovery decisions under uncertainty. Journal of Economic Dynamics and Control. 34(12): 25332546, https://EconPapers.repec.org/RePEc:eee:dyncon:v:34:y:2010:i:12:p:25332546.
Shan G, Pineau J, Kaplow R. (2013).A survey of pointbased POMDP solvers.Autonomous Agents and MultiAgent SystemsJul 27(1):1–51. https://link.springer.com/article/10.1007/s1045801292002
Shen Y, Cooper GF. (2010). A new prior for Bayesian anomaly detection: application to biosurveillance.Methods Inf Med.;49(1):4453.
Shoham Y, Leyton-Brown K. (2009) Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press. www.masfoundations.org/download.html
Silver D, Veness J. (2010) MonteCarlo planning in large POMDPs. Advances in Neural Information Processing Systems 23 (NIPS)
Simon HA, Iwasaki Y. Causal ordering, comparative statics, and near decomposability. Journal of Econometrics 39 (1988) 149173. http://digitalcollections.library.cmu.edu/awweb/awarchive?type=file&item=34081
Simpson KN, Strassburger A, Jones WJ, Dietz B, Rajagopalan R. Comparison of Markov model and discreteevent simulation techniques for HIV.Pharmacoeconomics. 2009;27(2):15965. doi: 10.2165/0001905320092702000006.
Sutton RS, Barto AG, Williams RJ. (1992)Reinforcement learning is direct adaptive control. IEEE Control Systems.Apr 12(2): 1922. www.ieeecss.org/CSM/library/1992/april1992/w01ReinforcementLearning.pdf. Last accessed 91017
Taghipour S, Caudrelier LN, Miller AB, Harvey B. Using Simulation to Model and Validate Invasive Breast Cancer Progression in Women in the Study and Control Groups of the Canadian National Breast Screening Studies I and II.Med Decis Making. 2017 Feb;37(2):212223. doi: 10.1177/0272989X16660711
Thomas LC. (2003) Games, Theory and Applications. Dover Publications. Mineola, New York.
Tolpin D., Shimony, S. (2012). MCTS based on simple regret. In: Proc. Assoc. Adv. Artif. Intell. pp. 570–576. www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/viewFile/4798/5240
Tsoukalas A, Albertson T, Tagkopoulos I. From data to optimal decision making: a datadriven, probabilistic machine learning approach to decision support for patients with sepsis.JMIR Med Inform. 2015 Feb 24;3(1):e11. doi: 10.2196/medinform.3445.
Van Seijen H, Van Hasselt H, Whiteson S, Wiering M. (2009) A theoretical and empirical analysis of expected Sarsa. 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Nashville, 30 March2 April 2009, 177184. http://dx.doi.org/10.1109/ADPRL.2009.4927542
Wilson RB (1968). The theory of syndicates. Econometrica. Jan 36(1): 119132.
Yüksel S, Saldi N. (2017) Convex Analysis in Decentralized Stochastic Control, Strategic Measures, and Optimal Solutions. SIAM Journal on Control and Optimization. 55(1): 128
White H, Sabarwei S. (2014) QuasiExperimental Design and Methods. UNICEF Office of Research. Methodological Briefs Impact Evaluation No. 8. UNICEF Office of Research  Innocenti Piazza SS. Annunziata, 12 50122 Florence, Italy. https://www.unicefirc.org/publications/pdf/brief_8_quasiexperimental%20design_eng.pdf
Zarchan P, Musoff H. (2015)Fundamentals of Kalman Filtering: A Practical Approach, Fourth Edition.American Institute of Aeronautics and Astronautics
Zhengping Che, Sanjay Purushotham, Robinder Khemani, Yan Liu. Interpretable Deep Models for ICU Outcome Prediction. AMIA Annu Symp Proc. 2016; 2016: 371–380.
Zigler CM, Dominici F. Point: clarifying policy evidence with potentialoutcomes thinkingbeyond exposureresponse estimation in air pollution epidemiology. Am J Epidemiol. 2014 Dec 15;180(12):113340.
Chapter 2
Causal Concepts, Principles, and Algorithms
It is an important truism that association is not causation. For example, people living in low-income areas may have higher levels of exposure to an environmental hazard and also higher levels of some adverse health effect than people living in wealthier areas. Yet this observed association, no matter how strong, consistent, statistically significant, biologically plausible, and well documented by multiple independent teams, does not necessarily tell a policy maker anything about whether or by how much a proposed costly reduction in exposure would reduce adverse health effects. Perhaps only increasing income, or something that income can buy, would reduce adverse health effects. Or maybe factors that cannot be changed by policy interventions increase both the probability of living in low-income areas and the probability of adverse health effects. Whatever the truth is about opportunities to improve health by changing policy variables, it typically cannot be determined by studying correlations, regression coefficients, relative risks, or other measures of association between exposures and health effects (Pearl, 2009). Observed associations between variables can contain both causal and non-causal (“spurious”) components. In general, the effects of policy changes on outcomes of interest can only be predicted and evaluated correctly by modeling the network of causal relationships by which effects of exogenous changes propagate among variables. This chapter reviews current causal concepts, principles, and algorithms for carrying out such causal modeling and compares them to other approaches.
Many different concepts of causality were proposed in the twentieth century and earlier by philosophers (Suppes, 1970; Hausman and Woodward, 1999), geneticists (Wright, 1921), statisticians and social statisticians (Neyman, 1923; Campbell and Stanley, 1963; Blalock, 1964; Rubin, 1974), epidemiologists (Robins and Greenland, 1992), mathematicians and physicists (Wiener, 1956; Schreiber, 2000), economists and econometricians (Simon, 1953; Granger, 1969), artificial intelligence and machine learning researchers, and computer scientists (Charniak, 1991; Druzdzel and Simon, 1993). They expressed, with varying degrees of rigor and precision, intuitions such as that effects regularly and predictably follow their causes; that causes make their effects different from what they otherwise would have been; that causes are informative about and help to predict their effects; that expected values or probability distributions for effect sizes can be determined from the values of their causes; and that changing causes changes the probabilities of their effects. By the year 2000, these strands of thought on how to define, measure, and estimate causal relationships and effects had largely been unified in a framework that emphasizes the use of diagrams with nodes representing variables and arrows between nodes representing causal dependencies (Pearl, 2009). This framework includes the popular “directed acyclic graph” (DAG) models introduced in Chapter 1, as well as more general models with cycles and undirected arcs (representing dependency with unknown causal direction) allowed. We shall use the DAG models in the following sections.
This chapter explores what it means to say that one thing causes another and reviews key ideas about causality that have proved useful in interpreting a broad variety of data and estimating causal impacts of interventions on outcomes. It discusses how to represent different types of causal knowledge using diagrams and mathematical, statistical, and computational models to facilitate explanation, communication, and computation of causal inferences. Finally, this chapter surveys principles and algorithms for using causal models to answer practical questions requiring causal inference. These include questions of attribution and diagnosis, prediction and prognosis, explanation, prescriptive optimization of decisions, and evaluation of their impacts.
Learning goals for this chapter are as follows:

Distinguish between (a) statistical associations, inferences, and models; and (b) causal models to support, evaluate, and improve policy decisions.

Introduce, explain, and show how to apply several different types or concepts of causality to improve predictions, decisions, and learning. The main types discussed in this chapter are associational, attributive, counterfactual, structural (computational), predictive, manipulative, and mechanistic or explanatory causation.

Explain the main concepts and software tools currently available to solve causal analytics problems. These include techniques for identifying causal network models from data and for using them to predict, infer, attribute, and explain effects based on observations; optimize decisions; and quantify partial (“direct” and “indirect”) and total causal relationships.

Introduce algorithms and principles for identifying approximately correct causal models from data using relatively objective (assumption-free, investigator-independent) machine-learning methods where possible, together with knowledge-based constraints where necessary (e.g., that effects do not precede their causes, or that weather can be a cause but not an effect of illnesses).

Illustrate how to use freely available software for applying causal analytics methods and specific causal discovery and inference algorithms to data. Air pollution health effects research is used as an example for illustrating state-of-the-art causal analytics algorithms.
The chapter is relatively long and introduces many technical concepts and terms needed to take advantage of current causal analytics methods and software. By the end of this chapter, the reader will be conversant with the main ideas and methods of modern causal analytics and will understand their potential and limitations for practical applications in risk analysis. To minimize the burden on readers who are mainly interested in applications, subsequent chapters briefly recapitulate key concepts and techniques where they are used, leaving a fuller exposition of concepts and methods to this chapter. On the other hand, for readers who wish to delve further into the technical methods surveyed in this chapter, an extensive and up-to-date list of references gives access to the primary research literature and to several outstanding surveys, tutorials, and software packages. As in so much of the current practice of data science, the exposition here is targeted mainly at readers who seek to understand technical concepts and methods well enough to use them correctly and effectively. It also aims to provide a relatively accessible point of entry to the large and exciting recent technical and research literatures that are transforming how artificial intelligence, machine learning, and data science are being used to learn about cause and effect and to improve understanding and control of the behaviors of a broad range of uncertain systems that affect human health and well-being.
Multiple Meanings of “Cause”
The claim that one event or condition causes another has meant different things to different people and organizations. Modern causal analysis clarifies these different meanings, allowing more precise expression of what questions a causal study addresses and how the answers should be interpreted. For example, in public and occupational health risk analysis, the causal claim “Each extra unit of exposure to substance X increases rates of an adverse health effect (e.g., lung cancer, heart attack deaths, asthma attacks, etc.) among exposed people by R additional expected cases per person-year” can be interpreted in at least the following ways:

Probabilistic causation (Suppes, 1970): The conditional probability of the health response or effect occurring in a given interval of time is greater among individuals with more exposure compared to otherwise similar-seeming individuals with less exposure; in this sense, probability of response (or age-specific hazard rate for occurrence of response) increases with exposure. On average, there are R extra cases per person-year per unit of exposure. The main intuition is that causes (exposures) make their effects (responses) more likely to occur within a given time interval, or increase their occurrence rates.

Associational causation (IARC, 2006): Higher levels of exposure have been observed in conjunction with higher risks, and this association is judged to be strong, consistent across multiple studies and locations, and biologically plausible. The slope of a regression line between these historical observations in the exposed population of interest is R extra cases per person-year per unit of exposure. The main intuition is that causes are associated with their effects. Relative risk (RR) ratios – the ratios of responses per person per year in exposed compared to unexposed populations – and quantities derived from RR, such as burden-of-disease metrics, population attributable fractions, probability of causation formulas, and closely related metrics, are widely used in epidemiology and public health to quantify associational causation.

Attributive causation (Murray and Lopez, 2013): Authorities attribute R extra cases per person-year per unit of exposure to X; equivalently, they blame exposure to X for R extra cases per person-year per unit of exposure. In practice, such attributions are usually made based on measures of association such as the ratio or difference of estimated risks between populations with higher and lower levels of exposure. Differences in risks between the populations are attributed to their differences in exposures without further analysis of other possible explanations. The main idea is that if people with higher exposures have higher risks for any reason, then the increased risk can be attributed to the higher exposure. (If many risk factors differ between low-risk and high-risk groups, then the difference in risks can be attributed to each of them separately; there is no consistency constraint preventing multiples of the total difference in risks from being attributed to the various factors.)

Counterfactual and potential outcomes causation (Höfler, 2005; Glass et al., 2013; Lok, 2017; Li et al., 2017): In a hypothetical world (or maybe in all conceivable counterfactual worlds) with 1 unit less of exposure to X, expected cases per person-year in the exposed population would also be less by R. Usually, such counterfactual numbers are derived from modeling assumptions, and explanations for the counterfactual reduction in exposure are not discussed. The main intuition is that differences in causes make their effects different from what they otherwise would have been.

Predictive causation (Granger, 1969; Kleinberg and Hripcsak, 2011; Papana et al., 2017): In the absence of interventions, time series data show that the observation that exposure has increased or decreased is predictably followed, perhaps after a lag, by the observation that average cases per person-year have also increased or decreased, respectively, by an average of R cases per unit of change in exposure. The main intuition is that causes help to predict their effects, and changes in causes help to predict changes in their effects. More generally, causes are informative about their effects, so effects can be predicted better with information about their causes than without it.

Structural causation (Simon, 1953; Simon and Iwasaki, 1988; Hoover, 2012): In a valid mathematical or computational simulation model (or possibly in all valid simulation models), the number of cases per person-year is derived at least in part from the value of exposure. Thus, the value of exposure must be determined before the value of yearly case count can be determined. Moreover, the average calculated or simulated value of the case count per person-year decreases by R for each exogenously specified unit decrease in exposure. The main intuition is that effects depend on, and are calculated from, their causes.

Manipulative causation (Voortman et al., 2010; Hoover, 2012; Simon and Iwasaki, 1988): Reducing exposure by one unit reduces expected cases per person-year by R. The main intuition is that changing causes changes their effects.

Explanatory/mechanistic causation (Menzies, 2012; Simon and Iwasaki, 1988): Increasing exposure by one unit causes changes to propagate through a biological network of causal mechanisms. When all changes have finished propagating, the new expected value for case count per person-year in the exposed population will be R more than before exposure was increased. The main intuition is that changes in causes propagate through a network of law-like causal mechanisms to produce changes in their effects. Causal mechanisms are usually represented mathematically by structural equations or by conditional probability tables (CPTs) that are invariant across settings, as discussed in Chapter 1 (Pearl, 2009).
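The over-attribution noted under attributive causation can be made concrete with a small arithmetic sketch. The risks below are hypothetical numbers chosen only for illustration: two binary risk factors, A and B, raise risk only when both are present. Comparing the doubly exposed group with each singly exposed group then attributes the whole excess risk to A and, separately, the whole excess risk to B.

```python
# Hypothetical annual risks for four groups, keyed by (has_A, has_B).
# Risk is elevated only when both factors are present (pure interaction).
risk = {(0, 0): 0.01, (1, 0): 0.01, (0, 1): 0.01, (1, 1): 0.05}

# Total excess risk of the doubly exposed group over the unexposed group.
total_excess = risk[(1, 1)] - risk[(0, 0)]

# Excess risk "attributed" to each factor by comparing groups that
# differ only in that factor.
excess_attributed_to_a = risk[(1, 1)] - risk[(0, 1)]
excess_attributed_to_b = risk[(1, 1)] - risk[(1, 0)]

# Attributable fractions among the doubly exposed group.
fraction_a = excess_attributed_to_a / risk[(1, 1)]
fraction_b = excess_attributed_to_b / risk[(1, 1)]

print("total excess risk:", round(total_excess, 4))
print("sum of attributed excess risks:",
      round(excess_attributed_to_a + excess_attributed_to_b, 4))
print("sum of attributable fractions:", round(fraction_a + fraction_b, 4))
```

In this sketch each factor is assigned an attributable fraction of 0.8, so the fractions sum to 1.6, and the attributed excess risks sum to twice the total excess risk. Nothing in the attribution procedure prevents this.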
For risk managers and policy makers, manipulative causation is key, since the goal of decision-making is to choose acts that cause desired outcomes, in the sense of making them more probable. Manipulative causation is implied by mechanistic causation – if there is a network of mechanisms by which acts change the probabilities of outcomes (mechanistic causation), then taking the acts will indeed change the probabilities of outcomes (manipulative causation). But neither one is implied by associational, attributive, counterfactual, or predictive concepts of causation (Pearl, 2009). Understanding and appropriately applying these distinctions among concepts of causation, and making sure that associational concepts are not misrepresented or misunderstood as manipulative causal ones in policy deliberations and supporting epidemiological analyses, provide a crucial first step toward improving current practice in epidemiology (Petitti, 1991).
The following sections examine these different concepts of causality more closely and discuss how they are related. Probabilistic causal models, which are common to all of these concepts of causation, are emphasized. In particular, we explain how Bayesian network (BN) models can be used to represent probabilistic dependencies among variables, manipulate probabilities to make predictions, and draw probabilistic inferences. They also provide a useful unifying framework and generalization of many well-known probabilistic risk assessment (PRA) and decision analysis techniques. Modern software makes it relatively easy to build and use BNs. Several examples show how to use current BN software to create simple BN models and use them to draw inferences and make predictions. BN algorithms can also be extended to networks with decisions, i.e., influence diagrams (IDs), and used prescriptively to solve for optimal statistical decisions; additional examples illustrate these methods. The final sections of the chapter consider how to learn causal models from data and conclude with a brief description of selected milestones in the historical development of modern causal analysis.
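The gap between associational and manipulative causation can be illustrated with a minimal sketch built on the low-income example from the chapter introduction. All probabilities are hypothetical. The model assumes that income affects both exposure and illness, while exposure itself has no effect on illness. Conditioning on observed exposure then shows a sizable risk difference, whereas setting exposure by intervention (modeled here, as an assumption, by leaving the income distribution unchanged) shows none.

```python
# Hypothetical model: low income raises both exposure and illness
# probabilities; illness does NOT depend on exposure.
P_LOW_INCOME = 0.5
P_EXPOSED_GIVEN_LOW_INCOME = {True: 0.8, False: 0.2}
P_ILL_GIVEN_LOW_INCOME = {True: 0.3, False: 0.1}


def p_income(low_income):
    """Marginal probability of the income level."""
    return P_LOW_INCOME if low_income else 1.0 - P_LOW_INCOME


def observed_risk(exposed):
    """P(ill | exposure observed): condition on exposure, sum over income."""
    numerator = 0.0
    denominator = 0.0
    for low_income in (True, False):
        p_exposed = P_EXPOSED_GIVEN_LOW_INCOME[low_income]
        p_x = p_exposed if exposed else 1.0 - p_exposed
        weight = p_income(low_income) * p_x
        numerator += weight * P_ILL_GIVEN_LOW_INCOME[low_income]
        denominator += weight
    return numerator / denominator


def risk_after_setting_exposure(exposed):
    """P(ill) when exposure is set from outside. The argument is unused
    because, in this model, illness depends on income only."""
    return sum(p_income(z) * P_ILL_GIVEN_LOW_INCOME[z] for z in (True, False))


association = observed_risk(True) - observed_risk(False)
manipulative_effect = (risk_after_setting_exposure(True)
                       - risk_after_setting_exposure(False))
print("observed risk difference:", round(association, 4))
print("effect of changing exposure:", round(manipulative_effect, 4))
```

Under these assumed numbers the observed risk is 0.26 among the exposed and 0.14 among the unexposed, yet changing exposure would leave risk at 0.20 in both cases. A policy maker relying on the association alone would overestimate the benefit of reducing exposure.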
Probabilistic Causation and Bayesian Networks (BN)
Perhaps the simplest intuition relating probability and causation is that causes make their effects more probable. To sharpen this intuition and use it to draw quantitative inferences, it is necessary to be more explicit about how one observation, action, or event can make another more probable. The assumed technical background for this discussion is elementary probability theory, especially the concept of a random variable and the definitions and notations for joint, marginal, and conditional probability distributions.
Technical Background: Probability Concepts, Notation and Bayes’ Rule
Uncertain quantities in this chapter are represented by random variables. Most of this chapter assumes that the random variables in question are discrete. The notation P(x) will be used as an abbreviation for the probability that random variable X has specific value x. Thus, P(x) is a shorthand for P(X = x), or, as it is sometimes more explicitly denoted, P_{X}(x), where the subscript shows the particular random variable for which probabilities of values are being given. P(x) is often called the probability mass function, or, for continuous random variables, the probability density function of the random variable X. When X is just one of several random variables being considered, P(x) is also called its marginal distribution. In such a multivariate context, where the particular random variable being referred to might be unclear, the notation P_{X}(x) for the marginal distribution of X can preserve clarity. We will use the simpler P(x) when the random variable being referred to is clear from context. Likewise, P(x, y) will denote the joint probability that random variable X has specific value x and that random variable Y has specific value y; thus, P(x, y) is short for P(X = x, Y = y) and for the more explicit notation P_{X,Y}(x, y) for the joint probability that X = x and Y = y. The conditional probability that X = x, given that Y = y, will usually be written as P(x | y) in preference to the longer and more explicit notations P(X = x | Y = y) or P_{X|Y}(x | y). Recall the definition of conditional probability:
P(x | y) = P(x, y)/P(y) (2.1)
when the denominator is greater than zero. This definition follows by rearranging the identity
P(x, y) = P(y)P(x | y) (2.2)
i.e., the probability that both X = x and Y = y is the probability that Y = y times the conditional probability that X = x given that Y = y. With equal validity, the joint probability P(x, y) can be factored as a product of a marginal and a conditional probability in a different way, as follows:
P(x, y) = P(x)P(y | x) (2.3)
The marginal distribution for a random variable X can always be calculated from its conditional probabilities, given each of the values of one or more other variables, and from the marginal probabilities of those values, via the law of total probability. This states that the total probability of an event (such as that X has specific value x) is the sum of the probabilities of all of the ways in which it can occur in conjunction with each of a set of mutually exclusive, collectively exhaustive events (such as that Y has each of its possible specific values).
Applying the law of total probability to two random variables X and Y to obtain the marginal distribution of Y from the marginal distribution of X and the conditional probability distributions of Y given each value of X yields the following prediction formula for Y values:
P(y) = Σ_{x} P(y | x)P(x) (2.4)
Here, the sum is taken over each of the distinct possible values, x, of X; if X is a continuous random variable, then the sum must be replaced by an integral. Equating the right-hand sides of equations (2.2) and (2.3), since they both equal P(x, y), yields
P(y)P(x | y) = P(x)P(y | x) (2.5)
Dividing both sides by P(y) (assuming it is nonzero) gives the identity
P(x | y) = P(x)P(y | x)/P(y) (2.6)
Expanding P(y) via the law of total probability (2.4) then yields Bayes’ Rule:
P(x | y) = P(x)P(y | x) / [Σ_{x’} P(y | x’)P(x’)] (2.7)
(The primes on the x values in the denominator are inserted to make clear that x’ is simply an index for the values of X being summed over, not to be confused with the specific, fixed value x in the numerator and on the left side of the equation.) P(x) is called the prior probability that X = x, and P(x | y) is called the posterior probability that X = x, given the observation or data that Y = y. Bayes’ Rule allows data on the marginal probabilities of X values and on the conditional probabilities of Y values given X values to be used to infer conditional probabilities of X values given Y values. We assume familiarity with these aspects of probability theory throughout the remainder of this chapter.
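The prediction formula (2.4) and Bayes’ Rule (2.7) translate directly into a few lines of code. The following is a minimal sketch for discrete variables. The function names, the variable labels (“exposed”, “positive”), and all of the probabilities are hypothetical and serve only to show the mechanics.

```python
def total_probability(prior, likelihood, y):
    """Equation (2.4): P(y) = sum over x of P(y | x) P(x)."""
    return sum(likelihood[x][y] * prior[x] for x in prior)


def posterior(prior, likelihood, y):
    """Equation (2.7): P(x | y) for every value x, given the observation Y = y."""
    p_y = total_probability(prior, likelihood, y)
    if p_y == 0:
        raise ValueError("P(y) is zero, so P(x | y) is undefined")
    return {x: prior[x] * likelihood[x][y] / p_y for x in prior}


# Hypothetical inputs: prior P(x) and conditional probabilities P(y | x).
prior = {"exposed": 0.2, "unexposed": 0.8}
likelihood = {
    "exposed": {"positive": 0.9, "negative": 0.1},
    "unexposed": {"positive": 0.3, "negative": 0.7},
}

p_positive = total_probability(prior, likelihood, "positive")
post = posterior(prior, likelihood, "positive")
print("P(positive) =", round(p_positive, 4))
print("P(exposed | positive) =", round(post["exposed"], 4))
```

With these assumed inputs, P(positive) = 0.9 × 0.2 + 0.3 × 0.8 = 0.42, and the posterior probability of exposure given a positive observation is 0.18/0.42, or about 0.43, more than double the prior of 0.2.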
Example: Joint, Marginal, and Conditional Probabilities for Answering Queries
Table 2.1 shows the 9 joint probabilities for all possible combinations (i.e., pairs) of values for two discrete random variables, each with three possible values: X with possible values 1, 2, and 3; and Y with possible values 4, 8, and 16. Such a joint probability table can be used to answer any question about the probabilities that the values of X and Y fall in specified sets or satisfy specified constraints.
Table 2.1 A joint probability distribution for two random variables: X with possible values 1, 2, and 3; and Y with possible values 4, 8, and 16. For example, P(X = 3, Y = 16) = 0.3.


Y value | X = 1 | X = 2 | X = 3
4 | 0.0 | 0.25 | 0.15
8 | 0.05 | 0.15 | 0.0
16 | 0.1 | 0.0 | 0.3
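As a sketch of how such a table answers queries, the following code stores the joint probabilities of Table 2.1 and derives marginal distributions, a conditional distribution, and the probability of an arbitrary constraint on X and Y. The function names and the particular queries are illustrative choices, not part of the original example.

```python
# Joint probabilities P(x, y) from Table 2.1, keyed by (x, y).
joint = {
    (1, 4): 0.0,  (2, 4): 0.25, (3, 4): 0.15,
    (1, 8): 0.05, (2, 8): 0.15, (3, 8): 0.0,
    (1, 16): 0.1, (2, 16): 0.0, (3, 16): 0.3,
}


def marginal(joint, index):
    """Marginal distribution of X (index 0) or Y (index 1), by summing
    the joint probabilities over the other variable."""
    result = {}
    for pair, p in joint.items():
        result[pair[index]] = result.get(pair[index], 0.0) + p
    return result


def conditional_x_given_y(joint, y):
    """P(x | y) = P(x, y) / P(y), as in equation (2.1)."""
    p_y = marginal(joint, 1)[y]
    if p_y == 0:
        raise ValueError("P(y) is zero, so P(x | y) is undefined")
    return {x: p / p_y for (x, y_value), p in joint.items() if y_value == y}


def probability(joint, event):
    """Probability that (X, Y) satisfies the constraint event(x, y)."""
    return sum(p for (x, y), p in joint.items() if event(x, y))


p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
p_x_given_16 = conditional_x_given_y(joint, 16)
p_query = probability(joint, lambda x, y: x <= 2 and y >= 8)

print("P(x):", {x: round(p, 4) for x, p in p_x.items()})
print("P(y):", {y: round(p, 4) for y, p in p_y.items()})
print("P(x | Y = 16):", {x: round(p, 4) for x, p in p_x_given_16.items()})
print("P(X <= 2 and Y >= 8):", round(p_query, 4))
```

Summing columns gives the marginal distribution of X as 0.15, 0.40, and 0.45 for X = 1, 2, and 3; summing rows gives 0.4, 0.2, and 0.4 for Y = 4, 8, and 16. Given Y = 16, the conditional probabilities of X = 1, 2, and 3 are 0.25, 0, and 0.75.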
