Causal Analytics for Applied Risk Analysis
Louis Anthony Cox, Jr.
Douglas A. Popken
Richard X. Sun
To
Christine and Emeline
PREFACE
Individual, group, organizational, and public policy decisions are often disconcertingly ineffective. They produce unintended and unwanted consequences, or fail to produce intended ones, even after large expenditures of hope, time, and resources. The difficulty of achieving unambiguous successes, in which costly actions or policies produce large, clear net benefits that even those who initially doubted find compelling after the fact, has been noted in areas as varied as personal financial decisions, corporate business decisions, engineering infrastructure decisions, nonprofit initiatives for poverty disruption or delinquency prevention, public health efforts to curb the emergence of antibiotic-resistant superbugs, and regulation of pollutants to improve public and occupational health.
This book is about how to make more effective decisions – that is, decisions that are more likely to cause preferred outcomes and to avoid undesirable ones – by understanding and fixing what so often goes wrong. We believe that the most common reason for disappointing results from well-intended policies and actions is inadequate understanding of the causal relationships between actions and probabilities of outcomes. Actions guided by traditional statistical analyses of association patterns in observational data, such as regression modeling or epidemiological estimates of relative risk ratios, usually cannot be relied on to achieve their objectives because these traditional methods of analysis are usually not adequate for determining how changing some variables will change others. But that is what decision-makers must know to make well-informed choices about what changes to implement. This book is therefore devoted to causal analytics methods that can provide answers to the crucial causal question of how changing decision variables – the things that a decision-maker or policymaker can control or choose – changes probabilities of various outcomes. It presents and illustrates models, algorithms, principles, and software for deriving causal models from data and for using them to optimize decisions, evaluate effects of policies or interventions, make probabilistic predictions of the values of as-yet unobserved quantities from available data, and identify the most likely explanations for observed outcomes, including surprises and anomalies.
The first two chapters survey modern analytics methods, focusing mainly on techniques useful for decision, risk, and policy analysis. They emphasize how causal models are used throughout the rest of risk analytics in detecting and describing meaningful and useful patterns in data; predicting outcome probabilities if different courses of action are followed; identifying and prescribing a best course of action for making preferred outcomes more probable; evaluating the effects of current or past policies and interventions; and learning from experience, either individually or collaboratively, how to make choices that increase the probabilities of preferred outcomes. Chapter 2 also introduces the Causal Analytics Toolkit (CAT), a free in-browser set of analytics software tools available at http://coxassociates.com/CloudCAT, to allow readers to perform the analyses described or to apply modern analytics methods to their own data sets. Chapters 3 through 11 illustrate the application of causal analytics and risk analytics to practical risk analysis challenges, mainly related to public and occupational health risks from pathogens in food or from pollutants in air. Chapters 12 through 15 turn to broader questions of how to improve risk management decision-making by individuals, groups, organizations, institutions, and multi-generation societies with different cultures and norms for cooperation. They examine organizational learning, social risk management, and intergenerational collaboration and justice in managing risks and hazards.
Throughout the book, our main focus is on introducing and illustrating practical methods of causal modeling and analytics that practitioners can apply to improve understanding of how choices affect probabilities of consequences and, based on this understanding, to recommend choices that are more likely to accomplish their intended objectives. We believe that the analytics and big data revolutions now underway will become much more valuable as methods and software for causal analytics become more widely used to better understand how actions and policies affect outcomes.
ACKNOWLEDGMENTS
This book has grown out of efforts over the past decade to understand and explain how to use data and algorithms to determine as accurately, objectively and reproducibly as possible the effects caused by changes in decisions, actions, or policies. This quest has been inspired, encouraged, and supported by many people and organizations. It is a pleasure to thank them.
Chapters 1 and 2 are based largely on a pedagogical approach developed to quickly teach nonspecialists about information-based causal analytics methods as part of professional development and academic graduate courses taught by Tony Cox in 2017. These included a course on Decision Analysis at the University of Colorado at Denver and professional courses at the annual meetings of the Society for Benefit Cost Analysis (SBCA), the Society for Epidemiologic Research (SER), and the American Industrial Hygiene Association (AIHce). Bruce Copley, Dennis Devlin, Dale Drysdale, Susan Dudley, Gary Kochenberger, and Deborah Kellog believed in and enthusiastically supported development of a teaching approach and course materials that sought to make key ideas and methods of modern causal analytics accessible to a wider audience. We thank them.
The approach taken in those courses and in Chapter 2 of this book emphasizes the concepts and principles behind current causal analytics algorithms using a minimum of specialist jargon and mathematical notation, and then makes algorithms themselves readily available through software that can be used without learning the underlying R or Python languages and packages. Course materials are available at these links:

hwww.aiha.org/events/AIHce2017/Documents/PDC%20Handouts%202017/PDC%20604%20Handout.pdf

http://coxassociates.com/CausalAnalytics/
The Causal Analytics Toolkit (CAT) software and an explanation of its goals are available at these links:

http://coxassociates.com/CloudCAT

https://regulatorystudies.columbian.gwu.edu/causalanalyticstoolkitcatassessingpotentialcausalrelationsdata
Initial funding for development of CAT was provided by The George Washington University Regulatory Studies Center. Subsequent development of its Predictive Analytics Toolkit (PAT) module, discussed in Chapter 2, and a port from an Excel add-in version to a cloud-based version, were supported in part by the American Chemistry Council. We thank Susan Dudley of the GWU Regulatory Studies Center and Rick Becker of the American Chemistry Council for their support and vision in making free, high-quality analytics software available to interested users via CAT.
The applications, ideas and principles in Chapters 3–15 are based mainly on recent journal articles. Material from the following articles has been used with the kind permission of Wiley-Blackwell, the publishers of Risk Analysis: An International Journal.

Cox LA Jr, Popken DA. Quantitative assessment of human MRSA risks from swine. Risk Analysis. 2014 Sep;34(9):1639–50. (Chapter 6)

Cox LA Jr. Overcoming learning-aversion in evaluating and managing uncertain risks. Risk Analysis. 2015 Oct;35(10). (Chapter 12) (Thanks to Jim Hammitt and Lisa Robinson for a fascinating workshop at the Harvard Center for Risk Analysis that stimulated this work.)

Paté-Cornell E, Cox LA Jr. Improving risk management: from lame excuses to principled practice. Risk Analysis. 2014 Jul;34(7):1228–39. (Chapter 13)
Material from the following articles has been used with the kind permission of their publishers:

Cox LA, Popken DA, Kaplan AM, Plunkett LM, Becker RA. How well can in vitro data predict in vivo effects of chemicals? Rodent carcinogenicity as a case study. Regulatory Toxicology and Pharmacology. 2016 Jun;77:54–64.

Cox LA Jr. Socioeconomic and air pollution correlates of adult asthma, heart attack, and stroke risks in the United States, 2010–2013. Environmental Research. 2017 May;155:92–107. (Chapter 3)

Cox LA, Schnatter AR, Boogaard PJ, Banton M, Ketelslegers HB. Non-parametric estimation of low-concentration benzene metabolism. Chemico-Biological Interactions. Sep. 2017. (Chapter 4)

Cox LA Jr. 2015. Food microbial safety and animal antibiotics. Chapter 15 in Chen CY, Yan X, Jackson CR (Eds). Antimicrobial Resistance and Food Safety: Methods and Techniques. Elsevier, New York. (Chapter 5)

Popken DA, Cox LA Jr. Quantifying human health risks caused by Toxoplasmosis from open system production of swine. Human and Ecological Risk Assessment. 2015 Oct 3;21(7):1717–1735.

Cox LA Jr, Popken DA. Has reducing PM2.5 and ozone caused reduced mortality rates in the United States? Annals of Epidemiology. 2015 Mar;25(3):162–73. (Chapter 10)

Cox LA Jr. How accurately and consistently do laboratories measure workplace concentrations of respirable crystalline silica? Regulatory Toxicology and Pharmacology. 2016 Nov;81:268–274. (Chapter 11)

Cox LA Jr, Cox ED. 2016. Intergenerational Justice in Protective and Resilience Investments with Uncertain Future Preferences and Resources. Chapter 12 in P. Gardoni, C. Murphy, and A. Rowell (Eds). Risk Analysis of Natural Hazards: Interdisciplinary Challenges and Integrated Solutions. Springer, New York. (Chapter 15)
We thank the publishers and coauthors of these works.
Discussions with Ron Josephson of the United States Environmental Protection Agency (EPA) in the context of reviewing research proposals on health effects of air pollution helped to inspire the idea of applying causal analysis methods to determine value of information in causal networks (Chapter 2). We thank Dennis Devlin and Bruce Copley of ExxonMobil and Will Ollison of the American Petroleum Institute for stimulating conversations and comments and for their unswerving commitment to discovering objective scientific truth from data to inform more effective decisions and policies. As we have worked to develop and apply software to help automatically discover objective scientific truth about causality from data, we have found that this focus is not always popular and that industry-initiated thought leadership in pursuing more objective and reliable scientific inference is not always welcome. Advocates of expert judgment-based and modeling assumption-based approaches to causality have sometimes greeted with skepticism and even hostility the ideas that computer algorithms can now be far more accurate and objective than human experts in discovering true causal relations in data, and in identifying and rejecting false causal hypotheses; and that modeling judgments and expert interpretations of statistical patterns are neither necessary nor desirable for drawing valid causal inferences from data. We expect this analytics-centric perspective to continue to grow in popularity as causal discovery algorithms prove their value in a wide array of risk analysis applications. Meanwhile, we thank the visionaries who are pushing to make automated, objective, reproducible, algorithmic approaches to causal model discovery and validation a practical reality.
Finally, we thank Douglas Hubbard of Hubbard Decision Research for inviting lectures and discussions of the causal analytics framework in Chapter 2 at the American Statistical Association Symposium on Statistical Inference (Scientific Method for the 21st Century: A World Beyond p < 0.05) in October of 2017; and Seth Guikema of the University of Michigan for inviting the 2017 Wilbert Steffy Distinguished Lecture on Causal Analytics for Risk Management: Making Advanced Analytics More Useful at the University of Michigan Department of Industrial Engineering and Operations Research in November of 2017. The opportunity to prepare and present these lectures and to participate in the very stimulating discussions that followed contributed to the final exposition in Chapters 1 and 2.
Short Table of Contents
Part 1. Concepts and Methods of Causal Analytics

Causal Analytics and Risk Analytics

Causal Concepts, Principles, and Algorithms
Part 2. Descriptive Analytics in Public and Occupational Health

Descriptive Analytics for Public Health: Socioeconomic and Air Pollution Correlates of Adult Asthma, Heart Attack, and Stroke Risks

Descriptive Analytics for Occupational Health: Is Benzene Metabolism in Exposed Workers More Efficient at Very Low Concentrations?

How Large are Human Health Risks Caused by Antibiotics Used in Food Animals?

Quantitative Risk Assessment of Human Risks of Methicillin-Resistant Staphylococcus aureus (MRSA) Caused by Swine Operations
Part 3. Predictive and Causal Analytics

Attributive Causal Modeling: Quantifying Human Health Risks Caused by Toxoplasmosis From Open System Production of Swine

How Well Can High-Throughput Screening Test Results Predict Whether Chemicals Cause Cancer in Mice and Rats?

Mechanistic Causality: Biological Mechanisms of Dose-Response Thresholds for Inflammation-Mediated Diseases Caused by Asbestos Fibers and Mineral Particles
Part 4. Evaluation Analytics

Evaluation Analytics for Public Health: Has Reducing Air Pollution Reduced Mortality in the United States?

Evaluation Analytics for Occupational Health: How accurately and consistently do laboratories measure workplace concentrations of respirable crystalline silica?
Part 5. Risk Management: Insights from Prescriptive, Learning, and Collaborative Analytics

Improving individual, group and organizational decisions: Overcoming learning-aversion in evaluating and managing uncertain risks

Improving organizational risk management: From Lame Excuses to Principled Practice

Improving institutions of risk management: Uncertain causality and judicial review of regulations

Intergenerational justice in protective and resilience investments with uncertain future preferences and resources
Detailed Table of Contents
Part 1. Concepts and Methods of Causal Analytics

Causal Analytics and Risk Analytics

Why Bother? Benefits of Causal Analytics and Risk Analytics

Who Should Read this Book? What Will You Learn? What is Required?

What Topics Does this Book Cover?

Causality in Descriptive Analytics

Example: Did customer satisfaction improve?

Example: Simpson’s Paradox

Example: Visualizing air pollution-mortality associations in a California data set

Example: What just happened? Deep learning and causal descriptions

Causality in Predictive Analytics

Example: Predictive vs. Causal Inference – Seeing vs. Doing

Example: Nonidentifiability in Predictive Analytics

Example: Anomaly detection, predictive maintenance, and cause-specific failure probabilities

Causal Models Used in Prescriptive Analytics

Normal Form Decision Analysis

Example: Identifying the Best Act in a Decision Table

Example: Optimizing Research Intensity

Example: Optimal Stopping in a Risky Production Process

Example: Harvesting Timber

Markov Decision Processes

Improving MDPs: Semi-Markov Decision Processes and Discrete-Event Simulation (DES) Models

Performance of Prescriptive Models

Dynamic Optimization and Deterministic Optimal Control

Example: Optimal Harvesting

Stochastic Optimal Control, Hidden Markov Models, Partially Observable Markov Decision Models (POMDPs)

Statistical Decision Theory

Simulation-Optimization

Example: An influence diagram (ID) DAG model for optimizing emissions reduction

Example: Forecasting Policy Impacts – Invariance, Causal Laws, and the Lucas Critique in Macroeconomics

Causal Study Design and Analysis in Evaluation Analytics

Randomized Control Trials (RCTs)

Example: Invariant CPTs, Generalization, and Transportability of Causal Laws

Quasi-Experiments (QEs) and Intervention Time Series Analysis are Widely Used to Evaluate Impacts Caused by Interventions

Example: Did Banning Coal Burning in Dublin Reduce Mortality Rates?

Counterfactual and Potential Outcome Framework: Guessing What Might Have Been

Example: What Were the Effects of a Public Smoking Ban Policy Intervention on Heart Attack Risks?

Change Point Analysis (CPA) and Sequential Detection Algorithms

A Causal Modeling Perspective on Evaluating Impacts of Interventions using CPTs

Using Causal Models to Evaluate Total Effects vs. Direct Effects

Using MDPs and DES Causal Models to Evaluate Policies and Interventions

Causality in Learning Analytics

Causality in Collaboration Analytics

Causal Models in Game Theory: Normal and Extensive Forms, Characteristic Functions

Causal Models for Multi-Agent Systems

Conclusions

Causal Concepts, Principles, and Algorithms

Multiple meanings of “Cause”

Probabilistic Causation and Bayesian Networks

Technical Background: Probability Concepts, Notation and Bayes’ Rule

Example: Joint, Marginal, and Conditional Probabilities for Answering Queries

Example: Sampling-Based Estimation of Probabilities from a Database Using R

Bayesian Network (BN) Formalism and Terminology

Using BN Software for Probability Predictions and Inferences

Example: A Two-Node BN for Disease Diagnosis using Netica

Example: Bayesian inference in a small BN – The family out problem

Practical Applications of Bayesian Networks

Non-Causal Probabilities: Confounding and Selection Biases

Example: Confounding – Effects of Common Causes

Example: Collider Stratification and Selection Bias – Conditioning on a Common Effect

Causal Probabilities and Causal Bayesian Networks

Example: Modeling Discrimination in College Admissions – Direct, Indirect, and Total Effects of Gender on Admissions Probabilities

Example: Calculating Effects of Interventions via the Ideal Gas Law

Dynamic Bayesian Networks

Other Causal Models Equivalent to BNs

Fault Tree Analysis

Example: A Dynamic Fault Tree Calculator

Event Tree Analysis

Bow Tie Diagrams for Risk Management of Complex Systems

Markov Chains and Hidden Markov Models

Probabilistic Boolean Networks

Time Series Forecasting and Predictive Causation

Structural Equation Models (SEMs), Structural Causation, and Path Analysis Models

Influence Diagrams

Example: Decision-Making with an Influence Diagram – Taking an Umbrella

Decision Trees

Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs)

Predictive Causality and Predictive Analytics

Classification and Regression Tree Models

Example: Seeking Possible Direct Causes Using Classification and Regression Tree (CART) Algorithms

Example: Conditional Independence Tests Constrain DAG Structures

The Random Forest Algorithm: Importance Plots and Partial Dependence Plots

Causal Concentration-Response Curves, Adjustment Sets, and Partial Dependence Plots for Total and Direct Effects in Causal Graphs

Including Knowledge-Based Constraints and Multiple Adjustment Sets

Power Calculations for Causal Graphs

Predictive Analytics for Binary Outcomes: Classification and Pattern Recognition

Causal Discovery: Learning Causal BNs from Data

Comparison of Causal Discovery to Associational Causal Concept: Updating the Bradford Hill Considerations

Strength of Association

Example: Confirmation Bias and the Wason Selection Task

Consistency of Association

Plausibility, Coherence, and Analogy of Association

Specificity, Temporality, and Biological Gradient

Methods and Examples of Associational Causation: Regression and Relative Risks

Example of Associative vs. Manipulative Causation in Practice: The CARET Trial

Example of Association Created by Regression Model Specification Error

Relative Risk and Probability of Causation in the Competing Risks Framework

Conclusions on Associational Causation

Comparison of Causal Discovery to Attributive Causal Methods

Example: Attributive Causation is Not Manipulative Causation

Example: 9 Million Deaths per Year Worldwide Attributed to Pollution

Comparison of Causal Discovery to Counterfactual Causal Methods

Example: Attribution of Rainfall to Climate Change, and the Indeterminacy of Counterfactuals

Comparison of Causal Discovery to Structural and Mechanistic Causal Modeling

Example: Dynamic Causal Analysis of the Level of a Single Variable in a Compartment

Example: A CPT for a One-Compartment Model with Uncertain Inputs

Example: Causal Reasoning about Equilibria and Comparative Statics

Historical Milestones in Development of Computationally Useful Causal Concepts

Conclusions