Causal Analytics for Applied Risk Analysis Louis Anthony Cox, Jr




Fig. 1.6 Output from the Analytica® influence diagram in Figure 1.5 showing the expected value of Total Cost as a function of the decision variable Emissions Reduction Factor

Figure 1.6 illustrates the output from this process for the ID in Figure 1.5. Values of the decision input variable, Emissions Reduction Factor (a longer name for Emissions Reduction in the diagram in Figure 1.5), are shown on the horizontal axis, and corresponding expected values of the Total Cost output variable are shown on the vertical axis. This plot was generated by successively incrementing the values of Emissions Reduction Factor, simulating a distribution of Total Cost values for each value of Emissions Reduction Factor, and then plotting the mean of the simulated distribution of Total Cost values on the vertical axis for each value of Emissions Reduction Factor on the horizontal axis. The Analytica® software incorporates sophisticated simulation techniques (such as Latin Hypercube sampling) to increase computational efficiency. It generates the output curve in Figure 1.6 in a fraction of a second even though each point on the curve represents the average value of Total Cost over hundreds of simulation runs. It can equally quickly display upper and lower uncertainty bands around these mean values if desired, such as the high and low values between which 95% of the simulated values of Total Cost fall for each value of Emissions Reduction Factor.

The output in Figure 1.6 is easy to interpret prescriptively: the expected value of Total Cost is minimized by setting Emissions Reduction Factor to a value of 0.7. As often happens when the objective function involves trading off two or more desired attributes, such as low control cost and low excess deaths in this example, the optimal value of the objective function is not very sensitive to the exact choice of value for the decision variable: the expected Total Cost curve is flat in the vicinity of the optimal decision variable value, Emissions Reduction Factor = 0.7. The prescriptive analysis shows that if the ID model in Figure 1.5 is correct, then the best decision is to set Emissions Reduction Factor = 0.7.

The directed acyclic graph (DAG) concept illustrated in Figure 1.5 is of central importance in causal modeling, and it is used extensively in later chapters. Its importance stems from the fact that a DAG shows statistical dependencies and independence relationships between variables. Causation is one way for such statistical dependencies to be created; conversely, effects are not expected to be statistically independent of their direct causes. In a DAG model, each variable is conditionally independent of its more remote ancestors, given the values of its direct parents. For example, in Figure 1.5, although Excess Deaths depends indirectly on Concentration via the effect of Concentration on Health Damage, the DAG structure implies that the value of Excess Deaths is conditionally independent of the value of Concentration given the value of Health Damage. Thus, if we had a data set with three columns of numbers for values of the three variables Concentration, Health Damage, and Excess Deaths for a large number of cases (e.g., days of observation), then even though each column might be correlated with the other two, the correlation between Concentration and Excess Deaths would be zero when only cases with a given value of Health Damage are considered. More generally, each node in a DAG is conditionally independent of the values of its more remote ancestors, given the values of its parents. Such conditional independence relationships are implications of a DAG model structure that can be tested statistically using data on the levels of the different variables. Statistical tests for conditional independence provide one basis for automatically discovering the DAG structure of variables from data, as discussed in Chapter 2.
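To make this conditional independence property concrete, the following short R sketch simulates data consistent with the chain Concentration → Health Damage → Excess Deaths and checks that the correlation between Concentration and Excess Deaths essentially vanishes once Health Damage is controlled for. The variable names and functional forms are illustrative assumptions, not the actual formulas behind Figure 1.5.

set.seed(1)
n <- 10000
concentration <- runif(n, 0, 10)                          # simulated exposure concentration
health_damage <- 0.5*concentration + rnorm(n, sd = 0.5)   # assumed damage mechanism
excess_deaths <- 2*health_damage + rnorm(n, sd = 0.5)     # assumed mortality mechanism
cor(concentration, excess_deaths)                         # strong marginal correlation
# Partial correlation given Health Damage, estimated from residuals of regressions on it:
r1 <- resid(lm(concentration ~ health_damage))
r2 <- resid(lm(excess_deaths ~ health_damage))
cor(r1, r2)                                               # approximately zero, as the DAG implies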



In practice, decision-makers must confront the possibility that the model used to produce recommendations is not trustworthy. For example, the DAG model in Figure 1.5 makes some assumptions about how excess deaths depend on exposure concentration, via a “Health Damage” function involving a threshold and a health damage factor for exposure concentrations above this threshold. A practitioner could double-click on the “Health Damage” node to view this assumed function but might be uncertain about its validity, and hence uncertain about the validity of the model’s conclusions and recommendations from Figure 1.6. What is often wanted is a reliable way to learn causal models directly from data; to characterize uncertainties about the validity of their predictions; and then to use the models to draw inferences and make recommendations for what actions to take to make preferred outcomes more probable. Figure 1.7 sketches this idealized process. Causal analytics algorithms provide the mapping from data to models of the data-generating process, such as the ID in Figure 1.5. Monte Carlo simulation then generates probabilistic predictions and expected utilities and uncertainty bands from these models, given any set of values for the controllable inputs, i.e., the decision variables. Specialized simulation-optimization algorithms can vary the values of the decision variables to seek choices that maximize expected utility or minimize expected total cost or loss as predicted by the simulation model. This simulation-optimization step is not shown in Figure 1.7, since the solution to the optimization problem in Figure 1.6 is visually obvious. Interested readers can find decision optimization algorithms specifically for IDs in Shachter and Bhattacharjya (2010). More general discussions of simulation-optimization methods are available in many on-line surveys and tutorial papers; in the Handbook of Simulation Optimization (Fu, 2015); at commercial software sites such as www.solver.com/simulation-optimization; and in the documentation of specialized free software packages such as http://crantastic.org/packages/scaRabee.
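The simulate-then-optimize loop just described can be sketched in a few lines of R. The cost formulas and parameter values below are invented for illustration; they are not the formulas in the Analytica® model of Figure 1.5, so the optimal value returned is not the 0.7 found in Figure 1.6.

set.seed(1)
expected_total_cost <- function(reduction, n_sims = 1000) {
  control_cost <- 100*reduction^2                     # assumed control-cost curve
  health_factor <- runif(n_sims, 0.5, 1.5)            # uncertain health damage factor
  health_cost <- 80*(1 - reduction)*health_factor     # assumed health-cost model
  mean(control_cost + health_cost)                    # Monte Carlo estimate of expected Total Cost
}
grid <- seq(0, 1, by = 0.05)                          # candidate Emissions Reduction Factor values
costs <- sapply(grid, expected_total_cost)
grid[which.min(costs)]                                # decision value minimizing expected Total Cost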
Fig. 1.7 The role of causal analytics. Causal analytics provides algorithms to develop causal models from data and to use them to quantify effects of risk factors and interventions


Example: Forecasting Policy Impacts – Invariance, Causal Laws, and the Lucas Critique in Macroeconomics
Not knowing the causal relationships among variables – especially, which variables affect which others, as revealed by the arrows in a DAG model – undermines the ability to use statistical associations between variables to predict how changing one variable would change another. Simpson’s Paradox and similar examples show that even a highly statistically significant association between an outcome variable Y and another variable X need not reveal how, if at all, changing X would change Y. A similar point has been made in macroeconomics, warning that empirically observed relationships that are not causal (or “structural,” in econometrics jargon) cannot be used to predict correctly the effects of policy interventions if the interventions change the conditions that generated the empirical relationships. For example, if it is observed that higher inflation rates are associated with lower unemployment rates, and even if this empirical relationship between them is found to be stable and reliable for making predictions of unemployment from observations of inflation, this still would not imply that intervening to increase inflation (e.g., by printing more money) would reduce unemployment. The vexed history of the original Phillips curve model in macroeconomics illustrates this point. A more general warning against using empirically derived macroeconomic models to predict the effects of policy changes, known as the Lucas critique after Nobel Laureate Robert Lucas, helped to convince many economists that only macroeconomic models with strong microeconomic foundations describing causal relationships that remain invariant as policies change could provide a sound basis for predicting consequences of changes in macroeconomic policy (Hoover, 2014).

In contrast to empirical models, a “structural” model expresses equations or constraints that stay the same (or remain “invariant,” in more technical terminology) as decision variables are changed. For example, in chemistry, the ideal gas law PV = nRT (where P = pressure, V = volume, T = temperature, n is the number of moles of gas, and R is a constant) expresses a constraint that always holds among these variables in equilibrium. In a system configured so that T is the decision variable and V and n are fixed, this structural law implies that exogenously doubling the temperature by heating the gas would cause its pressure to double. Such laws are useful because they continue to hold even if the experimental conditions under which they were discovered are changed. This makes them useful for descriptive, predictive, and prescriptive analyses. Philosophers of science and causality have noted that invariance of causal laws in the face of policy changes can be viewed as a defining characteristic of causality (Cartwright, 2003). Such causal laws are often represented mathematically by structural equations that link the values of variables in such a way that a change in a variable on the right side of the equation (such as T in P = nRT/V) is understood to cause the dependent variable on the left side (here, P) to change until the equation is once again satisfied. Each such equation can be thought of as representing a causal mechanism determining the value of the dependent variable on its left side from the values of the variables on its right side (Simon and Iwasaki, 1988). In a DAG model representing this structure, arrows would point from the variables on the right into the variable on the left. If the relationship between the parent variables on the right and the child variable on the left is probabilistic instead of deterministic, then a random variable can be included on the right-hand side.
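As a minimal illustration of a structural equation responding to an intervention, the following R lines treat T as the decision variable in P = nRT/V with n and V held fixed; the numerical values are arbitrary.

R_const <- 8.314                            # gas constant, J/(mol*K)
n_mol <- 1                                  # moles of gas (held fixed)
V <- 0.025                                  # volume in cubic meters (held fixed)
pressure <- function(T) n_mol*R_const*T/V   # structural mechanism determining P
pressure(600)/pressure(300)                 # doubling T doubles P, as the invariant law implies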



The equivalent of such a structural model in an ID is a conditional probability table (CPT) that is invariant across settings and policies. Such a CPT expresses a probabilistic causal law: if the parents of a node have specified values, then the value of the node has the probability distribution specified by the CPT, no matter what the surrounding context of values of other variables (including decision variables) may be. Seeking such invariant CPTs or structural equations in data provides a different basis for causal discovery algorithms than the conditional independence tests mentioned earlier (Peters et al., 2016; Heinze-Deml et al., 2017). Causal discovery algorithms based on invariance principles are discussed further in Chapter 2.
Causal Study Design and Analysis in Evaluation Analytics
Once a decision has been made and implemented, the next important task for analytics is to collect and analyze data to determine how well the decision has worked – or, more generally, what its effects, intended or otherwise, have been. We call this evaluation analytics. It clearly involves causal analysis, insofar as it involves attributing effects to decisions or policies that caused them. It also usually requires modeling how impacts occur over time, since, in the real world, effects of decisions, interventions, and policy changes take time to reveal themselves. By contrast, time is conspicuously missing from many predictive and prescriptive models, including the influence diagram model and results in Figures 1.5 and 1.6. These relate emissions reductions to predicted resulting changes in total costs without specifying how long those changes will take to occur. As discussed further in Chapter 2, dynamic simulation models can be constructed, using Analytica® or other modeling software packages, provided that sufficient knowledge of the system’s dynamics is available so that the required simulation formulas can be specified. The time courses of responses to changes in the inputs to a system can then be simulated. However, such detailed knowledge of dynamic adjustment processes is often lacking, and it is common practice to use models that relate decisions to their projected consequences when equilibrium is achieved without specifying how long this will take. Figure 1.5 is an example. In general, using data to evaluate the consequences caused by a decision requires considering whether enough time has passed for effects to reveal themselves and whether the data collection effort has been designed in such a way that it can support valid causal inferences.
Randomized Control Trials (RCTs)
The gold standard for evaluating the effects caused by policy interventions in both medicine and the social sciences is often considered to be the randomized control trial (RCT). An RCT evaluates the effects of some “treatment” or intervention on responses in populations of experimental “units.” The units could be individual patients when testing the effects of a new drug on high blood pressure; homes or families when testing the effects of bed netting or chlorine tabs for home drinking water on reducing child mortality in a developing country; villages when testing the effects of a microfinance program on subsequent economic indicators; or companies when testing the effects of a new incentive or behavioral “nudge” program on enrollment in employee retirement or health plans. The key aspect of an RCT is that units are randomly assigned to the treatment or control groups, or, more generally, to different treatment groups. Systematic, statistically significant differences observed in responses across groups of units receiving different treatments (or none) can then be confidently attributed to the differences in treatments, since randomization removes any other systematic differences among the recipients of different treatments.

Limitations of RCTs include practical and ethical constraints on the possibility of making random assignments and difficulties in generalizing beyond the specific populations or groups studied to other populations or groups to which treatments might be applied. For example, suppose that a well-conducted RCT establishes conclusively that, at a certain hospital, patients randomly selected to receive a new treatment have a significantly higher success rate than similar control group patients not selected to receive the treatment. Even such a strong finding does not guarantee that a similar benefit from the new treatment would hold at other hospitals. This is because other factors that influence the success rate might differ between hospitals, or between the populations that they serve. The challenge of generalizing findings beyond the specific population studied goes by several names, including “transportability” of causal relationships across settings, “external validity” of the findings from a study, and “generalizability” of results. It is important to meet this challenge so that, instead of being limited to drawing narrow conclusions such as “Treatment A worked better than B for the patients in the specific hospital studied during the particular time interval of the study,” one can draw more useful and general conclusions such as “Treatment A works better than treatment B” – or, if qualification is needed, “Treatment A works better than treatment B for people of type Z.” Indeed, a field of pragmatic RCTs has been developing to meet this need, since many results from traditional RCTs have turned out not to generalize well beyond the specific populations and circumstances studied (Patsopoulos, 2011).



How to generalize correctly from particular study results, including the results of particular RCTs, pragmatic RCTs, or field trials, to arrive at valid general causal laws and conclusions has long troubled philosophers of science. It is a version of the notorious problem of induction (Cartwright, 2003). One constructive partial answer makes use of the invariance property of causal laws previously discussed in the context of the Lucas critique of the use of empirical macroeconomic models to predict effects of policy interventions (Hoover, 2014). Suppose that Y is a response or outcome variable of interest; X is a decision variable indicating a policy, intervention, or decision that affects the probability distribution of values for Y; and Z is a vector of covariates that also affect the response. A causal graph (DAG model) succinctly indicates these dependencies as X → Y ← Z, where the conditional probability table (CPT) for Y specifies the conditional probability of each of its possible values as a function of the values of its parents. In notation, this CPT may be denoted by P(y | x, z) to signify the conditional probability that Y takes value y, given that X = x and that Z = z. If the CPT represents a universal causal law, with all the causal parents of Y included in X and Z, then P(y | x, z) must be the same in all settings. With sufficient data, this homogeneity or invariance implication can be tested statistically in several different ways, e.g., by using statistical tests for homogeneity, latent variables, or mixture distributions. Discovering that the same CPT (or, more generally, the same model) holds in a wide diversity of particular data sets provides a possible basis for extrapolating it to new situations, and thus offers a solution to the problem of inductive inference: the invariant law or CPT becomes the generalization that is learned from particular instances. In short, one basis for modern causal discovery algorithms is to use data from a wide range of particular experiments or studies, possibly involving a wide range of different interventions and conditions, to seek causal laws, often expressed as structural equations or CPTs, that are invariant across the diverse studies (Peters et al., 2016). These invariant laws provide the sought-for generalization of the particular evidence from which they are derived. They can be used to derive straightforward adjustments, or transport formulas, for generalizing (or “transporting”) causal inferences from one set of conditions and interventions to another (Bareinboim and Pearl, 2013; Lee and Honavar, 2013). This is simpler than it may sound: just as a validated simulation model for a system can be applied to new input scenarios to discover (via simulation) what outputs they would be likely to produce under changed conditions, so networks of CPTs, i.e., causal models, can be applied to new input conditions to calculate corresponding output probabilities. Software for deriving and applying transport formulas is starting to become available, e.g., via the causaleffect R package at https://cran.r-project.org/web/packages/causaleffect/causaleffect.pdf.
Example: Invariant CPTs, Generalization, and Transportability of Causal Laws
The following example illustrates the arithmetic of how to generalize results from learning or experimental settings in which a probabilistic causal law has been discovered to a different target setting in which it is to be applied. Suppose that large RCTs have been conducted at three hospitals, A, B, and C, to test the success rate for a new treatment compared to an old one. The treatment is for a disease that is never cured without treatment, and the old treatment has a 50% success rate in curing people; this rate holds in all hospitals and for all types of patients. By contrast, less is known about the new treatment, but in RCTs at hospitals A, B, and C, it has been found to have cure rates of 0.82, 0.55, and 0.37, respectively. (For simplicity, we assume that the RCTs are so large that these rates can be treated as accurate, without having to worry about sampling error.) An initial causal DAG model summarizing this very incomplete knowledge of the new treatment’s effectiveness is as follows:

Treatment → Cure ← Hospital

The outcome variable Cure is coded for each individual so that 1 = successful cure, 0 = not successful cure. Based on the preceding description, the CPT for Cure is as follows:



P(Cure = 1 | Treatment = Old, Hospital = A) = 0.5

P(Cure = 1 | Treatment = New, Hospital = A) = 0.82

P(Cure = 1 | Treatment = Old, Hospital = B) = 0.5

P(Cure = 1 | Treatment = New, Hospital = B) = 0.55

P(Cure = 1 | Treatment = Old, Hospital = C) = 0.5

P(Cure = 1 | Treatment = New, Hospital = C) = 0.37
These equations use the usual notation of conditional probability. The first one, for example, states that the conditional probability that a randomly selected individual from hospital A who received the old treatment will be cured (have Cure = 1) is 0.5. (Since this probability does not depend on the hospital in which the old treatment is administered, we could combine the first, third, and fifth of the above equations into P(Cure = 1 | Treatment = Old) = 0.5, which would be a more efficient way to summarize the same information.) The corresponding conditional probabilities for Cure = 0 are found by subtracting the conditional probabilities that Cure = 1 from 1. Together, these 12 conditional probabilities comprise the CPT for Cure for the three hospitals studied so far. However, this does not help to predict what will happen if the new treatment is tried in a new hospital, D. The new hospital would represent a new, as-yet unobserved value for the Hospital variable. How to extend the current CPT to handle new values of its variables cannot be determined from the data already collected. This is the challenge of generalizing from particular study results to generally applicable findings.

What the data on hospitals A-C do show is that something is missing from the DAG model. That success probabilities for the new treatment differ significantly across hospitals A-C invites the question of why they are different – what factors differ across the populations in different hospitals that can explain the difference in success rates? As long as a difference remains, this question can always be asked. Only when enough causal parents have been included so that the conditional probabilities for treatment success, given the values of the causal parents, are the same regardless of location does the question no longer arise. This is why invariance is so useful for causal discovery: an adequate causal model must contain the information needed to explain systematic differences in outcomes in terms of invariant conditional probabilities. If it cannot do so, then the remaining unexplained heterogeneity in CPTs limits the ability to predict outcome probabilities from the factors that are included in the model.
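One simple way to check invariance is a homogeneity test of cure rates across hospitals within a treatment arm. The short R sketch below applies a chi-square test to the new-treatment results for hospitals A-C, assuming (purely for illustration) 100 patients in the new-treatment arm at each hospital.

new_arm <- matrix(c(82, 18,      # hospital A: cured, not cured (assumed counts)
                    55, 45,      # hospital B
                    37, 63),     # hospital C
                  ncol = 2, byrow = TRUE)
chisq.test(new_arm)   # a small p-value indicates the cure rates are not invariant across hospitals,
                      # so some causal parent of Cure is missing from the model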

To help find an invariant causal law, suppose that data are collected on the individuals who participated in the RCTs in hospitals A-C, such as their ages, sexes, ethnicities, medical histories, and so forth. For purposes of a simple illustration, let’s assume that only one of these variables turns out to be useful for predicting the value of Cure: the sex of the patient. Chapter 2 discusses predictive analytics algorithms, including Classification and Regression Trees (CART) and Random Forest algorithms, that are widely applied in machine learning and causal discovery algorithms to determine which variables are informative about, and hence help to predict, the values of an outcome variable of interest.

Specifically, for this example, suppose that the percentages of male patients in the RCTs were 80% at hospital A, 50% at hospital B, and 30% at hospital C. If the sex of the patient is indeed the only variable on which the success of the new treatment depends, then the overall success rate for the new treatment at a hospital can be described by the following structural equation model (SEM):


E(Cure) = a*male_fraction + b*female_fraction
where a = probability that Cure = 1 for men and b = probability that Cure = 1 for women. The data from hospitals A and B give the following two equations:

0.82 = a*0.80 + b*0.20

0.55 = a*0.50 + b*0.50.
These can be solved either manually or using an on-line solver such as the one at http://wims.unice.fr/wims/en_tool~linear~linsolver.en.html to deduce that a = 1 and b = 0.1. We have thus used the data from hospitals A and B to estimate the following SEM:
E(Cure) = male_fraction + 0.1*female_fraction = male_fraction + 0.1*(1 – male_fraction), or

E(Cure) = 0.1 + 0.9*male_fraction.
If this SEM is correct, then it can be applied to any hospital, since a causal law holds universally once the dependence of outcomes on their causal parents has been correctly specified. For example, to validate this model, it can be used to predict the value of Cure for hospital C:

E(Cure) = 0.1 + 0.9*0.30 = 0.37.
This agrees with, and explains, the data value from the RCT for hospital C. Such agreement does not prove with logical certainty that the SEM is correct, but it adds credibility insofar as it is unlikely that the predicted value of 0.37 would agree by chance with the observed value.
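The same arithmetic can be reproduced in R by solving the two equations from hospitals A and B as a linear system and then checking the prediction for hospital C:

A <- matrix(c(0.80, 0.20,
              0.50, 0.50), nrow = 2, byrow = TRUE)   # male and female fractions at hospitals A and B
y <- c(0.82, 0.55)                                   # observed cure rates for the new treatment
coefs <- solve(A, y)                                 # a = 1 (men), b = 0.1 (women)
coefs[1]*0.30 + coefs[2]*0.70                        # predicted cure rate at hospital C: 0.37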

The invariant law E(Cure) = 0.1 + 0.9*male_fraction that we have now discovered not only describes and explains the different RCT results for hospitals A-C, but can also be used to predict the success rate of the new treatment in any similar future RCTs that might be carried out at other hospitals. The refined DAG model for Cure in any hospital is as follows:


Treatment → Cure ← male_fraction ← Hospital,
where the CPT for Cure is
P(Cure = 1 | Treatment = old) = 0.5

P(Cure = 1 | Treatment = new) = 0.1 + 0.9*male_fraction.
This is the appropriate generalization of the particular findings from these RCTs. More importantly, the refined understanding of the causal parents of Cure leads to the following DAG model for individual patients:
Treatment → Cure ← Sex
with CPT

P(Cure = 1 | Treatment = old, Sex = male) = 0.5

P(Cure = 1 | Treatment = old, Sex = female) = 0.5

P(Cure = 1 | Treatment = new, Sex = male) = 1

P(Cure = 1 | Treatment = new, Sex = female) = 0.1.
This CPT makes it clear that the best treatment strategy at the level of individual patients, rather than hospital populations of patients, is to prescribe the old treatment to women and the new treatment to men.
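Under the individual-level CPT above, expected cure rates at a hospital with a given male fraction can be compared across simple policies. The short R sketch below does this for a hospital like C (30% male); the policy names are ours, not the book's.

expected_cure <- function(male_fraction, policy) {
  p_male   <- switch(policy, old = 0.5, new = 1.0, targeted = 1.0)  # men get the new treatment if targeted
  p_female <- switch(policy, old = 0.5, new = 0.1, targeted = 0.5)  # women get the old treatment if targeted
  male_fraction*p_male + (1 - male_fraction)*p_female
}
sapply(c("old", "new", "targeted"), function(p) expected_cure(0.30, p))
# old = 0.50, new = 0.37, targeted = 0.65: giving the new treatment to men and
# the old treatment to women maximizes the expected cure rate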
This example was simplified for ease of exposition, but the following points hold more generally. First, significant differences in conditional probabilities of different outcomes across different RCT locations, studies, or settings, even after conditioning on the values of known factors (e.g., which treatment is administered), indicate that a causal model is incomplete: other causal factors remain to be discovered that affect the outcome. Conversely, a causal law can typically be represented by conditional probabilities of outcome values that are always the same, given the values of the parents of the outcome in a causal model (e.g., a DAG or a set of structural equations). Such an invariant conditional probability table or function provides a generalization of particular instances. For purposes of statistical analysis, it is useful to recognize that causal relationships among variables typically constrain observed data points to lie in a small subset of the volume of points that they could occupy if their values were independent of each other. For example, the data points may lie near a line, curve, or surface (in mathematical terms, a low-dimensional manifold within the higher-dimensional space of the data points) determined by the structural equations or CPTs expressing dependencies among the values of different variables. In the example, this invariant manifold – the same for all data points – was the line E(Cure) = 0.1 + 0.9*male_fraction. Measurement error, which was ignored in this example, usually makes it necessary to use techniques such as regression to estimate this underlying relationship instead of using the data points to solve for it exactly. Such estimation can be carried out using specialized software programs such as the Invariant Causal Prediction (ICP) package in R (Peters et al., 2016) and its extensions that allow for nonlinear and nonparametric dependencies among variables (Heinze-Deml et al., 2017). However, even with such sophisticated algorithms, a causal DAG model or set of structural equations can still usually be estimated from a subset of the available data points and then validated on one or more different, disjoint subsets. It is standard terminology in much of the machine learning literature to call the data points used to learn or estimate a model the training set and the data points used to validate it the test set. (In the example, with no measurement error, RCT data from any two of the three hospitals A, B, and C could be used to solve for a = 1, b = 0.1, and hence for the underlying linear structural equation model E(Cure) = 0.1 + 0.9*male_fraction. This model could then be validated by checking that it also described data from the RCT at the third hospital.) The basis of the validation is that it is unlikely that the points in a test set will lie on or near the manifold estimated from the training set by chance alone. Chapter 2 discusses machine learning algorithms for learning and validating causal models from data.
Quasi-Experiments (QEs) and Intervention Time Series Analysis Are Widely Used to Evaluate Impacts Caused by Interventions
In practice, it is often unethical or impractical to carry out RCTs, including pragmatic RCTs. Other ways are then needed to use data to evaluate the effects caused by interventions. Since the 1960s, one popular way to evaluate the impacts of social programs and policies has been to use quasi-experiments (QEs) (Campbell and Stanley, 1963; White and Sabarwal, 2014). These are studies without randomized assignments of units to treatments that compare outcomes in the treated and untreated groups. For example, one common QE design compares (a) changes in outcome measures from before to after the implementation of an intervention in a group or population that the intervention affects to (b) corresponding changes in a comparison group or population that it does not affect. More recent “difference-in-differences” methods in epidemiology are based on the same idea.

Alternatively, the affected population can be used as its own control. For example, interrupted time series analysis, also known as intervention time series analysis (ITSA) or simply intervention analysis, tests whether the best-fitting time series model describing the data prior to an intervention differs from the best-fitting time series model after the intervention. If so, it attributes the difference to the intervention (Gilmour et al., 2006). Similarly, a recent program from Google called CausalImpact uses data from control time series not affected by an intervention to try to forecast how a variable that the intervention does affect would have evolved in the absence of the intervention; differences between observed and forecast values are attributed to the intervention (https://google.github.io/CausalImpact/CausalImpact.html). For example, to estimate the causal impact of an advertising campaign on daily clicks at a web site, the number of clicks expected without the advertising campaign might be forecast from data on other web sites in markets not affected by the campaign. Then the difference in clicks per day between the observed values and these forecast values can be attributed to the advertising campaign, assuming that no other cause can be identified.
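A minimal interrupted time series analysis can be sketched as a segmented regression in R. The simulated data and the simple level-shift model below are illustrative only; tools such as ITSA implementations or Google's CausalImpact fit much richer time series models.

set.seed(1)
month <- 1:72
post <- as.numeric(month > 60)                             # intervention begins after month 60
clicks <- 100 + 0.5*month + 15*post + rnorm(72, sd = 5)    # simulated outcome series with a true level shift
fit <- lm(clicks ~ month + post)                           # pre-existing trend plus post-intervention shift
summary(fit)$coefficients["post", ]                        # estimated shift attributed to the intervention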

However, the design and analysis of QE comparisons and the interpretation of causal attributions and effect estimates based on the data they produce require considerable care. Because QEs do not randomly assign individuals to treatment (intervention) and comparison groups, there is no rigorous way to make sure that the effects they estimate are actually caused by the intervention instead of merely coincident with it. For example, the average differences in responses between the treatment and control groups, defined as those affected by the intervention or policy being evaluated and those not affected by it, might be explained by unmeasured differences between these groups in the distributions of covariates that also affect the response. Other possible threats to internal validity, in the terminology of Campbell and Stanley (1963), meaning possible non-causal explanations for observed differences between treatment and control groups in a QE study, include the following:


  • History: Other events may coincidentally affect responses and cause changes and differences between treatment and control groups following an intervention even if the intervention itself had no effect, or a lesser effect than the observations suggest.

  • Maturation: Treated individuals get older between the time before an intervention and the time after it. In studies that compare pre-intervention and post-intervention results, using individuals as their own controls, maturation rather than treatment may explain differences over time.

  • Regression to the mean: Suppose that interventions are assigned to individuals, locations, or groups that appear to be most in need of them due to extreme values of some variables taken as indicators of need. Measurements taken later may show less extreme values simply because extreme values are less likely than less extreme values. Students selected for a tutoring program because of extremely poor performance on a standardized test, for example, might be expected to do better next time even if the program had no effect.

  • Awareness of being studied, increased familiarity with survey instruments, investigator biases, biases in missing data or in attrition from the sample, and other non-treatment sources of differences or changes in responses between treated and control groups.

Unless they are carefully tested and refuted using data, such potential alternative explanations threaten the validity of causal interpretations of observed differences between treatment and control groups within a QE study. They are called threats to internal validity because they address causal inferences for the studied populations, i.e., within the scope of the study. As previously discussed, there are also threats to external validity, i.e., to the ability to generalize any causal conclusions beyond the specific populations and circumstances of the QE study. For example, the studied treatment and comparison groups may not be representative of target populations of interest, or study conditions that affected the observed outcomes might not hold elsewhere. As discussed in Chapter 2, causal analytics methods address these challenges by replacing the key assumption that observed differences in outcomes between treatment and control groups are caused by the treatment or intervention being evaluated with models that describe more explicitly how changes in some variables propagate through specific causal mechanisms (represented by conditional probability tables, structural equations, or other validated causal models) to affect the probability distributions of other variables.



Ironically, although Campbell and Stanley (1963) were concerned largely with warning against the dangers of using QEs for causal inference, QEs are now increasingly widely used for that purpose. In general, doing so requires strong assumptions, such as that all relevant causes have been measured, that there are no unmeasured confounders, or that observed associations are causal. If these modeling assumptions are violated, then causal inferences drawn from the QE may be mistaken.
Example: Did Banning Coal Burning in Dublin Reduce Mortality Rates?
A famous study in Dublin County, Ireland of the effects on public health of a ban on coal burning reported that both all-cause mortality rates and cardiovascular mortality rates specifically declined substantially from the six years before the ban to the six years following it (Clancy et al., 2002). This study contributed to policy decisions to extend coal-burning bans in Ireland based on a belief that cleaner air had been found to cause reduced mortality. It was estimated that the ban saved thousands of lives per year, based on an assumption that changes in health risks following the intervention were caused by it. This causal assumption was tested a decade later by some of the original authors in an updated study that compared mortality rates in areas of Ireland affected and not affected by the bans (Dockery et al., 2013). The updated study concluded that the bans had produced no detectable reductions in either total or cardiovascular mortality rates (Dockery et al., 2013). As explained by Zigler and Dominici (2014), “However, even when studying an abrupt action, threats to causal validity can arise, as illustrated in extended analyses of the Dublin coal ban that revealed that long-term trends in cardiovascular health spanning implementation of the ban – not the coal ban itself – contributed to apparent effects on cardiovascular mortality.” Social statisticians since the 1960s have cited the “one-group pretest-posttest design” used in the original 2002 study as inappropriate for causal inferences, since it leaves uncontrolled the threat of coincident historical change, as well as other threats to valid causal inference (Campbell and Stanley, 1963, p. 7). By contrast, a pretest-posttest control group design such as that in the 2013 follow-up study can show that a large reduction in particulate pollution had no detectable effect on total mortality, as in Dublin, if that is the case; or it can provide strong evidence that high pollution levels cause excess mortality if the mortality rate spikes where and when air pollution spikes – such as in London in 1952 – but not otherwise. Interestingly, although it has been known since at least 2013 that the bans had no apparent effects on total mortality rates, they are still cited by Irish regulators and policy makers as having created very substantial reductions in total mortality. As of this writing in late 2017, the Department of Communications, Climate Action and Environment web page (https://www.dccae.gov.ie/en-ie/environment/topics/air-quality/smoky-coal-ban/Pages/default.aspx) still states that “The smoky coal ban allowed significant falls in respiratory problems and premature deaths from the effects of burning smoky coal. The original ban in Dublin is cited widely as a successful policy intervention and has become something of an icon of best practice within the international clean air community. It is estimated that in the region of 8,000 lives have been saved in Dublin since the introduction of the smoky coal ban back in 1990. Further health, environmental and economic benefits (estimated at €53m per year) will be realised, if the ban is extended nationwide. We intend to extend the health and environmental benefits of the ban on smoky coal, currently in place in our cities and large towns, to the entire country.” Similar disinformation continues to be spread in media accounts of the economic and life-saving benefits of the bans.
Thus the fact that the bans had no detectable benefits in reducing all-cause mortality when properly evaluated using control groups (Dockery et al., 2013; Zigler and Dominici 2014) appears to have had no impact on the political and media narratives used to justify spreading them more widely.
Many technical efforts have been made since the 1970s to try to obtain the benefits of RCTs from QE data. One approach is to try to reduce – or, if possible, eliminate – known differences between treatment and comparison groups by carefully matching their individual members using measured variables. Another is to try to adjust statistically for the assumed effects of differences between treatment and control groups using assumed models and sensitivity analyses for their effects. A third is to exploit “natural experiments” in which unplanned events are considered to affect some people but not others as if by chance. For example, if a labor strike shutters a factory that had been discharging smoke into the air, then comparing what happens next to the health of people downwind from the factory with what happens to others upwind or far removed from it might reveal something about the health effects of cleaner air – but only if the affected population and the comparison group are similar in relevant (health-affecting) ways, apart from the change in exposure. Similarly, arbitrary thresholds, such as the age at which a behavior such as driving, drinking, or smoking becomes legal, together with an assumption that people a little on either side of the threshold are otherwise similar to each other, allow differences in outcomes (such as car accident rates) to be estimated and attributed to the difference in permitted behaviors.
Counterfactual and Potential Outcome Framework: Guessing What Might Have Been
QE design and data analysis strategies have given rise to a variety of statistical techniques for estimating population-level causal effects from QE data. A common underlying philosophical framework – the counterfactual/potential outcome framework – is widely used to interpret their results (Höfler, 2005). This holds that causal effects in populations can be estimated by comparing estimates of what did happen, e.g., real observed values of illness or death rates, to estimates of what would have happened had causes been different. Since what would have happened under different counterfactual conditions is never observed, and since the reasons for the hypothetical counterfactual conditions are seldom specified in sufficient detail to predict outcomes uniquely, potential outcome methods depend heavily on the use of statistical models and assumptions to predict what would have happened. This makes their estimates of causal impacts dependent on modeling assumptions. Different choices of modeling assumptions can produce very different estimates of causal impacts.

Methods developed within the potential outcome framework include propensity score matching (PSM), marginal structural models (MSMs), instrumental variables (IVs), regression discontinuity designs (RDDs), and related methods (Höfler, 2005; White and Sabarwal, 2014). Although they are sometimes described as yielding rigorous estimates of average causal effects in populations from QE data, potential outcome methods depend on making strong modeling assumptions whose validity is seldom known in practice. Examples include the assumptions that there are no unobserved confounders, that all relevant factors have been observed, and that all types of individuals have received all treatments. In effect, by assuming that average differences estimated from data are causal rather than being coincidental or explained by something else, potential outcome methods do arrive at causal conclusions, but at the price of reliability. Their causal conclusions may well be wrong, and different investigators may reach contradictory conclusions, starting from the same data, by choosing different counterfactual modeling assumptions.
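To illustrate one such method, the following R sketch performs a bare-bones propensity score matching analysis on simulated data. Real applications normally use dedicated packages (e.g., MatchIt) and careful balance diagnostics, and the estimate is only as good as the assumption that the single covariate z captures all confounding.

set.seed(1)
n <- 2000
z <- rnorm(n)                                        # observed covariate confounding treatment and outcome
treated <- rbinom(n, 1, plogis(z))                   # treatment assignment depends on z
y <- 1 + 2*z + 0.5*treated + rnorm(n)                # simulated outcome; true treatment effect is 0.5
ps <- fitted(glm(treated ~ z, family = binomial))    # estimated propensity scores
controls <- which(treated == 0)
match <- sapply(which(treated == 1), function(i)
  controls[which.min(abs(ps[controls] - ps[i]))])    # nearest-neighbor match, with replacement
mean(y[treated == 1] - y[match])                     # matched estimate of the effect on the treated (true value 0.5)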

Example: What Were the Effects of a Public Smoking Ban Policy Intervention on Heart Attack Risks?

To illustrate the promise and pitfalls of QEs, consider the two plots in Figure 1.8. Each one compares estimated mean monthly incidence rates of heart attacks (acute myocardial infarction, AMI) among adults 30-64 years old in Tuscany, Italy in the five years before a January 2005 ban on public smoking and in the year following the ban (shaded area to the right). Estimates with seasonal effects included (solid curves) show that, as in other studies, heart attack risks are greatest in the cold winter months, especially December and January, and are lowest in the hot summer months. The dashed curves show estimates of the baseline incidence rate of heart attacks with the seasonal changes subtracted out and with a linear trend line (upper plot) or a nonlinear trend curve (lower plot) fit to the data points. (These curves were generated using Poisson regression modeling, since the data show counts of AMI cases, but the main points do not depend on this specific modeling technique.) In the upper plot, there is a clear decrease in estimated baseline incidence rates of AMI from before the ban to after it, consistent with a causal hypothesis that the public smoking ban caused a prompt decrease in heart attack risks. The estimated size of the decrease is 5.4%. In the lower plot, however, with a nonlinear trend fit to the same data, there is no significant decrease (and, in fact, a slight increase) in the estimated AMI incidence rate from before to after the ban. This illustrates how the size and direction of the effect of the ban estimated from these data depend on the model selected to analyze and interpret the data.
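The model dependence illustrated by Figure 1.8 can be reproduced schematically in R by fitting Poisson regressions with a linear trend versus a flexible spline trend to simulated monthly counts; the data and coefficients below are invented and are not the Tuscany data.

set.seed(1)
month <- 1:72
ban <- as.numeric(month > 60)                                      # ban in effect for the last 12 months
season <- cos(2*pi*month/12)                                       # simple seasonal term
counts <- rpois(72, exp(5 + 0.1*season - 0.004*month + 0.00004*month^2))  # simulated AMI counts: nonlinear trend, no true ban effect
fit_linear <- glm(counts ~ month + season + ban, family = poisson)
fit_spline <- glm(counts ~ splines::ns(month, df = 4) + season + ban, family = poisson)
exp(coef(fit_linear)["ban"])   # estimated rate ratio for the ban when the trend is forced to be linear
exp(coef(fit_spline)["ban"])   # estimated rate ratio under a flexible trend; comparing the two shows how
                               # the conclusion depends on the assumed trend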

