Fig. 1.8 Evaluation Results Depend on Modeling Assumptions. The upper and lower panels fit linear and nonlinear models, respectively, to time series data on heart attack (acute myocardial infarction, AMI) incidence in Tuscany, Italy, before and after a January 2005 ban on smoking in public.
Source: Gasparrini et al., 2009
Change Point Analysis (CPA) and Sequential Detection Algorithms

In general, the changes in outcomes caused by policy interventions can be difficult or impossible to determine from time series data alone. Most real-world time series of interest, from economic indicators to disease or mortality counts, result from many factors, not all of which are necessarily known or measured. Hence the statistical characteristics of such time series, including their mean levels and the variance around those levels, can shift frequently and for unknown reasons. For a volatile, non-stationary time series, comparing values from before an intervention to values after it is likely to show statistically significant differences even if the intervention did not cause them.
A partial solution to this challenge is provided by change point analysis (CPA) and sequential detection algorithms (James and Matteson, 2014; Ross, 2015; Polunchenko and Tartakovsky, 2011). CPA algorithms test whether the statistical characteristics of an observed time series changed significantly at one or more points during an interval of observation and, if so, estimate when these “change points” occurred. Some CPA algorithms also estimate the sizes of changes, assuming that they have simple forms such as a step up or down in mean values. Sequential detection algorithms are similar to CPA algorithms but are applied as observations come in, to detect a change as soon as possible after it occurs, rather than being applied retrospectively in batch mode to look back over an interval and identify changes after the fact. In practice, sequential detection algorithms can be used to trigger alerts or raise alarms (with statistical confidence levels attached) as soon as sampling indicates that a process being monitored has changed for the worse; CPA algorithms can then be used later to assess how accurate the warnings probably were and to fine-tune decision thresholds to minimize the sum of costs from false positives and false negatives.
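To make the distinction concrete, the logic of a sequential detector can be illustrated with a minimal one-sided CUSUM scheme, a standard sequential detection method. The data and parameter values below are illustrative assumptions, not taken from the sources cited:

```python
# Minimal sketch of a one-sided CUSUM sequential detector for an upward shift
# in the mean of a monitored series. The parameter values and data here are
# illustrative assumptions, not taken from the text.
def cusum_detect(series, mu0, sigma, k=0.5, h=5.0):
    """Return the first index at which an alarm is raised, or None.

    mu0, sigma: baseline mean and standard deviation of the process.
    k: reference value (in sigma units) subtracted at each step.
    h: decision threshold (in sigma units); larger h means fewer false
       alarms but slower detection.
    """
    s = 0.0
    for t, x in enumerate(series):
        z = (x - mu0) / sigma      # standardize the new observation
        s = max(0.0, s + z - k)    # accumulate evidence of an upward shift
        if s > h:
            return t               # alarm: change detected at time t
    return None                    # no alarm during the monitored interval

# Usage: baseline mean 10; the mean jumps to 14 at week 30. The detector
# raises an alarm a few weeks after the true change.
weekly = [10.0] * 30 + [14.0] * 30
alarm_week = cusum_detect(weekly, mu0=10.0, sigma=2.0)
```

Raising the threshold h slows detection but reduces false alarms; this is exactly the trade-off between costs of false positives and false negatives mentioned above.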
Example: Change-Point Analysis (CPA) Clarifies When and Whether Events Happened

The following example is adapted from Bier and Cox (2017). Since 2001, when letters containing anthrax led to 5 deaths and 17 other infections from which the victims recovered, the U.S. Environmental Protection Agency (EPA), the Centers for Disease Control and Prevention (CDC), and the Department of Health and Human Services have invested over a billion dollars to develop surveillance methods and prevention and preparedness measures to help reduce or mitigate the consequences of bioterrorism attacks should they occur again in the future (Grundmann, 2014). In practice, detecting a significant upsurge in hospital admissions with similar symptoms may often be the most practical and reliable way to identify that a bioterrorism attack is in progress. The statistical challenge of detecting such changes against the background of normal variability in hospital admissions has motivated the development of methods that seek to reduce the time to detect attacks when they occur, while keeping the rates of false positives acceptably small (Cami et al., 2009; Shen and Cooper, 2012).
Such statistical data analysis and pattern detection, carried out in settings where the patterns being sought are well understood (e.g., a jump in hospitalization rates for patients with similar symptoms that could be caused by a biological agent) and where enough surveillance data are available to quantify background rates and to monitor changes over time, illustrate the types of uncertainty for which excellent, sophisticated techniques are currently available. Figure 1.9 presents a hypothetical example showing weekly counts of hospital admissions with a specified symptomology for a city. Given such surveillance data, the risk assessment inference task is to determine whether the hospitalization rate increased at some point in time (suggestive of an attack), and, if so, when and by how much. Intuitively, counts appear somewhat greater on the right side of Figure 1.9 than on the left, but might this just be due to chance, or is it evidence of a real increase in hospitalization rates?
Figure 1.10 shows the output from a typical statistical algorithm (or computational intelligence, computational Bayesian, machine learning, pattern recognition, data mining, etc. system) for solving such problems by using statistical evidence together with risk models to draw inferences about what is probably happening in the real world from observed data. The main idea is simple: the peak at week 26 indicates the time that is computed to be most likely for when an attack occurred, based on the data in Figure 1.9. The heights of the points in Figure 1.10 indicate the probabilities that an attack occurred at different times, if it occurred at all.
Fig. 1.9 Surveillance time series showing a possible increase in hospitalization rates
Fig. 1.10 Bayesian posterior distribution for the timing of the increase (change point), if one occurred
The same algorithm that produces this information, described next, also estimates by how much admission rates increased from before to after the attack. The algorithm identifies the time of the attack (week 26) and estimates the magnitude of its effect (not shown in Figure 1.10).
The algorithm used in this case works as follows. In Figures 1.9 and 1.10, the horizontal axes show the possible times, in weeks, at which an attack might have occurred that increased the hospital admission rate from its original level (the baseline or background level) to a new, higher level. For simplicity, we assume that such a one-time, lasting increase in hospitalization rates is known to be the specific form that the observable effect of an attack would have. (More elaborate models would allow for transient effects, multiple waves of attacks, and other complexities, but the highly simplified model of a one-time jump from a lower to a higher level suffices to illustrate key points about change-point detection methods.) On the vertical axis of Figure 1.10 are scaled versions of the likelihoods (discussed next) of the data in Figure 1.9 for each of 60 different hypotheses, each specifying a week when the attack is hypothesized to have occurred. The likelihoods are rescaled so that they sum to 1; this lets them be interpreted as Bayesian posterior probabilities for the attack time, assuming a uniform (flat) prior. Thus, an algorithm that computes likelihoods (i.e., probabilities of the data, given assumed values for the attack week and other uncertain quantities) also allows the most likely values for these quantities to be inferred from the data.
The likelihoods, in turn, are computed from a likelihood function. This gives the probability of the observed data (here, the data on hospital admission counts for all 60 weeks in Figure 1.9) for each value of the uncertain quantity (or quantities) about which inferences are to be drawn (here, the week of attack, the admission rate prior to the attack, and the admission rate following the attack). The unobserved quantities that affect the joint probability distribution of the observed quantities (i.e., the data) are often referred to generically as the state of the system being studied. In this example, the state consists of the week of the attack, the admission rate prior to the attack, and the admission rate following the attack. Symbolically, the likelihood function can be written as P(data | state), denoting the conditional probability of the observed data given the values of the unobserved quantities that they depend on. The likelihood of the data in Figure 1.9, given the hypothesis that an attack occurred in any particular week, is just the product of the probabilities of the observed numbers of hospital admissions in all of the weeks (i.e., the data in Figure 1.9), under that hypothesis. It can be calculated by modeling the number of admissions per week as a random variable having a binomial distribution with a mean equal to the product of the number of susceptible people in the community served by the hospital and the admission probability per person per week, which jumps from a baseline value to an increased value when and if an attack occurs. The numerical values of the attack time and of the pre-attack and post-attack admission rates that jointly maximize the likelihood of the data in Figure 1.9 constitute the maximum-likelihood estimates (MLEs) for the attack time and the admission rates before and after the attack. Figure 1.10 shows that the MLE for the attack time – the change-point in the time series in Figure 1.9 – is 26 weeks.
If the MLEs for admission rates are 0.1 before the attack time and 0.2 after it, then the MLE for the size of the jump in admission rates caused by the attack would be 0.2 − 0.1 = 0.1 admissions per person per week.
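Under these simplifying assumptions, the full computation (a profile likelihood for each candidate attack week, rescaled into a Bayesian posterior under a flat prior, yielding an MLE for the change point) can be sketched as follows. The simulated counts, the assumed number of susceptible people, and the admission rates are illustrative stand-ins for the data in Figure 1.9:

```python
import math
import random

# Weekly admission counts are modeled as Binomial(n, p), with p jumping from
# a baseline to a higher rate at an unknown attack week. The counts below
# are simulated for illustration; they are not the series in Figure 1.9.
random.seed(0)
n, T = 1000, 60                      # assumed susceptible people and weeks
true_week, p_before, p_after = 26, 0.10, 0.20
counts = [sum(random.random() < (p_before if t < true_week else p_after)
              for _ in range(n)) for t in range(T)]

def binom_loglik(xs, n, p):
    """Log-likelihood of weekly counts xs under Binomial(n, p)."""
    return sum(math.log(math.comb(n, x)) + x * math.log(p)
               + (n - x) * math.log(1 - p) for x in xs)

# Profile log-likelihood of each candidate attack week c: plug in the MLE
# admission rates before and after c.
loglik = {}
for c in range(1, T):
    p1 = sum(counts[:c]) / (n * c)          # MLE rate before week c
    p2 = sum(counts[c:]) / (n * (T - c))    # MLE rate from week c onward
    loglik[c] = (binom_loglik(counts[:c], n, p1)
                 + binom_loglik(counts[c:], n, p2))

# Rescale the likelihoods so they sum to 1: the Bayesian posterior over the
# attack week under a uniform prior (the heights plotted in Figure 1.10).
m = max(loglik.values())
weights = {c: math.exp(ll - m) for c, ll in loglik.items()}
total = sum(weights.values())
posterior = {c: w / total for c, w in weights.items()}

mle_week = max(loglik, key=loglik.get)      # MLE change point
```

With the large effect size assumed here, the posterior concentrates sharply on the true attack week, mirroring the high peak described for Figure 1.10.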
This example has illustrated how MLE algorithms and modeling assumptions can be used to estimate the time of a change point and the magnitude of the change that occurred then. The key points are that computational methods are well able to estimate these important unknown quantities from data when: (1) the form of the change to be detected is known (e.g., in this example, the effect of an attack is known to be a one-time increase in admission rates, so the pattern to look for is known); (2) plentiful surveillance data are available (which allowed highly accurate MLE estimates of admission rates to be formed); and (3) the effect size is large enough to show clearly through the random noise in the surveillance count time series (as indicated by the high peak in Figure 1.10, which can be shown via simulation to be many orders of magnitude greater than the peaks that occur by chance under the null hypothesis of no real change in admission rates). MLE algorithms detect change points quite quickly (within 1-2 weeks with high confidence in this example) under these conditions. However, their success depends on how well conditions 1-3 are satisfied. Modern methods of CPA allow these conditions to be relaxed, as discussed next.
Modern algorithms for CPA and sequential detection, available in free R packages (James and Matteson, 2014; Ross, 2015), can be applied to multivariate time series, i.e., to series in which more than one quantity is monitored or observed over time. Several of these algorithms apply non-parametric statistical tests to test whether data permit rejection of the null hypothesis that the underlying data-generating process has not changed. These are useful advances over older methods, including maximum likelihood estimation (MLE) methods, that applied only to univariate time series and that had to assume specific parametric models or conditional probability distributions for the observations, such as that they had normal, binomial, or Poisson distributions with means that might have undergone a jump at some time. Modern CPA and sequential detection algorithms using non-parametric tests improve on the situation in Figure 1.8 by providing more reliable, model-independent answers to the question of whether and when the underlying data-generating process has changed. Analyses such as those in Figure 1.8 only address whether a selected model estimates different values for the means of observations before and after a user-specified time. Even if the answer is yes, it may simply reflect effects of model specification errors: the selected model may provide different fits to the data before and after some user-specified point without describing either very well, and the difference in fits may simply be an artifact of the model selected to describe it. Non-parametric CPA and sequential detection algorithms avoid these difficulties by making it unnecessary to specify a parametric family of models and by using the data to estimate the times of change points rather than requiring the user to specify them.
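A minimal sketch of the non-parametric idea, in the spirit of (but not reproducing) the algorithms in the R packages cited above: for each candidate split point, compute a standardized rank-sum statistic, and judge the maximum over splits against its permutation distribution, so that no parametric model for the observations is needed. All data here are simulated for illustration:

```python
import random

# Rank-based change-point test: for each candidate split k, a standardized
# Mann-Whitney-type rank-sum statistic; the maximum over k is referred to
# its permutation distribution, so no parametric model is assumed.
def rank_cp_statistic(xs):
    """Return (max standardized rank statistic, best split index)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = {i: r + 1 for r, i in enumerate(order)}   # 1-based ranks
    best_stat, best_split = 0.0, None
    for k in range(2, n - 1):                 # candidate change points
        w = sum(ranks[i] for i in range(k))   # rank sum of first k points
        mean = k * (n + 1) / 2.0              # null mean of w
        var = k * (n - k) * (n + 1) / 12.0    # null variance of w
        z = abs(w - mean) / var ** 0.5
        if z > best_stat:
            best_stat, best_split = z, k
    return best_stat, best_split

def permutation_pvalue(xs, n_perm=500, seed=1):
    """Permutation p-value for the max rank statistic, plus the split."""
    observed, split = rank_cp_statistic(xs)
    rng = random.Random(seed)
    ys, exceed = list(xs), 0
    for _ in range(n_perm):
        rng.shuffle(ys)                       # resample under the null
        if rank_cp_statistic(ys)[0] >= observed:
            exceed += 1
    return split, (exceed + 1) / (n_perm + 1)

# Usage: a simulated series with a clear upward shift at index 30.
random.seed(7)
series = ([random.gauss(0.0, 1.0) for _ in range(30)]
          + [random.gauss(3.0, 1.0) for _ in range(30)])
split, p_value = permutation_pvalue(series)
```

Because the test uses only the ranks of the observations, the same code applies whether the data are normal, Poisson, or heavily skewed, which is the model-independence advantage discussed above.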
Despite these advances, even the best algorithms for change-point analysis, sequential detection, and intervention analysis only address whether and when changes occur (and perhaps their sizes), and not why they occur. They are thus suitable for descriptive analytics, describing what changed when, but not for evaluation analytics assessing the causal impact of decisions or interventions on changes in outcomes.
A Causal Modeling Perspective on Evaluating Impacts of Interventions Using CPTs

Although they have been widely applied in efforts to evaluate the effects of social, environmental, economic, and public health initiatives, none of the evaluation methods discussed so far provides a reliable way to determine by how much an action or intervention has changed an outcome of interest, even though this is the main goal of evaluation analytics. When a valid causal DAG model is known for the system or situation being analyzed, however, calculating the effects of actions becomes straightforward (Ortega and Braun, 2014). Consider the following DAG model of a decision:
act → outcome ← state.

Here, act represents a decision variable summarizing the choices, controls, policies, courses of action, or interventions whose effects on the outcome random variable are to be evaluated; and state is a random variable summarizing all of the other factors that, together with act, determine the conditional probability of the outcome variable. If act and state both have only a few possible values or levels, then the conditional probability table (CPT) for outcome in this model can be displayed as an array of probability numbers with generic element P(o | a, s), indicating the probability that the outcome value is o if the act chosen is a and if the state is s. The marginal probability table for the input variable state consists of values of P(s), the probability that state has level or value s, for each of its possible values. Now, suppose that the decision-maker intervenes to choose the value of act to be, say, a′ instead of some other default or status quo value a. This choice changes the predictive probability distribution of the outcome from the old distribution given by
P(o | a) = Σ_s P(o | a, s)P(s)
to a new distribution given by
P(o | a′) = Σ_s P(o | a′, s)P(s).
The effect on the outcome variable caused by the decision to set act = a′ is this change in the probability distribution of its values. The data elements needed to compute it are the contents of the marginal probability table or marginal probability distribution for the state input, P(s), and the CPT elements P(o | a, s) and P(o | a′, s).
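The marginalization behind these formulas is simple enough to compute by hand. The following sketch, with made-up CPT numbers for a binary state and outcome, shows the calculation for the case in which the state distribution does not depend on the act:

```python
# Tiny numerical sketch of the identity P(o | a) = sum over s of
# P(o | a, s)P(s). The binary state and outcome and all probability
# numbers are made up for illustration; they are not from the text.
P_state = {"s0": 0.7, "s1": 0.3}        # marginal probability table P(s)

# CPT entries P(outcome = "good" | act, state)
P_good = {("a0", "s0"): 0.5, ("a0", "s1"): 0.2,
          ("a1", "s0"): 0.8, ("a1", "s1"): 0.6}

def p_outcome_good(act):
    """Marginalize the state out of the CPT: sum_s P(o | a, s)P(s)."""
    return sum(P_good[(act, s)] * P_state[s] for s in P_state)

# Intervening to change the act from a0 to a1 shifts the outcome probability:
baseline = p_outcome_good("a0")     # 0.5*0.7 + 0.2*0.3 = 0.41
intervened = p_outcome_good("a1")   # 0.8*0.7 + 0.6*0.3 = 0.74
```

The effect of the intervention is exactly the shift in this distribution, here from 0.41 to 0.74 for the "good" outcome.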
More generally, suppose that the DAG model lets probabilities for values of the state variables depend on the intervention, like this:
act → outcome ← state, act → state.

Then changing the act from a to a′ changes the probability distribution for the outcome from
P(o | a) = Σ_s P(o | a, s)P(s | a)
to the following new distribution:
P(o | a′) = Σ_s P(o | a′, s)P(s | a′).
Again, the CPTs in the DAG model provide all the information needed to compute the effect of the intervention, which can be defined as the change that it causes in outcome probabilities, i.e., the shift from P(o | a) to P(o | a′). If the DAG model applies to each individual in a population and the outcome variable is numerical (e.g., measured on a ratio scale or an interval scale), then a popular population-level measure of the effect of the intervention is the change in the average value of the outcome over individuals in the population, E(o | a′) − E(o | a), where E denotes the expected value operator, E(o | a) = Σ_o o·P(o | a) = Σ_o o·Σ_s P(o | a, s)P(s | a).
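The same calculation extends directly to this more general model. The sketch below uses made-up probability tables in which the state distribution depends on the act, and computes the population-level effect E(o | a′) - E(o | a) for a numerical (0/1) outcome:

```python
# Sketch of the more general case in which the state distribution depends
# on the act: P(o | a) = sum over s of P(o | a, s)P(s | a). All numbers
# are illustrative, not from the text. The outcome is numerical (0 or 1),
# so the population-level effect E(o | a') - E(o | a) can be computed.
P_state_given_act = {"a0": {"s0": 0.7, "s1": 0.3},
                     "a1": {"s0": 0.4, "s1": 0.6}}

# CPT P(o | a, s) over the numerical outcome values 0 and 1
P_outcome = {("a0", "s0"): {0: 0.5, 1: 0.5},
             ("a0", "s1"): {0: 0.8, 1: 0.2},
             ("a1", "s0"): {0: 0.2, 1: 0.8},
             ("a1", "s1"): {0: 0.4, 1: 0.6}}

def p_outcome_given_act(o, act):
    """P(o | a) = sum over s of P(o | a, s)P(s | a)."""
    return sum(P_outcome[(act, s)][o] * ps
               for s, ps in P_state_given_act[act].items())

def expected_outcome(act):
    """E(o | a) = sum over o of o * P(o | a)."""
    return sum(o * p_outcome_given_act(o, act) for o in (0, 1))

# The effect of intervening to set act = a1 instead of a0:
effect = expected_outcome("a1") - expected_outcome("a0")
```

Note that the intervention now acts through two paths, directly on the outcome CPT and indirectly through the state distribution, and both contributions are captured automatically by the marginalization.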
Example: Calculating Causal Impacts of an Intervention

Returning to the earlier RCT example with the DAG Treatment → Cure ← Sex, the CPT for the outcome variable, Cure, can be written as follows: