Evaluation Analytics for Public Health: Has Reducing Air Pollution Reduced Death Rates in the United States?
Introduction: Using Data from Natural Experiments to Understand Causality
An aim of applied science in general, and of epidemiology in particular, is to draw sound causal inferences from observations. For public health policy analysts and epidemiologists, this includes drawing inferences about whether historical changes in exposures have actually caused the consequences predicted for, or attributed to, them. The example of the Dublin coal-burning ban introduced in Chapter 1 suggests that accurate evaluation of the effect of interventions is not always easy, even when data are plentiful. Students are taught to develop hypotheses about causal relations, devise testable implications of these causal hypotheses, carry out the tests, and objectively report and learn from the results to refute or refine the initial hypotheses. For at least the past two decades, however, epidemiologists and commentators on scientific methods and results have raised concerns that current practices too often lead to false-positive findings and to mistaken attributions of causality to mere statistical associations (Lehrer, 2012; Sarewitz, 2012; Ottenbacher, 1998; Imberger et al., 2011). Formal training in epidemiology may be a mixed blessing in addressing these concerns. As discussed in Chapter 2, concepts such as “attributable risk,” “population attributable fraction,” “burden of disease,” “etiologic fraction,” and even “probability of causation” are solidly based on relative risks and related measures of statistical association; they do not necessarily reveal anything about predictive, manipulative, structural, or explanatory (mechanistic) causation (e.g., Cox, 2013; Greenland and Brumback, 2002).
Limitations of human judgment and inference, such as confirmation bias (finding what we expect to find), motivated reasoning (concluding what it pays us to conclude), and overconfidence (mistakenly believing that our own beliefs are more accurate than they really are), do not spare health effects investigators. Experts in the health effects of particular compounds are not always also experts in causal analysis, and published causal conclusions are often unwarranted, as reviewed in Chapter 2, with a pronounced bias toward finding “significant” effects where none actually exists (false positives) (Lehrer, 2012; Sarewitz, 2012; Ioannidis, 2005; The Economist, 2013).
This chapter applies methods of causal hypothesis-testing (Granger causality tests and conditional independence tests) to evaluation analytics, i.e., assessment of the effects caused by past changes, for the important practical problem of assessing improvements in public health risks caused by past reductions in air pollution concentrations. To do so, we take advantage of the fact that between years 2000 and 2010, air pollutant levels in counties throughout the United States changed significantly, with fine particulate matter (PM2.5) declining over 30% in some counties, and ozone (O3) exhibiting large variations from year to year. This history provides an opportunity to compare county-level changes in average annual ambient pollutant levels to corresponding changes in all-cause and cardiovascular disease (CVD) mortality rates over the course of a decade. This chapter examines data from these “natural experiments” of changing pollutant levels for 483 counties in the 15 most populated U.S. states using quantitative methods for causal hypothesis testing, such as conditional independence and Granger causality tests. We shall see that the hypothesis of a significant statistical association between air pollution and mortality rates is well supported, but that the hypothesis of a predictive causal relation between them is not. For example, no significant positive associations are found between changes in PM2.5 or O3 levels and corresponding changes in disease mortality rates between 2000 and 2010, nor for shorter time intervals of 1-3 years.
Dominici et al. (2014) noted that “[A]nalyses of observational data have had a large impact on air-quality regulations and on the supporting analyses of their accompanying benefits, [but] associational approaches to inferring causal relations can be highly sensitive to the choice of the statistical model and set of available covariates that are used to adjust for confounding. … There is a growing consensus… that the associational or regression approach to inferring causal relations – on the basis of adjustment with observable confounders – is unreliable in many settings.” They demonstrate that the choice of regression model can result in either statistically significant positive or statistically significant negative associations between air pollutant levels and mortality rates. This implies that implicit modeling choices can greatly affect – or even determine – the results presented to decision-makers and the public. Table 10.1 provides some examples of important policy-relevant conclusions and doubts about their validity from the recent air pollution health effects literature.
Table 10.1. Some conflicting claims about health effects known to be caused by air pollution
Pro (causal interpretation or claim)
Con (counter-interpretation or claim)
“Epidemiological evidence is used to quantitatively relate PM2.5 exposure to risk of early death. We find that UK combustion emissions cause 13,000 premature deaths in the UK per year, while an additional 6000 deaths in the UK are caused by non-UK European Union (EU) combustion emissions” (Yim and Barrett, 2012).
“[A]lthough this sort of study can provide useful projections, its results are only estimates. In particular, although particulate matter has been associated with premature mortality in other studies, a definitive cause-and-effect link has not yet been demonstrated” (NHS, 2012)
“[A]bout 80,000 premature mortalities [per year] would be avoided by lowering PM2.5 levels to 5 µg/m3 nationwide” in the U.S.; 2005 levels of PM2.5 caused about 130,000 premature mortalities per year among people over age 29, with a simulation-based 95% confidence interval of 51,000 to 200,000 (Fann et al., 2012).
“Analysis assumes a causal relationship between PM exposure and premature mortality based on strong epidemiological evidence… However, epidemiological evidence alone cannot establish this causal link” (EPA, 2011, Table 6-11).
Significant negative associations have also been reported between PM2.5 (Krstić, 2010) and short-term mortality and morbidity rates, as well as between levels of some other pollutants (e.g., NO2 (Kelly et al., 2012) and ozone (Powell et al., 2012)) and short-term mortality and morbidity rates.
“Some of the data on the impact of improved air quality on children’s health are provided, including… the reduction in the rates of childhood asthma events during the 1996 Summer Olympics in Atlanta, Georgia, due to a reduction in local motor vehicle traffic” (Buka et al., 2006). “During the Olympic Games, the number of asthma acute care events decreased 41.6% (4.23 vs. 2.47 daily events) in the Georgia Medicaid claims file,” coincident with significant reductions in ozone and other pollutants (Friedman et al., 2001).
“In their primary analyses, which were adjusted for seasonal trends in air pollutant concentrations and health outcomes during the years before and after the Olympic Games, the investigators did not find significant reductions in the number of emergency department visits for respiratory or cardiovascular health outcomes in adults or children.” In fact, “relative risk estimates for the longer time series were actually suggestive of increased ED [emergency department] visits during the Olympic Games” (Health Effects Institute, 2010)
“An association between elevated PM10 levels and hospital admissions for pneumonia, pleurisy, bronchitis, and asthma was observed. During months when 24-hour PM10 levels exceeded 150 micrograms/m3, average admissions for children nearly tripled; in adults, the increase in admissions was 44 per cent.” (Pope, 1989)
“Respiratory syncytial virus (RSV) activity was the single explanatory factor that consistently accounted for a statistically significant portion of the observed variations of pediatric respiratory hospitalizations. No coherent evidence of residual statistical associations between PM10 levels and hospitalizations was found for any age group or respiratory illness.” (Lamm et al., 1996)
“Reductions in respiratory and cardiovascular death rates in Dublin suggest that control of particulate air pollution could substantially diminish daily death....Our findings suggest that control of particulate air pollution in Dublin led to an immediate reduction in cardiovascular and respiratory deaths.” (Clancy et al., 2002)
"The results could not be more clear, reducing particulate air pollution reduces the number of respiratory and cardiovascular related deaths immediately" (Harvard School of Public Health, 2002).
Mortality rates were already declining long before the ban, and similar declines occurred in areas not affected by it. “Serious epidemics and pronounced trends feign excess mortality previously attributed to heavy black-smoke exposure” (Wittmaack, 2007). “Thus, a causal link between the decline in mortality and the ban of coal sales cannot be established” (Pelucchi et al., 2009). “In contrast to the earlier study, there appeared to be no reductions in total mortality or in mortality from other causes, including cardiovascular disease, that could be attributed to any of the bans. That is, after correcting for background trends, similar reductions were seen in ban and non-ban areas.” (HEI, 2013)
Source: Adapted from Cox (2013)
To overcome this difficulty, Dominici et al. (2014) proposed the use of quasi-experiments (QEs), or natural experiments, in which outcomes are compared between a treatment group and a control group, but without random assignment or other determination of the treatment status by the researcher. As an example, they cite the Dublin coal-burning ban study discussed in Chapter 1, reporting significantly lower mortality rates in the six years following a ban on coal-burning in Dublin County, Ireland compared to the six years prior to the ban (Clancy et al., 2002). This proposal to use QEs to better assess causal relations between pollution levels and health effects has been hailed as "a paradigm-shifting solution" (Harvard Law Today, 2014). Yet, since QEs were first introduced in social statistics in the 1960s, expert practitioners have recognized that “in many quasi-experiments, one is most often left with the question: ‘Are there alternative explanations for the apparent causal association?’” (Harris et al., 2006). Such alternative explanations, or threats to the internal validity of causal inferences for the studied populations, are discussed in Chapter 1. They must be refuted before valid causal inferences can be drawn from QEs (Campbell and Stanley, 1966; Maclure, 1991; Rothman and Greenland, 2005). For example, to be valid, the conclusion that a ban on coal-burning caused an immediate reduction in all-cause and cardiovascular mortality (Harvard School of Public Health, 2002) would have had to refute plausible alternative explanations.
Including a relevant historical or contemporaneous control group (using a pretest-posttest design or a nonequivalent control group design, respectively, in QE terminology) would have allowed the elimination of non-causal explanations, such as that (a) mortality rates were already declining before the ban, and continued to do so without significant change during and afterward for reasons unrelated to the ban (the “History” threat to internal validity, in QE terminology); or (b) mortality rates declined at the same rate in areas not affected by the ban as in areas affected by it. For the Dublin study, both possibilities (a) and (b) proved to be true, so that no valid conclusions about the impact of the ban on all-cause or cardiovascular mortality rates can be drawn (Wittmaack, 2007; Pelucchi et al., 2009). Indeed, upon reanalysis using relevant control groups, no effect of the ban on these outcomes could be detected (Health Effects Institute, 2013). Yet, as Dominici et al. rightly note, natural experiments occur frequently and, if properly analyzed, can provide crucial policy-relevant insights into causality (or lack thereof) in observed exposure-response relations. In the U.S., for example, geographic heterogeneity in the rates at which pollutant levels have declined in different regions has created many natural experiments for assessing the effects of these changes on public health over time.
To take advantage of these natural experiments, the following sections compare changes in PM2.5 and O3 levels from 2000 to 2010 to corresponding changes in all-cause and CVD age-specific mortality rates over the same interval, for hundreds of counties in the 15 largest states in the U.S. Treating county as the unit of observation, as in the Dublin study and many others where individual-level exposure data are not available, invites application of longitudinal designs and methods in which each county’s history of pollution levels and mortality rates serves as its own control group for purposes of determining how subsequent changes in pollution are associated with subsequent changes in mortality rates (Campbell and Stanley, 1963). Using repeated observations on the same counties over time also allows the effects of unmeasured (and possibly unknown) confounders to be largely controlled for as changes in pollutant levels and mortality rates are calculated – the basic strategy of panel data analysis (Angrist and Pischke, 2009). The goal of our analysis is to understand the extent to which historical associations between pollutant levels and mortality rates reflect a clear causal relation, rather than merely coincident trends, the effect of confounders, or modeling choices.
Table 10.2 lists several quantitative methods for causal hypothesis testing, modeling, and analysis that have been extensively developed and applied over the past six decades (Cox, 2013). Chapter 2 provides much more detail on several of these methods. Various advantages of these techniques, as compared to qualitative causal criteria (Rothman et al., 2005) such as the traditional Hill considerations and other weight-of-evidence and associational methods, are well explained and illustrated in the references for Table 10.2 (e.g., Greenland and Brumback, 2002), along with their limitations (e.g., Freedman, 2004). Prominent among these advantages is the development of empirically testable implications of causal hypotheses, such as conditional independence implications, timing implications, information-theoretic implications, and exogeneity implications, with conditional probability distributions of some variables being determined by the values of others (see Chapter 2). These testable implications capture the asymmetry inherent in the notion of causation, unlike correlations or other symmetric measures of association. They can be tested statistically using publicly available standard computer codes, such as those in R and Python/NumPy. This enables different investigators, perhaps with very different prior beliefs, to reach the same conclusions from the same data. This points the way toward greater objectivity and definitiveness in determining via such tests the extent to which data do or do not support causal hypotheses, based on their testable implications.
Table 10.2. Some formal methods for modeling and testing causal hypotheses
Method and References
Appropriate study design
Quasi-experimental design and analysis (Campbell and Stanley, 1966)
Can control group comparisons refute alternative (non-causal) explanations for observed associations between hypothesized causes and effects, e.g., coincident trends and regression to the mean? If so, this strengthens causal interpretation.
Observational data on subjects exposed and not exposed to interventions that change the hypothesized cause(s) of effects.
Conditional independence tests (Freedman, 2004, Friedman and Goldszmidt, 1998)
Is hypothesized effect (e.g., cardiovascular disease (CVD) mortality rate) statistically independent of hypothesized cause (e.g., PM2.5 concentration), given (i.e., conditioned on) the values of other variables, such as education and income? If so, this undermines causal interpretation.
Cross-sectional data; Can also be applied to multi-period data (e.g., in dynamic Bayesian networks)
Panel data analysis (Angrist and Pischke, 2009, Stebbings, 1976)
Are changes in exposures followed by changes in the effects that they are hypothesized to help cause? If not, this undermines causal interpretation; if so, this strengthens causal interpretation.
Example: Are reductions in PM2.5 levels followed (but not preceded) by corresponding changes in CVD mortality rates?
Panel data study: Collect a sequence of observations on same subjects or units of observation (e.g., counties) over time
Granger causality test (Eichler and Didelez, 2010)
Does the history of the hypothesized cause improve ability to predict the future of the hypothesized effect? If so, this strengthens causal interpretation; otherwise, it undermines causal interpretation.
Example: Can CVD mortality rates be predicted better from time series histories of PM2.5 levels and mortality rates than from the time series history of mortality rates alone?
Intervention analysis and change point analysis (Helfenstein, 1991; Gilmour et al., 2006)
Does the best-fitting model of the observed data change significantly at or following the time of an intervention? If so, this strengthens causal interpretation.
Do the quantitative changes in hypothesized causes predict and explain the subsequently observed quantitative changes in hypothesized effects? If so, this strengthens causal interpretation.
Example: Do mortality rates fall faster in counties where pollutant levels fall faster than in other counties?
Time series observations on hypothesized effects, and knowledge of timing of intervention(s)
Quantitative time series data for hypothesized causes and effects
Counterfactual and potential outcome models (Moore et al., 2012)
Do exposed individuals have significantly different response probabilities than they would have had if they had not been exposed?
Example: Do people have lower mortality risk after historical exposure reductions than they would have had otherwise?
Cross-sectional and/or longitudinal data, with selection biases and feedback among variables allowed
Causal network, path analysis, and structural equations models of change propagation (Hack et al., 2010)
Do changes in exposures (or other causes) create a cascade of changes through a network of causal mechanisms (represented by equations), resulting in changes in the effect variables?
Example: Do relatively large variations in daily levels of fine particulate matter (PM2.5) air pollution create corresponding variations in markers of oxidative stress in the lungs?
Observations of variables in a dynamic system out of equilibrium
Negative controls (for exposures or for effects) (Lipsitch et al., 2010)
Do exposures predict health effects better than they predict effects that cannot be caused by exposures?
Example: Do pollutant levels predict cardiovascular mortality rates better than they predict car accident mortality rates? If not, this weakens causal interpretation of the CVD associations.
Source: Adapted from Cox (2013)
Other reasons why modern methods of quantitative causal analysis should be (and increasingly are) included among current approaches in the epidemiologist’s tool kit are discussed in modern epidemiology textbooks and monographs (e.g., Hernan and Robins, 2011) and in the references to Table 10.2. The purpose of this chapter is not to further review these methods, but to apply those that are most useful to the air pollution and mortality rate records in the United States.
Data and Methods
Cause-specific mortality rates, by county and age group, were downloaded from the Centers for Disease Control and Prevention (CDC) Wonder “Compressed Mortality, 1999-2010” database (CDC, 2014). To create a geographically diverse sample, mortality rates were extracted at the county level for the 15 largest states in the U.S. (California, Texas, New York, Florida, Illinois, Pennsylvania, Ohio, Georgia, Michigan, North Carolina, New Jersey, Virginia, Washington, Massachusetts, Arizona), representing approximately 65% of the total U.S. population. We extracted mortality rates (per hundred thousand person-years) for all causes of death, and then created three disease subcategories: (1) diseases of the circulatory system (International Classification of Diseases, 10th revision [ICD-10] codes I00-I99), (2) all external causes of death (ICD-10 codes V01-Y89), and (3) total disease-related mortalities (all causes of death excluding external causes). The dependent variables shown in subsequent tables thus included the following:
CVRatePer100K – Mortality rate (per 100,000 people per year) due to all heart/circulatory diseases
ExtRatePer100K – Mortality rate due to external causes (used as a negative control). (To investigate whether the methods used can detect known causal relationships, we also used a positive control in which a known causal effect was simulated, as discussed later for Table 7.)
ACRatePer100K – Mortality rate due to all disease-related (non-external) causes
Most of our analyses were restricted to ages 65+ years, as they have the highest CVD mortality rates. Age was categorized as 65-74 years, 75-84 years, and 85+ years.
County-level air quality data for PM2.5 (daily 24-hr mean) and O3 (daily maximum 8-hour moving average) were downloaded from the U.S. Environmental Protection Agency Air Quality System (AQS) for all monitors located in each county (n=483) of the fifteen states listed above (EPA, 2014). Data were obtained for the years 2000-2010. The two pollutant measures were summarized as county-level annual averages in our analyses.
The mortality and air quality data were merged by state/county and year. The resulting merged data file contained data for 483 distinct counties from 2000-2010, although not all counties collected both ozone and PM2.5 data for all years. These merged data files are freely available from the authors upon request.
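As an illustration of the merge step just described, the following minimal sketch joins a mortality table to an air quality table on state, county, and year. The column names, county labels, and numerical values are hypothetical placeholders, not the layout or contents of the actual CDC and AQS files.

```python
import pandas as pd

# Hypothetical toy tables; column names and values are illustrative only.
mortality = pd.DataFrame({
    "state": ["CA", "CA", "TX", "TX"],
    "county": ["County A", "County A", "County B", "County B"],
    "year": [2000, 2010, 2000, 2010],
    "age_group": ["75-84"] * 4,
    "CVRatePer100K": [2500.0, 1700.0, 2900.0, 2000.0],
})
air_quality = pd.DataFrame({
    "state": ["CA", "CA", "TX"],
    "county": ["County A", "County A", "County B"],
    "year": [2000, 2010, 2000],
    "pm25_annual_mean": [12.1, 8.9, 13.4],
    "o3_annual_mean": [0.038, 0.036, 0.045],
})

# A left join keeps every mortality record; county-years without
# monitor data receive NaN in the pollutant columns.
merged = mortality.merge(air_quality, on=["state", "county", "year"], how="left")

# Count county-years that lack PM2.5 data after the merge.
n_missing_pm25 = int(merged["pm25_annual_mean"].isna().sum())
print(merged)
print("county-years without PM2.5 data:", n_missing_pm25)
```

A left join is used here only so that gaps in monitoring remain visible; an inner join would silently drop county-years in which a pollutant was not measured.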
Statistical Analysis Methods
The methods in Table 10.2 that are most useful for the air pollution and mortality rate data sets just described include conditional independence tests, longitudinal comparisons of changes in death rates and changes in pollution levels, Granger causality tests, and negative controls comparing presumably non-causal associations between longitudinal changes in accident and other “external” (non-disease) death rates and changes in pollutant levels to associations between changes in disease mortality rates and changes in pollutant levels. These are described in the following paragraphs. All statistical computations were carried out using the Statistica 12.5 statistical computing environment, with the exception of the Granger causality tests, described below. Other methods in Table 10.2, such as change-point analysis and intervention analysis for an intervention that occurs at a single point in time (e.g., closing a steel mill or banning coal-burning in Dublin) are less relevant for these data, since both changes in PM2.5 and changes in mortality rates occurred gradually over a decade, rather than abruptly from before to after some intervention.
Association-Based Methods: Correlation and Regression
Although not methods of causal analysis, association-based methods such as correlation and regression analysis are widely used in air pollution health effects research (Dominici et al., 2014). We also applied them here, to check whether they yield results in this data set similar to those of past studies. Intuitively, the absence of any association might be interpreted to suggest that causation is unlikely (Hill, 1965; Weed, 2009). We used Pearson product-moment linear correlation coefficients and linear regression coefficients as measures of linear association, since past research suggests an approximately linear association of PM2.5 and O3 with mortality (e.g., Lepeule et al., 2012).
Conditional Independence Tests
If a statistically significant association between exposure and response variables is found, e.g., based on linear correlation and regression tests, then an important screening test for potential causation is the conditional independence test: does a significant association remain even after conditioning on potential confounders, such as age or year? For example, if a significant association between PM2.5 and CVD mortality were hypothesized to be due to confounding by year (because PM2.5 and CVD mortality rates both declined with time, even if one did not cause the other), then one could condition on year (i.e., holding it fixed at a given value, such as 2010), and test whether the conditional association vanishes within the subset of records with that value (e.g., with Year = 2010).
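The idea of conditioning on year can be sketched with simulated data. In the hypothetical example below, PM2.5 and CVD mortality both trend downward over time but are otherwise unrelated; all coefficients and sample sizes are assumptions chosen for illustration, not estimates from the chapter's data.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2000, 2011)
n_counties = 500  # hypothetical number of counties per year

# Simulated data: both variables decline with year, but neither
# depends on the other once year is known.
year_col = np.repeat(years, n_counties)
t = year_col - 2000
pm25 = 14.0 - 0.4 * t + rng.normal(0.0, 1.0, size=t.size)
cvd_rate = 5000.0 - 100.0 * t + rng.normal(0.0, 100.0, size=t.size)

# Pooled (unconditional) correlation across all years.
pooled_r = np.corrcoef(pm25, cvd_rate)[0, 1]

# Conditional correlations: hold year fixed and correlate within each year.
within_year_r = {
    int(y): np.corrcoef(pm25[year_col == y], cvd_rate[year_col == y])[0, 1]
    for y in years
}
max_abs_within = max(abs(r) for r in within_year_r.values())

print(f"pooled correlation: {pooled_r:.2f}")
print(f"largest |correlation| within a single year: {max_abs_within:.2f}")
```

In this simulation the pooled correlation is strongly positive while the within-year correlations are close to zero, which is the pattern that would indicate confounding by year rather than a direct dependence.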
To avoid biasing results by manual selection of variables to condition on, we relied on automated backward stepwise variable selection in our multiple regression models. This is a standard – but deservedly controversial – technique. We do not advocate it for general use, as it over-fits models to data, producing excess false positives in simple settings. We therefore have used it only as a readily available automated approach that may be more familiar and easily available than alternatives such as Bayesian Model Averaging; but we have also verified the main conclusions using multiple disjoint random samples of the data (20% cross-validation), to guard against the defects of backward stepwise selection. The backward stepwise selection procedure uses successive F tests to determine whether dropping individual variables (e.g., O3 concentration) from the set of potential explanatory variables significantly decreases the ability of the model to predict values of the dependent variable (e.g., CVD mortality risk). If not, i.e., if the F test indicates that the dependent variable is conditionally independent of a potential explanatory variable (such as O3), given the values of other variables in the model, then that variable is automatically dropped from the final set of explanatory variables. Despite its flaws, use of this technique reduces subjectivity in choosing explanatory variables. We used the default settings in Statistica (e.g., p values of 0.05 to define significant associations).
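A generic version of backward stepwise elimination with partial F tests can be sketched as follows. This is an assumption-laden illustration of the general technique, not the Statistica implementation; the function names, the simulated predictors, and the rule of dropping the least significant variable at each step are all choices made for this example.

```python
import numpy as np
from scipy import stats


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def backward_stepwise(X, y, names, alpha=0.05):
    """Repeatedly drop the predictor with the largest partial-F p-value above alpha."""
    n = len(y)
    kept = list(range(X.shape[1]))
    while kept:
        full = np.column_stack([np.ones(n), X[:, kept]])
        rss_full = rss(full, y)
        df_resid = n - full.shape[1]
        pvals = []
        for j in kept:
            others = [k for k in kept if k != j]
            reduced = np.column_stack([np.ones(n), X[:, others]])
            # Partial F statistic for removing one predictor from the model.
            f_stat = (rss(reduced, y) - rss_full) / (rss_full / df_resid)
            pvals.append(stats.f.sf(f_stat, 1, df_resid))
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break  # every remaining predictor is significant
        kept.pop(worst)
    return [names[k] for k in kept]


# Simulated example: the response depends on "year" only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
names = ["year", "pm25", "o3"]  # hypothetical standardized predictors
y = 3.0 * X[:, 0] + rng.normal(size=300)

selected = backward_stepwise(X, y, names)
print("retained predictors:", selected)
```

Because each retained variable has passed a test at the 0.05 level, irrelevant predictors can still survive by chance, which is one reason the text treats the procedure with caution and checks its conclusions on disjoint samples.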
Correlations among Changes over Time
Perhaps the most important screening test we use for potential causality is examining whether changes in an exposure help predict and explain changes in a response. A frequent confusion in epidemiology is to interpret the slope of a concentration-response relation as indicating the future change in response (e.g., mortality rates) that would be caused by a unit change in future exposure concentration. This is incorrect, since many concentration-response associations are not entirely causal (e.g., due to confounders or modeling biases). Rather than using slopes of cross-sectional regression lines as proxies for causal impacts, we directly tested whether there were significant positive correlations and regression coefficients between longitudinal changes in county-specific PM2.5 and O3 levels from 2000 to 2010, and corresponding longitudinal changes in county-specific and age-specific mortality rates; and whether counties with more rapid declines in PM2.5 and O3 had more rapid declines in mortality than those with slower declines, or where concentrations increased.
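The distinction between a cross-sectional slope and an association among changes can be illustrated with a simulated panel. In this hypothetical sketch, baseline mortality is correlated with baseline PM2.5 across counties, but each county's decline in mortality is generated independently of its decline in PM2.5; every number is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_counties = 483

# Hypothetical county values in 2000: mortality is cross-sectionally
# associated with PM2.5 (for example, through a shared confounder).
pm25_2000 = rng.uniform(8.0, 18.0, size=n_counties)
cvd_2000 = 3000.0 + 200.0 * pm25_2000 + rng.normal(0.0, 300.0, size=n_counties)

# Changes from 2000 to 2010 are generated independently of each other.
delta_pm25 = -rng.uniform(1.0, 6.0, size=n_counties)
delta_cvd = -rng.uniform(500.0, 1500.0, size=n_counties)
pm25_2010 = pm25_2000 + delta_pm25
cvd_2010 = cvd_2000 + delta_cvd

# Cross-sectional association of levels in a single year.
r_levels_2000 = np.corrcoef(pm25_2000, cvd_2000)[0, 1]

# Longitudinal association: each county serves as its own control.
change_pm25 = pm25_2010 - pm25_2000
change_cvd = cvd_2010 - cvd_2000
r_changes = np.corrcoef(change_pm25, change_cvd)[0, 1]
slope_changes = np.polyfit(change_pm25, change_cvd, 1)[0]

print(f"correlation of levels in 2000: {r_levels_2000:.2f}")
print(f"correlation of 2000-2010 changes: {r_changes:.2f}")
print(f"regression slope of changes: {slope_changes:.1f}")
```

Here the cross-sectional correlation is large even though, by construction, reducing PM2.5 has no effect on mortality; only the correlation among changes reflects that.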
Granger Causality and Negative and Positive Controls
A more general approach than studying associations between changes in exposure concentrations and changes in mortality rates over a single time interval is to use time series analysis to test whether past values of exposure help to predict present and future mortality rates more accurately than they can be predicted from past mortality rates alone. This is the basic idea of the Granger causality test (Eichler and Didelez, 2010). If the future of a mortality rate time series is conditionally independent of the past and present exposure time series, given the past and present mortality rate series, so that knowing exposure does not improve ability to predict future mortality rates, then exposure is not a Granger-cause of mortality. The Granger causality test produces a p-value for the null hypothesis that one time series does not improve prediction of another compared to using lagged values of the dependent variable itself.
We performed the Granger tests, using the grangercausalitytests function in the Python statsmodels module, for each county and age category combination described above, with the restriction that the combination must have at least 10 consecutive annual values available for analysis. We tested lags of 1-3 years, as many previous studies suggest that reductions in PM2.5 and other pollutants lead to almost immediate reductions in mortality rates, e.g., within as little as a few days, and certainly well within a year or two (e.g., Friedman et al., 2001; Clancy et al., 2002; Lepeule et al., 2012; Yang et al., 2013). The Python Granger function grangercausalitytests provides p-values for each of four separate test statistics (two based on the F distribution and two on the chi-square distribution), all of which yield closely similar results. We evaluated the proportion of counties for which these tests produced a p-value of 0.05 or less; random variation alone could explain this occurring in about 5% of counties. Significantly higher levels would be suggestive of a Granger causality effect.
In addition to formal test statistics, we also compared the statistical association between changes in exposures and changes in disease-related mortality rates, on the one hand, to the association between changes in exposures and changes in non-disease-related (external-cause) mortality rates, on the other. The external-cause mortality rates include deaths due to accidents and assaults, changes in which are presumably not caused by changes in pollution levels. Such negative controls test whether hypothesized causal associations are stronger than those presumed to be non-causal (Lipsitch et al., 2010). As discussed further later (cf. discussion of Table 7), we also simulated the effects of a positive causal relation between changes in pollution levels and changes in mortality rates. This simulation-based analysis served as a type of positive control to test whether sample sizes are large enough and whether the statistical methods we applied are powerful enough to detect such genuine causal effects if they are present. Finally, we briefly examined the geographic pattern of results to determine whether findings appeared to hold consistently in different parts of the United States.
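The logic of a simulated positive control can be sketched as a small power calculation. The example below is not the simulation reported later in the chapter; the effect size, noise levels, and number of replications are hypothetical values chosen only to show how detection rates with and without an injected causal effect can be compared.

```python
import numpy as np
from scipy import stats


def detection_rate(beta, n_counties=483, n_sims=200, alpha=0.05, seed=0):
    """Fraction of simulated data sets in which a positive slope of mortality
    change on PM2.5 change is significant at level alpha."""
    rng = np.random.default_rng(seed)
    detections = 0
    for _ in range(n_sims):
        # Hypothetical county-level changes in PM2.5 (ug/m3) over a decade.
        delta_pm25 = rng.normal(-3.0, 2.0, size=n_counties)
        # Mortality change = injected causal effect + unrelated variation.
        delta_mort = beta * delta_pm25 + rng.normal(-500.0, 100.0, size=n_counties)
        fit = stats.linregress(delta_pm25, delta_mort)
        if fit.pvalue <= alpha and fit.slope > 0:
            detections += 1
    return detections / n_sims


# Positive control: a known effect (hypothetical size) is built into the data.
power_with_effect = detection_rate(beta=15.0)
# Null case: no effect, so detections are false positives.
false_positive_rate = detection_rate(beta=0.0)

print(f"detection rate with injected effect: {power_with_effect:.2f}")
print(f"detection rate with no effect: {false_positive_rate:.2f}")
```

If the detection rate with an injected effect were low, a null finding in the real data would say little; a high rate supports reading the absence of a detected association as informative.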
Descriptive Analytics
Figure 10.1 shows trends in average pollution levels, population, and mortality rates for all counties from 2000-2010. For each time series, values are normalized by dividing by the value in 2000, so that all time series values in 2000 are defined as 1. PM2.5 and CVD mortality rates declined most steeply over this interval (two lowest curves), while population levels and external-cause mortality rates (e.g., from accidents) increased, perhaps reflecting a longer-lived, aging population.
Figure 10.1. Trends in relative values of pollutants, mortality rates, and population, 2000-2010
Figure 10.2 shows how the age-specific mortality curve, plotting annual deaths per capita vs. age, has shifted downward over time. (The horizontal positions for the rates have been spread out to allow easy visualization of trends. Vertical bars indicate 95% confidence intervals for the mean mortality rates but are very narrow due to the large sample sizes.) Clearly, age-specific mortality rates have declined for all age groups, but most for the older age groups.
Figure 10.2. Declines of age-specific cardiovascular disease (CVD) mortality rates over time (top curve is for year 2000, bottom curve is for year 2010)
Figure 10.3 shows analogous curves for age groups 55-64, 65-74, 75-84, and 85 or older, abbreviated 55, 65, 75, and 85, respectively, for different average PM2.5 levels in 2000 (left) and 2010 (right). At all PM2.5 levels, age-specific mortality rates declined conspicuously from 2000 to 2010. In both years, mortality rates in the oldest age categories were higher at PM2.5 levels of 12 µg/m3 than at 3 µg/m3, suggesting a possible persistent positive association between PM2.5 concentrations and elderly mortality rates.
Figure 10.3. Decline of older age-specific mortality rates over time (left panel is for year 2000, right panel is for year 2010) for counties with different average PM2.5 levels
There was substantial geographic heterogeneity in both PM2.5 values and CVD mortality rates among the counties in this study, allowing the relation between them to be studied with considerable statistical power despite the smoothing effects of using county-level data (Savitz, 2012). PM2.5 average levels ranged from below 2 to above 20 micrograms per cubic meter, and cardiovascular deaths per 100,000 people per year ranged from close to zero (for younger age groups) to over 10,000 deaths per 100,000 person-years (for the oldest age group in early years). Even for a single age group (e.g., 75-84 year-olds) and a single year (2010), there is a greater than 5-fold variation in CVD mortality rates and a more than 8-fold variation in average PM2.5 levels among counties, as shown in Figure 10.4.
Figure 10.4. There is substantial geographic heterogeneity in PM2.5 levels and CVD mortality rates even within a single age group and year (here, 75-84 year olds in 2010)
Results on Statistical Associations between Pollutant Levels and Mortality Rates
Table 10.3 shows the Pearson correlation coefficients between PM2.5 and O3 levels, county population sizes, and all-cause, cardiovascular, and external-cause (non-disease) mortality rates, holding year and age fixed at 2010 and 75-84 years, respectively. Similar correlations hold for other years. All off-diagonal correlation coefficients in Table 10.3 are statistically significantly different from zero (p < 0.05) except for the -0.09 correlation between PM2.5 levels and non-disease mortality rates (ExtRatePer100k). Specifically, Table 10.3 shows the following significant associations:
PM2.5 and O3 concentrations are positively associated with each other (correlation r = 0.28).
Both PM2.5 and O3 concentrations are positively associated with both all-cause and cardiovascular mortality rates.
O3 is also positively associated with non-disease mortality rates but PM2.5 is not. (All positive correlations in Table 10.3 are significant, but the -0.09 correlation is not.)
Population size of a county is positively associated with PM2.5 and is negatively associated with O3 and with all mortality rates.
All mortality rates (disease-related and non-disease-related) are positively associated with each other, but negatively associated with population size.
Table 10.3. Pearson correlations between pairs of exposure and response variables for elderly (75-84 year-old) people in 2010
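As a rough sketch of how a pairwise correlation and its significance might be checked, the following Python computes a Pearson coefficient and an approximate two-sided p-value. The Fisher z-transform used for the p-value is an assumption made here for illustration; the chapter does not state which test was applied. All data values and variable names are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: correlation = 0 (Fisher z-transform)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical county-level values, for illustration only.
pm25 = [float(i) for i in range(20)]
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(20)]
cvd_rate = [2.0 * p + e for p, e in zip(pm25, noise)]

r_strong = pearson_r(pm25, cvd_rate)
p_strong = fisher_z_p_value(r_strong, len(pm25))
r_weak = pearson_r(pm25, noise)
p_weak = fisher_z_p_value(r_weak, len(pm25))
print(r_strong, p_strong, r_weak, p_weak)
```

In this toy data the first pair is strongly and significantly correlated, while the second has a small negative coefficient that is not distinguishable from zero at the 0.05 level.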
The associations in Table 10.3 may or may not be causal, but they are not explained by coincident historical trends (since the year is held fixed at 2010) nor by confounding by age category, since the age category is also held fixed at 75-84. Whether confounding by education, income, temperature, or other variables might account for some of these associations – for example, if mortality rates and PM2.5 are both elevated on cold days or in colder regions; or if lower-income families tend to live in more polluted areas and also to have higher age-specific mortality rates irrespective of location – cannot be determined from the exposure and mortality rate data alone.
In multiple linear regression modeling of the association between explanatory variables and elderly (75-84 year-old) CVD mortality rate using automated backward stepwise variable selection via F tests, the regression coefficient for PM2.5, but not the one for O3, remains significant. Thus, there is a positive association between PM2.5 levels and CVD mortality rates among the elderly that is not explained by coincident historical trends, nor by confounding by age or population or O3; but the correlations between O3 and CVD mortality rates, and between O3 and all-disease mortality rate, vanish after conditioning (via multiple linear regression) on PM2.5 and population size for all disease-related mortalities. In short, PM2.5, but not O3, passes this conditional independence test for being a potential causal driver of elderly mortality rates. Similarly, for all age categories and years, PM2.5 average levels but not O3 levels help to predict CVD mortality rates.
Table 10.4. County-specific average PM2.5 concentration is significantly positively associated with county-specific CVD mortality rates across all age categories and years
Table 10.4 shows the results of a multiple linear regression with backward stepwise variable selection; results were also confirmed in multiple disjoint random samples (20% cross-validation samples). The b* column contains standardized regression coefficients (scaling each variable in terms of standard deviations) and the b column contains the unstandardized regression coefficients. As expected, Year is negatively associated with CVD mortality risk, and Age is positively associated with CVD mortality risk. Age is quantitatively by far the most important predictor of risk. PM2.5 average concentration makes the smallest, but still highly statistically significant (p < 0.000001), contribution to predicting CVD mortality rates. Population (specific to each county and age group) is also a significant predictor of CVD risk. Results for all-disease-related mortality (AC) risks are similar, with the standardized regression coefficient for PM2.5 increasing to 0.06, with the exception that both ozone (O3) and population size are significantly negatively associated with AC mortality rates (standardized regression coefficients of -0.12 for Population and -0.02 for O3). Interpretively, the coefficient for PM2.5 in Table 10.4 (b = 33.6) indicates that CVD mortality risk increases by 33.6 deaths per 100,000 person-years for each microgram per cubic meter increase of PM2.5 in air, assuming other variables are held constant. The mean CVD mortality rate averaged over all age categories and years is 1931.6 deaths per 100,000 person-years, so a change in PM2.5 of 10 µg/m3 corresponds to a change in CVD mortality rate of approximately (10 µg/m3)*(33.6 deaths per 100,000 person-years per µg/m3) / (1931.6 deaths per 100,000 person-years) = 336/1931.6 = 17.4%. This slope factor could be described as a 17.4% increase in mortality per 10 µg/m3 increase in PM2.5 concentration.
Results on Correlations between Changes in Variables over Time
Tables 10.5 and 10.6 show correlations between changes in all-cause mortality, CVD mortality, and non-disease mortality (the columns) and different possible predictors (the rows), for all counties included in the study. Table 10.5 presents results for the 75-84 year-old group, and Table 10.6 repeats the analysis for all age groups.
Table 10.5. Pearson correlations between changes in variables from 2000 to 2010 for elderly (75-84 year-old) people
Table 10.6. Pearson correlations between changes in variables from 2000 to 2010 for all age groups
For the 75-84 year-old age category, changes in all-cause (AC) and CVD mortality rates are significantly positively correlated with each other, as expected, and with changes in external-cause mortality rates. They are significantly negatively correlated with increases in population. Neither is significantly correlated with changes in PM2.5 or changes in O3. For all age groups, changes in PM2.5 are significantly but weakly positively correlated with changes in external-cause mortality rates. Changes in O3 are significantly positively correlated both with changes in AC mortality rates and with changes in CVD mortality rates. Increases in population are significantly correlated with reductions in all mortality rates.
In multivariate analysis using multiple linear regression, changes in both AC (all-cause) and CVD mortality rates are conditionally independent of changes in both PM2.5 and O3, given changes in population size, changes in external-cause mortality rates, and age in 2010. These three explanatory variables are automatically selected by backward stepwise variable selection, while changes in PM2.5 and O3 are dropped, as they provide no additional information useful for predicting the AC or CVD mortality rates. Thus, by this criterion, changes in PM2.5 and O3 levels do not help to predict or explain changes in CVD or AC mortality rates, undermining a causal interpretation of the positive associations between them in the cross-sectional analysis in Table 10.3.
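The idea of conditional independence used above can be illustrated with a first-order partial correlation. This is a simpler technique swapped in for illustration; it is not the backward stepwise regression applied in the chapter. The variables `d_ext`, `d_pm25`, and `d_cvd` are hypothetical changes constructed so that two quantities are correlated only through a shared third one.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical 2000-to-2010 changes for eight counties, for illustration only.
d_ext = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]      # shared driver
e1 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]     # variation unique to d_pm25
e2 = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0]     # variation unique to d_cvd
d_pm25 = [s + a for s, a in zip(d_ext, e1)]
d_cvd = [s + b for s, b in zip(d_ext, e2)]

marginal = pearson_r(d_pm25, d_cvd)
conditional = partial_r(d_pm25, d_cvd, d_ext)
print(marginal, conditional)
```

In this constructed example the marginal correlation is 0.84, yet the partial correlation given the shared driver is essentially zero. This is the pattern of a variable that adds no predictive information once the conditioning variable is known.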
Other, perhaps unexpected, correlations between changes in variables in Table 10.6 include a strong positive correlation (0.59) between changes in external-cause mortality rates and changes in CVD mortality rates; and positive correlations between baseline levels of mortality rates and changes in their levels. Thus, relatively high-risk areas in 2000 tended to become more risky by 2010. As expected, older age categories saw relatively large reductions in disease mortality rates (but increases in non-disease mortality rates).
Granger Causality Test and Control Results
Granger tests using standard time series regression models with maximum lags of 1, 2, or 3 years show that, for all age categories tested (65-74, 75-84, and 85 or older) and for all mortality outcomes considered (CVD, all-disease, and external-cause mortality rates), neither PM2.5 nor O3 histories are useful for predicting mortality rates in most (over 90%) of the counties. PM2.5 and O3 have predictive coefficients for CVD and all-disease mortality rates that are significantly different from zero in only a small minority of counties (7% for AC mortality, 6% for CVD mortality, and 7% for external-cause mortality, which was used as a negative control), roughly consistent with, though slightly higher than, the 5% false-positive error rate that might occur by chance due to the 5% significance level used in the tests. (For 483 counties and a true false-positive rate of 5%, there is about a 26% probability that the sample proportion of false positives would exceed 6% or be less than 4% by chance.) Perhaps more importantly, the negative control (external-cause mortalities) also shows that O3 and PM2.5 histories on time scales of several years are not Granger-causes of CVD or all disease-related deaths any more than they are of external-cause deaths. For example, the age group and lag with the highest fraction of Granger-positive associations between PM2.5 and CVD rate is the 85+ age group with a lag of one year: this fraction is 11%. But the corresponding fraction for Granger-positive associations between PM2.5 and external-cause mortalities is greater, at 14%. Thus, the Granger tests do not support a conclusion of a genuine causal effect, i.e., positive results clearly above what might occur by chance and what is found for the negative controls.
Table 10.7. Fractions of counties with positive Granger causality tests for PM2.5 and all-cause (AC), cardiovascular disease (CVD), and external-cause mortality rates, for different age groups and lags (1-3 years). Results averaged over all three lags are shown in bold.
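The logic of a lag-1 Granger test can be sketched as a comparison of two nested regressions: one predicting this year's mortality rate from last year's rate alone, and one that also uses last year's pollutant level. The code below is a minimal pure-Python sketch under that assumption, not the software used for Table 10.7; the helper names and the time series are hypothetical, and the critical value is the tabulated 5% point of the F(1, 7) distribution.

```python
def ols_rss(rows, y):
    """Least-squares fit of y on the columns of rows; returns residual sum of squares."""
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        c[col], c[pivot] = c[pivot], c[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for j in range(col, k):
                a[row][j] -= factor * a[col][j]
            c[row] -= factor * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    return sum((v - f) ** 2 for v, f in zip(y, fitted))


def granger_f_lag1(x, y):
    """F statistic for whether x[t-1] improves prediction of y[t] beyond y[t-1]."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)


def fraction_positive(f_stats, critical_value):
    """Share of counties whose F statistic exceeds the critical value."""
    return sum(f > critical_value for f in f_stats) / len(f_stats)


F_CRIT_1_7 = 5.59  # tabulated 5% critical value of F(1, 7)

# Hypothetical 11-year series in which last year's PM2.5 drives this year's rate.
pm25 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0]
rate = [10.0]
for t in range(1, len(pm25)):
    noise = 0.1 if t % 2 == 1 else -0.1
    rate.append(0.5 * rate[t - 1] + 2.0 * pm25[t - 1] + noise)

f_stat = granger_f_lag1(pm25, rate)
print(f_stat, f_stat > F_CRIT_1_7)
```

Running `granger_f_lag1` county by county and passing the resulting statistics to `fraction_positive` would give fractions of the kind reported in Table 10.7. With only 11 annual observations per county, each individual test has few degrees of freedom, which is one reason to look at the fraction across counties and at the negative control rather than at any single county.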
Given the well-known limitations of p-values and significance testing, it may also be useful to consider that, if pollutant levels were detectable causal drivers of increased mortality rates at recent historical levels, then this causal relation should have been visible in a large majority of counties. The fractions in Table 10.7 might all be expected to exceed 50% in the presence of clear Granger-causality, i.e., most counties should have shown evidence of a Granger-positive association between PM2.5 levels and subsequent mortality rates. Intuitively, as suggested by Figure 10.1, although pollutant levels declined substantially in most counties from 2000 to 2010, declines in CVD and AC mortality rates did not appear to proceed more quickly when PM2.5 declined quickly than when it did not, or than when it increased. The Granger test results confirm this suggestion at the level of individual counties and for time lags of 1-3 years.
Positive Controls: Does Absence of Evidence Constitute Evidence of Absence?
Might the absence of a significant association between county-specific changes in PM2.5 levels and changes in mortality rates between 2000 and 2010, shown in Tables 10.5 and 10.6 and in corresponding multiple linear regression models, be due to limited statistical power to detect changes in the presence of substantial heterogeneity and variability in the data? To check the statistical power of these methods, we modified the observed data by adding a known “signal”: a 2.6% decrease in CVD mortality rate per µg/m3 decrease in PM2.5 concentration, based on the slope estimate of Lepeule et al., 2012. We then tested whether this known signal is detectable through the noise in the data using the methods we have applied.
Table 10.8 shows the results of multiple linear regression applied to the artificial data set with a simulated known causal impact of exposure. The simulated effect of changes in PM2.5 on changes in CVD mortality rates (based on the 2.6% slope coefficient for change in mortality rate per µg/m3 change in PM2.5) was successfully detected. (All predictors remain significant using backward stepwise variable selection.) This suggests that an effect of this size would probably have been detected in the real data if it had been present. This type of positive control gives some reassurance that the substantial variability and heterogeneity in county-level time series data would not hide causal effects of the sizes that have sometimes been estimated from standard associational (regression-based) models by assuming that slope coefficients are causal, if such causal effects were actually present.
Table 10.8. Multiple linear regression detects PM2.5 effects on mortality rates of the sizes predicted from previously published regression slope coefficients (Lepeule et al., 2012)
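One way such a positive-control signal could be injected is sketched below. The multiplicative form is an assumption made for illustration, since the text does not specify exactly how the signal was added to the observed data; the function name `inject_signal` and the example values are hypothetical.

```python
def inject_signal(rate_2010, pm25_2000, pm25_2010, pct_per_ug=2.6):
    """Scale an observed 2010 mortality rate by a postulated PM2.5 effect.

    Assumes a proportional change of pct_per_ug percent in the rate for each
    1 µg/m3 change in PM2.5 between 2000 and 2010 (hypothetical form).
    """
    delta_pm25 = pm25_2010 - pm25_2000  # negative when PM2.5 fell
    return rate_2010 * (1.0 + (pct_per_ug / 100.0) * delta_pm25)


# Hypothetical county: PM2.5 fell from 12 to 8 µg/m3; observed 2010 rate is
# 2000 deaths per 100,000 person-years.
simulated_rate = inject_signal(2000.0, 12.0, 8.0)
print(simulated_rate)
```

Rerunning the change-on-change regression on rates modified in this way, and checking whether the PM2.5 term is retained, is the kind of power check the positive control provides.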
Finally, we briefly examined the geographic distribution of associations. Previous investigators have reported that chronic exposure to PM2.5 is associated with mortality in the eastern and central regions of the United States, but not in the western region (Zeger et al., 2008). In our data set, for the main elderly population (75-84 year-olds) in 2010, PM2.5 was statistically significantly positively correlated with CVD mortality in Florida and overall in pooled data from counties in all states. It was statistically significantly negatively correlated with all-disease (AC) mortality rate in Arizona and statistically significantly positively correlated with AC mortality rate in Florida and overall. Otherwise, state-specific correlations in 2010 were not individually statistically significant at the conventional 0.05 significance level, and were a mix of non-significant positive and negative correlations with no obvious geographic distribution.
Discussion and Conclusions: Caveats for Causal Interpretations of Regression Coefficients
The epidemiological and risk assessment literatures on human health effects of air pollution contain dozens of studies that attribute reductions in mortality risks to reductions in air pollution levels, and that estimate the slope of the concentration-response association between exposures to pollutants and corresponding mortality rates (e.g., Pope, 1989, Clancy et al., 2002, Lepeule et al., 2012, Cesaroni et al., 2013, Fann et al., 2013, Dai et al., 2014). The work reported here contributes a new data set to this literature. It supports previous findings of positive PM2.5-mortality associations, using PM2.5 (and O3) and age-specific mortality data at the county level from the 15 largest U.S. states over the years from 2000 to 2010. Confirming earlier studies such as Lepeule et al., 2012, we found a statistically significant positive association between PM2.5 (and also O3) concentrations and both all-disease related and CVD mortality rates, as well as a significant positive association between O3 and external-cause mortalities, which we used as a negative control (Tables 10.3 and 10.4).
However, such associations between historical levels of exposure and response variables do not necessarily describe predictive or manipulative causal relations. In our examination of historical changes in pollutant levels and mortality rates (Tables 10.5 and 10.6 and multiple regression models and Granger causality tests), actual changes in PM2.5 and O3 levels over time did not significantly help to predict or explain corresponding observed changes in all-disease or CVD mortality rates over time. This argues against facile causal interpretations of the significant statistical associations between pollution levels and mortality rates. Such causal interpretations of slope coefficients are commonly made in air pollution health effects (and other) epidemiology. For example, the study of Lepeule et al., updating the important Harvard Six Cities Study, offers the important causal interpretation that “These results [i.e., that each 10 µg/m3 increase in PM2.5 was associated with a 26% increase in cardiovascular mortality risk] suggest that further public policy efforts that reduce fine particulate matter air pollution are likely to have continuing public health benefits.” But, as emphasized in Chapter 2, such policy-relevant causal conclusions are unwarranted if the exposure-response association discussed is not a causal relation, and if the changes referred to are only the hypothetical ones implied by a slope coefficient, rather than actual changes in the levels of exposure and mortality time series.
Study Limitations
The study and conclusions in this chapter have several limitations. Although our analysis of county-level data does not provide evidence that the roughly 30% reduction in PM2.5 levels from 2000 to 2010 (Figure 10.1) caused any detectable effect on disease-related mortality rates, it remains possible that an effect was present but was too small to detect. For example, if each 10 µg/m3 change in PM2.5 concentration causes only a 1.03% change in CVD mortality rate, as estimated by Dai et al. (2014), then the power of our data set would not be great enough to distinguish this from zero. In addition, like many other studies, our analysis lacked individual-level exposure data. Our basic units of observation are death counts, by cause, within age categories, years, and counties; finer resolution would require a different data set. Age and death are available at the individual level, making this a semi-individual design (Künzli and Tager, 1997), rather than a purely ecological design; but other individual covariates are not available. On the other hand, the fact that we follow the same counties over multiple years contributes one of the strengths of a panel study design: the effects of fixed (or slowly changing) possible confounders or effect modifiers, such as differences in income or education or regional climate, cancel out when changes (deltas) in mortality rates are calculated for the same locations in successive years. In addition, our study substantially meets several criteria for useful ecological studies (Savitz, 2012): marked variation across geographic units (counties); unlikely confounding (due to the longitudinal panel design, in which counties serve as their own controls for purposes of subtracting out fixed effects of confounders when computing changes over time); opportunities to include negative controls (external-cause mortalities); and simulated positive controls (via simulation of postulated causal impacts).
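The cancellation of fixed confounders in a panel design can be made explicit with a simple illustrative model. The additive linear form below is an assumption introduced here for exposition, not a model stated in this chapter: y is the mortality rate and x the pollutant level in county c and year t, α_c collects all county-specific factors that do not change over time, and ε is residual variation.

```latex
\begin{align}
y_{c,t} &= \alpha_c + \beta\, x_{c,t} + \varepsilon_{c,t}, \\
\Delta y_c \;=\; y_{c,2010} - y_{c,2000}
  &= \beta\,\bigl(x_{c,2010} - x_{c,2000}\bigr)
   + \bigl(\varepsilon_{c,2010} - \varepsilon_{c,2000}\bigr).
\end{align}
```

The county term α_c appears in both years and therefore drops out of the difference, so fixed differences in income, education, or climate cannot by themselves produce an association between changes in exposure and changes in mortality. Confounders that vary over time remain in the differenced residual and are not removed.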
A remaining question is, if the significant associations between PM2.5 and O3 on the one hand and CVD and all-cause mortality on the other are not due to a causal relationship between pollutant exposure and disease, then what does explain them? Our analyses have ruled out coincident trends (since the associations hold even within single years) and chance (since the correlation and regression coefficients reported are statistically significant), as well as fixed confounders (due to the panel design) as plausible explanations. Possible confounders that might co-vary with exposure levels over time, and thus offer explanations, range from co-pollutants to lagged daily temperatures (e.g., if very hot or very cold areas have higher levels of PM2.5, perhaps due in part to coal-fired power plants that power air conditioning or heating, and independently have higher mortality rates). Attaching more variables to the county-specific mortality rate and pollution level data, such as daily temperature (high and low), could potentially help to answer this question. But at present, the answer is unknown.
Finally, by focusing on changes in annual average pollutant levels and mortality rates at the individual county level, we have forgone opportunities to model or “adjust” for effects of seasonality, more granular spatial variations, and measured or latent confounders. As Dominici et al. (2014) suggest, it is not uncommon for different regression models based on different modeling choices and assumptions to produce very different answers. For example, regression coefficients that are significantly positive in one model may be significantly negative in another, depending on which variables and interaction terms are included. By using several different approaches (conditional independence tests, Granger tests, positive and negative controls, automated variable selection) as well as relatively simple measures of association (correlations and linear regression coefficients, fractions of counties with Granger-positive associations), all computed using standard, widely available software, we have sought conclusions that are more robust and objective by minimizing opportunities for manual intervention to shape the results.
Comparisons to Conclusions from Other Studies
The coefficient for PM2.5 in Table 10.4 (b = 33.6), corresponding to a 17% increase in mortality per 10 µg/m3 increase in PM2.5 concentration, is well within the range of other recent association-based estimates based on regression relations. For example, Dai et al. (2014), in a study of 75 U.S. cities between 2000 and 2006, reported a 1.03% (95% CI: 0.65%, 1.41%) increase in CVD mortality with each 10 µg/m3 increase in PM2.5, averaged over a 2-day period. In their update of the Harvard Six Cities Study, Lepeule et al. (2012) estimated a 26% (95% CI: 14%, 40%) increase in CVD mortality for each 10 µg/m3 increase in PM2.5, averaged over the three prior years. Thus, our value of 17.4% falls between these two estimates, and is within the 95% CI of the Lepeule study. Some other recent studies have not detected clearly significant associations between PM2.5 levels and most CVD or all-cause mortality rates (Beelen et al., 2014), or found no association between local trends in mortality and local trends in yearly average PM2.5 after adjusting for national trends and local differences (Greven, Dominici, and Zeger, 2011). For the U.S. county data set we have analyzed, our main conclusions are that (a) there are statistically significant associations between PM2.5 and both all-disease and CVD mortality risks; but (b) there is no clear evidence of a causal relation between PM2.5 and O3 concentration levels and mortality rates. These results differ both from studies that do not find clear associations, and also from some authoritative opinions, including views in an Expert Elicitation Study for the U.S. EPA (2006), that statistically significant exposure-response associations between PM2.5 and CVD mortality are probably causal.
While our results do not support some previous expert judgment-based assessments of causality, this is consistent with studies showing that firmly expressed opinions of key experts about air pollution health effects associations being causal (e.g., Harvard School of Public Health, 2002) have later proved to be unwarranted (e.g., HEI, 2013). The practice of applying human judgment using weight-of-evidence considerations to measures of association (such as relative risks, odds ratios, population attributable fractions, burden-of-disease estimates, and regression coefficients) to determine whether an inference of causality is supported has been widespread in epidemiology, even though some methodologists have argued that logically valid causal inferences cannot be derived from such associations in purely observational studies without interventions (Ward, 2009). This makes natural experiments, where interventions such as pollution reductions occur differently for different subpopulations, potentially valuable aids to understanding causation.
The preceding calculations illustrate that a significant positive association between historical levels of PM2.5 and historical mortality rates does not necessarily provide a sound basis for inferring a positive association between changes in levels of PM2.5 and changes in mortality rates. This methodological point confirms the importance of using quasi-experiments or other appropriate formal methods of causal study design and analysis (Table 10.2) to draw causal conclusions. Free, publicly available data sets such as the EPA and CDC data sets used in this study, and free, publicly available software such as R and Python or the CAT software from Chapter 10.2, now make it relatively easy to test whether changes in PM2.5 and O3 help to predict changes in disease mortality rates, on time scales from days to over a decade. We hope that this will encourage others to investigate further the relation between longitudinal changes in pollutant levels and changes in mortality rates, and to clarify the crucial distinction between positive statistical associations and evidence of causality in air pollution health effects epidemiology.
REFERENCES
Angrist JD, Pischke J-S. Mostly Harmless Econometrics: An Empiricist’s Companion. 2009 Princeton University Press, Princeton, NJ.
Beelen R, Stafoggia M, Raaschou-Nielsen O, et al. Long-term exposure to air pollution and cardiovascular mortality: an analysis of 22 European cohorts. Epidemiology. 2014 May; 25(3):368-78.
Campbell DT, Stanley JC. Experimental and Quasi-experimental Designs for Research. Chicago: Rand McNally, 1966.
Centers for Disease Control and Prevention (CDC), 2014. Wonder “Compressed Mortality, 1999-2010” database. http://wonder.cdc.gov/cmf-icd10.html.
Cesaroni G, Badaloni C, Gariazzo C, Stafoggia M, Sozzi R, Davoli M, Forastiere F. (2013) Long-term exposure to urban air pollution and mortality in a cohort of more than a million adults in Rome. Environ Health Perspect. Mar;121(3):324-31.
Clancy L, Goodman P, Sinclair H, Dockery DW. 2002. Effect of air-pollution control on death rates in Dublin, Ireland: An intervention study. Lancet. Oct 19;360(9341):1210-4.
Cox LA Jr. Improving causal inference in risk analysis. Risk Analysis. 2013 Oct;33(10):1762-71.
Cox LA Jr, Popken DA, Berman DW. Causal versus spurious spatial exposure-response associations in health risk analysis. Crit Rev Toxicol. 2013;43 Suppl 1:26-38.
Cox LA Jr, Popken DA, Ricci PF. Warmer is healthier: Effects on mortality rates of changes in average fine particulate matter (PM2.5) concentrations and temperatures in 100 U.S. cities. Regulatory Toxicology and Pharmacology. 2013 Aug;66(3):336-46.
Cox T, Popken D, Ricci PF. Temperature, not fine particulate matter (PM2.5), is causally associated with short-term acute daily mortality rates: Results from one hundred United States cities. Dose-Response. 2012 Dec; 11(3): 319-43.
Dai L, Zanobetti A, Koutrakis P, Schwartz JD. Associations of Fine Particulate Matter Species with Mortality in the United States: A Multicity Time-Series Analysis. Environ Health Perspect. 2014 May 6. [Epub ahead of print]
Dominici F., Greenstone M., Sunstein CR. Particulate matter matters. Science 344: 18 April, 2014: 257-8
The Economist (2013). Trouble at the Lab: Scientists like to think of science as self-correcting. To an alarming degree, it is not. October 19, 2013. www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble
U.S. Environmental Protection Agency (EPA), 2014. www.epa.gov/airdata/ad_data_daily.html
Eichler M, Didelez V. On Granger causality and the effect of interventions in time series. Lifetime Data Anal. 2010 Jan;16(1):3-32.
EPA (U.S. Environmental Protection Agency). 2011. The Benefits and Costs of the Clean Air Act from 1990 to 2020. Final Report – Rev. A. Office of Air and Radiation, Washington D.C.
EPA (2006). Expanded expert judgment assessment of the concentration-response relationship between PM2.5 exposure and mortality. www.epa.gov/ttn/ecas/regdata/Uncertainty/pm_ee_report.pdf
Fann N, Lamson AD, Anenberg SC, Wesson K, Risley D, Hubbell. Estimating the National Public Health Burden Associated with Exposure to Ambient PM2.5 and Ozone. Risk Analysis Jan. 2012 32(1):81-95
Freedman DA. Graphical models for causation, and the identification problem. Eval Rev. 2004 Aug;28(4):267-93.
Friede T, Henderson R, Kao CF. A note on testing for intervention effects on binary responses. Methods Inf Med. 2006;45(4):435-40.
Friedman MS, Powell KE, Hutwagner L, Graham LM, Teague WG. Impact of changes in transportation and commuting behaviors during the 1996 Summer Olympic Games in Atlanta on air quality and childhood asthma. JAMA. 2001 Feb 21;285(7):897-905.
Friedman, N., & Goldszmidt, M. (1998) Learning Bayesian networks with local structure. In M.I. Jordan (Ed.), Learning in Graphical Models (pp 421-459). MIT Press. Cambridge, MA.
Gilmour S, Degenhardt L, Hall W, Day C. Using intervention time series analyses to assess the effects of imperfectly identifiable natural events: a general method and example. BMC Med Res Methodol. 2006 Apr 3;6:16.
Greenland S, Brumback B. An overview of relations among causal modelling methods. Int J Epidemiol. 2002 Oct;31(5):1030-7.
Greven S, Dominici F, Zeger S. An approach to the estimation of chronic air pollution health effects using spatio-temporal information. J. American Statistical Association. June 2011, Vol. 106, No. 494: 396-406.
Hack CE, Haber LT, Maier A, Shulte P, Fowler B, Lotz WG, Savage RE Jr. A Bayesian network model for biomarker-based dose response. Risk Anal. 2010 Jul;30(7):1037-51.
Harris AD, McGregor JC, Perencevich EN, Furuno JP, Zhu J, Peterson DE, Finkelstein J. The use and interpretation of quasi-experimental studies in medical informatics. J Am Med Inform Assoc. 2006 Jan-Feb;13(1):16-23.
Harvard Law Today, April 21, 2014, http://today.law.harvard.edu/improving-the-pollution-mortality-link/
Harvard School of Public Health, 2002. Press Release: “Ban on Coal Burning in Dublin Cleans the Air and Reduces Death Rates” www.hsph.harvard.edu/news/press-releases/archives/2002-releases/press10172002.html
Health Effects Institute (HEI). 2013. Did the Irish Coal Bans Improve Air Quality and Health? HEI Update, Summer, 2013. http://pubs.healtheffects.org/getfile.php?u=929. Last Retrieved 1 February 2014.