Fig. 3.12. Partial dependence plots (concentration-response curves) for the fractions of respondents reporting asthma (top panel), heart attack (middle panel), and stroke (bottom panel) at different levels of PM2.5, after conditioning on observed levels of other variables.
Discussion
Many of the findings from the data set examined here are unsurprising in light of recent literature, but there are some exceptions. Low socioeconomic status has previously been recognized as a risk factor both for heart disease (Carlsson et al., 2016) and for asthma attributed to air pollution. In the latter case, the need for a larger data set to examine interactions, as in Figures 3.1-3.9, has also been recognized (Burte et al., 2016). Divorce has been identified as a risk factor for heart attacks (Dupre et al., 2015), but the link between divorce (or separation) and asthma risk (Figure 3.9) has been less well recognized. The very strong association between higher income and reduced asthma risk in the BRFSS data set, and the link between younger age and higher asthma risk, have been commented on previously (Zahran and Bailey, 2013).
Other findings are more surprising. Coogan et al. (2016) report a negative relation between education and adult-onset asthma in African American women. This contrasts with our finding of a positive association between education and asthma risk in the general BRFSS population (Figure 3.4). PM2.5 is usually significantly positively associated with heart attack risk (e.g., To et al., 2015), as might be expected based on the left side of Figure 3.7. It is sometimes associated with asthma risk as well (www.epa.gov/region4/sesd/pm25/p2.html). Indeed, consider a logistic regression model with heart attack risk as the dependent variable and only sex, age, smoking status, and PM2.5 as independent variables. It does show a significant positive association between heart attack risk and PM2.5 (odds ratio of 1.09 per 10 µg/m³ of PM2.5, p = 0.026). However, adding lowIncome as an independent variable makes this significant association between PM2.5 and heart attack risk disappear (OR = 1.00, [0.996, 1.01], p = 0.28). Stroke shows a similar pattern. PM2.5 is positively associated with stroke risk (p = 0.059) when only age, sex, and smoking are included as covariates, but not when income is also included (p = 0.56).

PM2.5 is thus significantly positively associated with heart attack risk when income is not included as a predictor, but not when it is. This suggests the importance of controlling for income, as well as other potential socioeconomic confounders, in assessing and interpreting exposure-response relations. As shown in Figure 3.8, both lower PM2.5 and lower prevalence of adverse health effects occur at higher income levels. Analyses of the association between PM2.5 and health effects must therefore control for income as a confounder. This has not been done in some previous studies that reported significant positive associations between PM2.5 and health risks such as heart attack or stroke hazard rates (e.g., To et al., 2015).
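A small calculation shows how a confounder such as income can create an exposure-outcome association that disappears after adjustment. The numbers below are invented for illustration and are not BRFSS estimates. Income is represented by a hypothetical binary indicator, analogous to lowIncome. Low-income respondents are assumed to live more often in high-PM2.5 areas and to have higher risk, but within each income stratum PM2.5 has, by construction, no effect on risk. Stratification and the Mantel-Haenszel summary odds ratio stand in here for the covariate adjustment that the logistic regression performs.

```python
"""Hypothetical illustration of confounding by income (invented numbers, not BRFSS data)."""

# Per stratum: population share, P(high PM2.5 | stratum), and outcome risk.
# By construction, risk depends on income only, not on PM2.5.
STRATA = {
    "low_income": {"share": 0.5, "p_high_pm": 0.7, "risk": 0.10},
    "high_income": {"share": 0.5, "p_high_pm": 0.3, "risk": 0.05},
}
N = 100_000  # hypothetical population size


def two_by_two(stratum):
    """Expected counts (a, b, c, d): exposed cases, exposed non-cases,
    unexposed cases, unexposed non-cases, where 'exposed' = high PM2.5."""
    n = N * stratum["share"]
    exposed = n * stratum["p_high_pm"]
    unexposed = n - exposed
    r = stratum["risk"]
    return exposed * r, exposed * (1 - r), unexposed * r, unexposed * (1 - r)


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


tables = [two_by_two(s) for s in STRATA.values()]
crude = tuple(sum(t[i] for t in tables) for i in range(4))  # ignore income

crude_or = odds_ratio(*crude)
stratum_ors = [odds_ratio(*t) for t in tables]
mh_or = mantel_haenszel_or(tables)

print(f"crude OR            = {crude_or:.3f}")
print("stratum-specific ORs =", [round(x, 3) for x in stratum_ors])
print(f"Mantel-Haenszel OR  = {mh_or:.3f}")
```

With these invented inputs the crude odds ratio is about 1.34. Each stratum-specific odds ratio, and the Mantel-Haenszel odds ratio, equals 1. This is the same qualitative pattern as the heart attack result above: a positive association that vanishes once income is accounted for.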
The finding of a significant negative association between PM2.5 and asthma risk also appears to be mostly new (but see Krstić, 2011) and, a priori, not very plausible. In a recent literature review, Heinzerling et al. (2016) reported positive associations between ultrafine particulate matter and asthma in children, but no significant effects in multivariate models that account for potential confounding by co-pollutants. Strickland et al. (2015) found a positive association between daily county-level pediatric emergency department visits for asthma or wheeze and same-day PM2.5 concentrations in Georgia, as estimated from satellite data, but without adjustment for temperature or other possible confounders. Chen et al. (2016) identify PM2.5 as one of several risk factors for both childhood and adult asthma in China. They note that China, despite relatively high levels of particulate air pollution, has had lower asthma prevalence than developed countries, although prevalence has been increasing in recent decades. To our knowledge, a significant negative association between PM2.5 and asthma risk in regression models (and in the partial correlations in Table 3.8) after adjusting for income and other covariates is a new and unexpected finding that deserves further scrutiny in this and other large data sets.
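For readers unfamiliar with partial correlations, the first-order version can be computed directly from pairwise correlations. It is the correlation between x and y after controlling for a single variable z. The sketch below uses arbitrary placeholder inputs, not entries from Table 3.8. Adjusting for several covariates at once, as in the table, requires the higher-order generalization, which can be obtained, for example, from the inverse of the correlation matrix. The helper name `partial_corr` is hypothetical.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Arbitrary illustrative inputs: a positive marginal correlation between x and y
# can shrink, vanish, or change sign once a common correlate z is controlled for.
print(partial_corr(0.5, 0.5, 0.5))  # 0.333...
print(partial_corr(0.2, 0.5, 0.4))  # 0.0
print(partial_corr(0.1, 0.5, 0.6))  # about -0.289
```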
Figure 3.10 shows that there are multiple paths of statistical dependencies leading from most other variables to asthma risk (at the bottom of the upper left panel of Figure 3.10). For example, being female is a risk factor for asthma not only via a direct dependency link between them, but also via paths involving age, education, and income. Being female is associated with lower income, especially for divorcees, and lower income, in turn, is a risk factor for asthma. These multiple paths serve as a reminder that any single regression coefficient or odds ratio linking a predictor to a health risk variable is likely to conflate direct and indirect effects. Such coefficients do not necessarily show how changing one variable would change another that depends on it. However, asthma risk is not conditionally independent of any of the variables that point into it in Figure 3.10, given the values of the other variables. This leaves open the possibility that some of these arrows could represent causal relations: changing the values of one or more of the variables pointing into Asthma would then cause a change in asthma prevalence. To investigate this possibility further, it would be useful to examine how asthma prevalence changes when one or more of the variables pointing into it in Figure 3.10 is changed, perhaps using longitudinal data from natural experiments. This is, however, beyond the intended scope of the present investigation.
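A hypothetical linear structural model makes concrete how one coefficient can mix direct and indirect effects. The model is an assumption chosen for simplicity: the Bayesian network in Figure 3.10 is nonparametric, and these continuous variables are only stand-ins. X (think "female") affects Y (think "asthma") directly with coefficient C. It also affects Y indirectly through a mediator M (think "income"), via X → M with coefficient A and M → Y with coefficient B. All coefficient values are invented. Regressing Y on X alone estimates the total effect C + A·B. Regressing Y on both X and M estimates only the direct effect C.

```python
import random

random.seed(1)

# Hypothetical path coefficients (invented, not estimated from BRFSS):
A = -0.6  # X -> M  (e.g., female -> lower income)
B = -0.5  # M -> Y  (e.g., higher income -> lower asthma risk)
C = 0.3   # X -> Y  direct effect
N = 100_000

xs, ms, ys = [], [], []
for _ in range(N):
    x = random.gauss(0, 1)
    m = A * x + random.gauss(0, 1)
    y = C * x + B * m + random.gauss(0, 1)
    xs.append(x)
    ms.append(m)
    ys.append(y)


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


# Simple regression of Y on X: the slope estimates the total effect C + A*B.
total = cov(xs, ys) / cov(xs, xs)

# Regression of Y on X and M (2x2 normal equations via Cramer's rule):
# the coefficient on X estimates only the direct effect C.
sxx, smm, sxm = cov(xs, xs), cov(ms, ms), cov(xs, ms)
sxy, smy = cov(xs, ys), cov(ms, ys)
direct = (smm * sxy - sxm * smy) / (sxx * smm - sxm**2)

print(f"Y on X only:  {total:.3f}  (total effect  C + A*B = {C + A * B:.3f})")
print(f"Y on X and M: {direct:.3f}  (direct effect C       = {C:.3f})")
```

With these assumed values, the indirect path doubles the apparent effect of X. Neither coefficient on its own tells how much of the association would remain if only one path were interrupted.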
Study Limitations and Uncertainties
A limitation of this study is that it considers exposures and responses over a five-year window, 2008-2012. This interval is too brief to permit study of the temporal relationship between changes in pollutant exposure levels and changes in health responses that might take place on a time scale of decades. A panel data design could add important information to clarify the causal interpretations (or lack of causality) of associations identified in the foregoing analyses. In such a design, exposures and health histories for the same individuals would be assessed repeatedly over time (e.g., annually) for decades. Bayesian Networks show only which variables are informative about each other, i.e., statistical dependence and conditional independence relationships among variables. They do not show the direction or magnitude of information flows between causes and effects. Inferring these from longitudinal data requires relatively lengthy time series and additional methods, such as Granger causality, transfer entropy, and directed information flow (Janzing et al., 2013). When only relatively short (e.g., 5-year) time series are available, as in this study, comparative evaluations suggest that Bayesian Network methods outperform longitudinal methods (Zou and Feng, 2009). Applying both to longer time series in future work would nonetheless be desirable.
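Granger causality, one of the longitudinal methods named above, asks whether the past of one series improves prediction of another. The sketch below simulates a hypothetical bivariate series in which past X drives Y but not the reverse. It then compares lag-1 regressions with and without the other series' lag using an F statistic. The simulated process, the single lag, and the helper names `rss` and `granger_f` are all illustrative assumptions. Real applications would select lag lengths and assess significance more carefully, e.g., with established statistical packages.

```python
import random


def rss(y, columns):
    """Residual sum of squares from OLS of y on an intercept plus `columns`,
    solved from the normal equations by Gauss-Jordan elimination."""
    rows = [[1.0, *obs] for obs in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda row: abs(aug[row][i]))  # partial pivoting
        aug[i], aug[p] = aug[p], aug[i]
        for r in range(k):
            if r != i:
                f = aug[r][i] / aug[i][i]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[i])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def granger_f(cause, effect):
    """F statistic for adding cause[t-1] to an AR(1) model of effect[t]."""
    y, y_lag, c_lag = effect[1:], effect[:-1], cause[:-1]
    rss_restricted = rss(y, [y_lag])
    rss_unrestricted = rss(y, [y_lag, c_lag])
    dof = len(y) - 3  # intercept + two lagged regressors
    return (rss_restricted - rss_unrestricted) / (rss_unrestricted / dof)


random.seed(0)
T = 2000
x = [random.gauss(0, 1) for _ in range(T)]
y = [0.0]
for t in range(1, T):
    # Hypothetical data-generating process: past X drives Y, not vice versa.
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + random.gauss(0, 1))

f_xy = granger_f(x, y)  # does past X help predict Y?
f_yx = granger_f(y, x)  # does past Y help predict X?
print(f"F(X -> Y) = {f_xy:.1f}")
print(f"F(Y -> X) = {f_yx:.1f}")
```

Only one direction yields a large F statistic. The comparison therefore reveals a direction of predictive dependence that a network fitted to contemporaneous data would not. With one observation per year over five years, however, such lagged regressions would have very few time points to work with.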
A second limitation is that the variables for which data are available may not include all causally relevant factors. There may be important confounders and modifying factors that are not captured in the data set analyzed here. For example, Krstić (2011) identifies three additional relevant variables for understanding associations between asthma prevalence and PM2.5 air pollution in 97 major metropolitan/micropolitan areas of the continental U.S. These are apparent temperature (a measure combining air temperature, relative humidity, and wind speed), geographical latitude of residence, and vitamin D status (reflecting exposure to sunlight). None of these variables was included in the current study. Any causal interpretation of the associations reported here could therefore be erroneous, because the effects of other potentially important causes have been omitted.
More generally, the analytic methods used in this study to clarify potential causal interpretations of observed associations complement a basic idea of potential-outcomes, or counterfactual, causal models. That idea is that differences in causes make their effects differ from what they otherwise would have been. The methods used here add the related idea that differences in causes help to predict differences in effects. This has the advantage that it can be tested using observational data. No hypothetical modeling assumptions are needed about what would have happened if exposures or other conditions had been different from those observed. However, it does not protect against the possibility that one variable might be informative about another because of unmeasured variables that affect, or are affected by, both. Nonparametric methods for estimating probabilistic dependencies, specifically algorithms for learning regression trees and Bayesian Networks (BNs) from data, are now mature and widely available. They also help to avoid potential biases due to model specification errors or incorrect modeling assumptions.
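A single-split regression tree (a "stump") shows in miniature how a tree chooses a split from the data rather than assuming a functional form. The sketch below scans every candidate threshold on one predictor. It keeps the threshold that minimizes the total within-group sum of squared errors, which is the CART splitting criterion for a continuous response. The toy data and the helper name `best_split` are hypothetical. Full tree learners add recursion, multiple predictors, and pruning or cross-validation.

```python
def best_split(x, y):
    """Return (threshold, sse) for the single split on x that minimizes
    the summed squared error of y around the two group means."""
    pairs = sorted(zip(x, y))

    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best


# Toy data with a step between x = 4 and x = 5; no parametric form is assumed.
x = list(range(10))
y = [0.1, 0.0, 0.2, 0.1, 0.0, 1.0, 0.9, 1.1, 1.0, 0.9]
threshold, total_sse = best_split(x, y)
print(threshold, round(total_sse, 3))  # threshold 4.5
```

A linear fit to the same points would smear the step across the whole range of x. The stump finds the step directly.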
Conclusions
The picture of asthma risk that emerges from the preceding data analyses is largely a socioeconomic one. Young divorced women with low incomes are at greatest risk of asthma, especially if they are ever-smokers or have a history of heart attacks or strokes. Income is an important confounder of other relations. For example, in logistic regression modeling, PM2.5 is positively associated with both stroke risk and heart attack risk when these are regressed only against PM2.5, sex, age, and ever-smoking status. It is not associated with them when they are regressed against these variables and income. Unexpectedly, PM2.5 is significantly negatively associated with asthma risk in multiple linear regression, logistic regression, and regression tree models. In a logistic regression model, a 10 µg/m³ decrease in PM2.5 corresponds to about a 6% increase in the probability of asthma. Whether this negative association is explained by confounding and residual confounding, as Figure 3.10 suggests, or has other explanations is a worthwhile topic for further investigation. Meanwhile, the data and analyses presented here suggest that substantially reducing the burden of adult asthma may require addressing the causal web of socioeconomic conditions leading to low incomes, smoking, and divorce, especially among women.
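The odds ratios and probability changes quoted in this chapter relate to a logistic-regression coefficient through standard identities. These identities are general, not additional results from the BRFSS analysis. Let β be a fitted coefficient per 1 µg/m³ of PM2.5 and p₀ a baseline probability. The odds ratio for a change of Δ µg/m³, and the resulting ratio of probabilities p₁/p₀, are then:

```latex
\mathrm{OR}_{\Delta} = e^{\beta \Delta},
\qquad
\frac{p_1}{p_0} = \frac{\mathrm{OR}_{\Delta}}{1 - p_0 + p_0\,\mathrm{OR}_{\Delta}}
```

For example, an odds ratio of 1.09 per 10 µg/m³ corresponds to β = ln(1.09)/10 ≈ 0.0086 per µg/m³. The probability ratio approaches the odds ratio only when p₀ is small. For an illustrative OR_Δ = 1.06 and p₀ = 0.1, p₁/p₀ = 1.06/1.006 ≈ 1.054.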
REFERENCES FOR CHAPTER 3
Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XS. Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part I: Algorithms and Empirical Evaluation. Journal of Machine Learning Research. 2010;11:171-234.
Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Tavener S, Perez D, Samothrakis S, Colton S. A survey of Monte Carlo Tree Search methods. IEEE Transactions on Computational Intelligence and AI in Games. 2012 Mar;4(1):1-43.
Burte E, Nadif R, Jacquemin B. Susceptibility Factors Relevant for the Association Between Long-Term Air Pollution Exposure and Incident Asthma. Curr Environ Health Rep. 2016 Jan 28.
Carlsson AC, Li X, Holzmann MJ, Wändell P, Gasevic D, Sundquist J, Sundquist K. Neighbourhood socioeconomic status and coronary heart disease in individuals between 40 and 50 years. Heart. 2016 Feb 10.
Coogan PF, Castro-Webb N, Yu J, O'Connor GT, Palmer JR, Rosenberg L. Neighborhood and Individual Socioeconomic Status and Asthma Incidence in African American Women. Ethn Dis. 2016 Jan 21;26(1):113-22.
Dupre ME, George LK, Liu G, Peterson ED. Association between divorce and risks for acute myocardial infarction. Circ Cardiovasc Qual Outcomes. 2015 May;8(3):244-51.
Frey L, Fisher D, Tsamardinos I, Aliferis CF, Statnikov A. Identifying Markov Blankets with Decision Tree Induction. In: Proceedings of the Third IEEE International Conference on Data Mining; 2003 Nov 19-22; Melbourne, FL. p. 59-66.
Furqan MS, Siyal MY. Random forest Granger causality for detection of effective brain connectivity using high-dimensional data. J Integr Neurosci. 2016 Mar;15(1):55-66.
Guo Z, Haimes YY. Risk Assessment of Infrastructure System of Systems with Precursor Analysis. Risk Anal. 2016 Aug;36(8):1630-43. doi: 10.1111/risa.12559.
Halliday DM, Senik MH, Stevenson CW, Mason R. Non-parametric directionality analysis - Extension for removal of a single common predictor and application to time series. J Neurosci Methods. 2016 Aug 1;268:87-97.
Heinzerling A, Hsu J, Yip F. Respiratory Health Effects of Ultrafine Particles in Children: A Literature Review. Water Air Soil Pollut. 2016 Jan;227.
Hill J. 2016 Atlantic Causal Inference Conference Competition: Is Your SATT Where It's At? http://jenniferhill7.wixsite.com/acic-2016/competition
Janzing D, Balduzzi D, Grosse-Wentrup M, Schölkopf B. Quantifying Causal Influences. The Annals of Statistics. 2013;41(5):2324-2358.
Krstić G. Asthma prevalence associated with geographical latitude and regional insolation in the United States of America and Australia. PLoS ONE. 2011;6(4):e18492. doi:10.1371/journal.pone.0018492
NIPS (Neural Information Processing Systems) 2013 Workshop on Causality. http://clopinet.com/isabelle/Projects/NIPS2013/
Oliveira A, Barros H, Maciel MJ, Lopes C. Tobacco smoking and acute myocardial infarction in young adults: a population-based case-control study. Prev Med. 2007 Apr;44(4):311-6. Epub 2007 Jan 17.
Pearl J. An introduction to causal inference. Int J Biostat. 2010 Feb 26;6(2):Article 7.
Shah RS, Cole JW. Smoking and stroke: the more you smoke the more you stroke. Expert Rev Cardiovasc Ther. 2010 Jul;8(7):917-32.
Strickland MJ, Hao H, Hu X, Chang HH, Darrow LA, Liu Y. Pediatric Emergency Visits and Short-Term Changes in PM2.5 Concentrations in the U.S. State of Georgia. Environ Health Perspect. 2015 Oct 9.
To T, Zhu J, Villeneuve PJ, Simatovic J, Feldman L, Gao C, Williams D, Chen H, Weichenthal S, Wall C, Miller AB. Chronic disease prevalence in women and air pollution--A 30-year longitudinal cohort study. Environ Int. 2015 Jul;80:26-32.
Zahran HS, Bailey C. Factors associated with asthma prevalence among racial and ethnic groups--United States, 2009-2010 behavioral risk factor surveillance system. J Asthma. 2013 Aug;50(6):583-9.
Descriptive Analytics for Occupational Health: Is Benzene Metabolism in Exposed Workers More Efficient at Very Low Concentrations?
INTRODUCTION
The occupational risks to workers from noxious substances inhaled in air depend on the concentrations inhaled and on what happens to the inhaled substances. For example, they may be swiftly detoxified and eliminated from the body without doing harm, or they may be metabolized to form toxic concentrations of metabolites in target tissues. Descriptive analytics applied to data on inhaled concentrations and metabolites formed can clarify how efficiently the body produces toxic metabolites at low exposure concentrations. This chapter applies descriptive analytics methods introduced in Chapters 1-3 to data on benzene metabolites in Chinese factory workers. These methods include interaction plots, nonparametric regression, CART trees, and Bayesian networks. The aim is to resolve a recent puzzle in the literature on low-dose benzene toxicology. Readers who do not care to pursue this topic further may wish to examine the figures quickly. They show how plots and visualizations of patterns in the data can be used to gain insight into the dependencies among variables.
The literature on low-dose human metabolism of benzene contains two apparently contradictory findings:
(1) Metabolism is approximately linear at low concentrations, e.g., below 10 ppm. This is consistent with decades of quantitative modeling of benzene pharmacokinetics and dose-dependent metabolism.
(2) A set of recent studies measured benzene exposure and metabolite concentrations in occupationally exposed benzene workers in Tianjin, China. These measurements show that dose-specific metabolism (DSM) ratios, i.e., metabolite concentrations per ppm of benzene in air, decrease steadily with benzene concentration, with the steepest decreases below 3 ppm. This has been interpreted as indicating that metabolism at low concentrations of benzene is highly nonlinear.

This chapter reexamines the data using non-parametric methods of descriptive analytics and concludes that both findings are correct; they are not contradictory. Low-concentration metabolism can be linear, with metabolite concentrations proportional to benzene concentrations in air, and yet DSM ratios can still decrease with benzene concentrations. The algebra of random variables allows a ratio to be negatively correlated with its denominator even when the mean of the numerator is proportional to the denominator. Direct plots of metabolite concentrations against benzene ppm in air in the Tianjin data show approximately straight-line relationships between them. When this is the case, toxicological interpretations of declining DSM ratios as evidence of nonlinear metabolism are unwarranted. Relatively straightforward descriptive analytics can thus help to resolve what at first appears to be a contradiction, one that has fueled heated discussions in the recent literature. Descriptive plots and analysis reveal that highly nonlinear, decreasing DSM ratios are consistent with linear metabolism.
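The algebraic point can be checked by simulation. The sketch below assumes one simple mechanism, chosen for illustration rather than identified from the Tianjin data, and all parameter values in it are hypothetical. The mechanism has three parts:
- True exposure X is lognormal.
- The metabolite M has mean exactly proportional to true exposure, E[M | X] = K·X, so metabolism is linear by construction.
- The exposure value Z used as the ratio's denominator equals X multiplied by an independent lognormal measurement error.

Under these assumptions the DSM-style ratio M/Z is negatively correlated with Z. On the log scale, the covariance between log(M/Z) and log Z equals minus the variance of the log measurement error.

```python
import math
import random

random.seed(42)

K = 2.0                                       # hypothetical metabolite units per ppm
SD_LOG_X, SD_LOG_ERR, SD_LOG_M = 1.5, 0.4, 0.3  # hypothetical log-scale SDs
N = 20_000

log_z, log_ratio = [], []
for _ in range(N):
    x = random.lognormvariate(0.0, SD_LOG_X)             # true ppm
    # Mean-one multiplicative noise, so E[M | X] = K * X exactly (linear metabolism).
    m = K * x * random.lognormvariate(-SD_LOG_M**2 / 2, SD_LOG_M)
    z = x * random.lognormvariate(0.0, SD_LOG_ERR)       # measured/estimated ppm
    log_z.append(math.log(z))
    log_ratio.append(math.log(m / z))                    # log of DSM-style ratio


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


r = corr(log_z, log_ratio)
# Theoretical value under these assumptions: -var(err) / sqrt(var(log Z) * var(log ratio)).
theory = -SD_LOG_ERR**2 / math.sqrt(
    (SD_LOG_X**2 + SD_LOG_ERR**2) * (SD_LOG_M**2 + SD_LOG_ERR**2))
print(f"corr(log Z, log(M/Z)) = {r:.3f}  (theory {theory:.3f})")
```

In the same setup, the expected log-log slope of M against Z is σ_X²/(σ_X² + σ_err²) ≈ 0.93. The direct plot of metabolite against exposure therefore stays close to proportional while the ratio declines. Other mechanisms, such as errors in background subtraction, can produce similar patterns.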
Since the 1970s, it has been recognized that occupational inhalation exposures to hundreds of ppm of benzene for decades increase risks of acute myeloid leukemia (AML). For example, among Turkish shoe workers exposed prior to 1970, the increase in AML risk was estimated as roughly 2- to 4-fold, or about 1 excess case per 10,000 to 100,000 worker-lifetimes (Aksoy et al., 1974). More recently, Chinese workers occupationally exposed to benzene have been reported to have disproportionately elevated levels of benzene metabolites at very low estimated exposure concentrations, such as 1 part per million (ppm) or less (Rappaport et al., 2010). These metabolites include phenol (PH), hydroquinone (HQ), catechol (CA), E,E-muconic acid (MA), and S-phenylmercapturic acid (SPMA); levels of unmetabolized benzene excreted in urine were elevated as well. Such findings naturally prompt curiosity and concern about exactly how great the increases in benzene metabolites are at such occupational exposure levels, and whether they might be sufficient to create significantly increased health risks. The main purpose of this chapter is to ascertain the relationship between relatively low levels of benzene exposure (< 10 ppm) and resulting increases in benzene metabolites in workers exposed to these concentrations. To do so, it applies non-parametric statistical methods to Chinese worker data from three factories in Tianjin.
The literature on low-dose benzene metabolism and health effects has been marked by conspicuously opposing views, sometimes vehemently expressed (Rappaport et al., 2013; Price et al., 2013). Both regulation and litigation have drawn on and helped to fund scientific investigations of low-exposure benzene dose-response relationships in recent decades (Schirrmeister and Flora, 2008). Some investigators are funded by regulators such as the U.S. Environmental Protection Agency (EPA) and the Occupational Safety and Health Administration (OSHA). Others are funded by government institutes such as the National Institute of Environmental Health Sciences (NIEHS), the National Cancer Institute (NCI), and the National Institute for Occupational Safety and Health (NIOSH). These investigators, and those testifying on behalf of plaintiffs, often descry evidence of low-dose hazards where investigators funded by industry groups do not. Such groups include the American Petroleum Institute in the U.S. and CONCAWE in Europe, and the latter investigators include the current author. Published results are sensitive to modeling assumptions and to data interpretations of unknown validity, as discussed later. As a result, non-parametric methods that make minimal or no modeling assumptions may be especially valuable for benzene. They may help to rise above motivated reasoning and politicized science and interpretation of data (Thomas, 2014). The following sections therefore apply non-parametric methods and data visualization to benzene data.
The remainder of this chapter is organized as follows. The next section provides background on theories and interpretations of toxicological and epidemiological data at low exposure concentrations. These have been used to argue for or against the hypothesis that benzene concentrations in the range of single-digit ppm or less are harmful to human health. A data set on benzene and metabolites in workers from Tianjin, China, is then analyzed using non-parametric methods to reexamine the low-concentration relationship between benzene exposure concentrations and metabolite levels.
Background: Theories and Controversies in Benzene Dose-Response
Different toxicological theories have been proposed for the potential low-dose effects of benzene on leukemia risks. Some support, and others undermine, the hypothesis that excess risks should be expected at exposure concentrations in the range of a few ppm or less. For over 30 years, it has been recognized that white blood cells in peripheral blood, such as lymphocytes and mononuclear cells, are much more sensitive to benzene than are red blood cells and other cells in the myeloid lineage (e.g., Kipen et al., 1989). More recently, a team of researchers at the University of California at Berkeley applied modern biological research methods to benzene-exposed workers in China. They reported altered gene expression and enzyme activity in peripheral white blood cells at benzene concentrations as low as 0.1 ppm (McHale et al., 2011; Thomas et al., 2014). Somewhat confusingly, clusters of gene expression changes in peripheral blood mononuclear cells associated with benzene exposures have been termed “the acute myeloid leukemia (AML) pathway” (ibid). Yet they involve neither leukemia nor myeloid cells nor a pathway that connects them to AML. The biological relevance of these findings to AML is unknown. They occur in peripheral blood cells of the lymphoid lineage, rather than in the bone marrow's multipotent stem cells and myeloid precursor cells that are thought to be target cells for AML (Walter et al., 2012).
Nonetheless, the possibility that relatively low concentrations of benzene that cause altered gene expression in peripheral white blood cells might also increase risk of AML and other leukemias has remained a topic of active research for the past decade (ibid). The possibility of hormesis – that relatively low levels of benzene exposure might decrease risk of AML – has been less investigated, but exposures to concentrations of less than 10 ppm of benzene have been reported to significantly reduce the clonal proliferation (colony formation potential) of myeloid progenitor cells isolated from peripheral blood (Lan et al., 2004), consistent with some theoretical models of hormesis (Cox, 2006, 2009).
Other investigators have emphasized the absence of detected increases in leukemia risks at relatively low levels of benzene exposure. For example, Wong (1995) reexamined a much-studied occupational cohort in the U.S., the Pliofilm cohort. This cohort had often been interpreted as showing no evidence of an exposure threshold for increased leukemia risk as a function of benzene exposure. Wong concluded that, to the contrary, “No increased risk of AML was detected for cumulative exposure to benzene below 200 ppm-years (SMR 0.91). Above 200 ppm-years, risk of AML rose drastically; reaching a significant SMR of 98.37 for > 400 ppm-years.” Several investigators, including the International Agency for Research on Cancer (IARC), have suggested that occupational exposures to benzene might cause increased risk of multiple myeloma, acute lymphocytic leukemia, and chronic lymphocytic leukemia. These suggestions are based largely on causal interpretations of associations in epidemiological studies that have been described as inconclusive (Vlaanderen et al., 2012). Weed (2010) critiqued the use of meta-analysis of associations to suggest that benzene might also cause non-Hodgkin’s lymphoma (NHL) (Steinmaus et al., 2008). He argued that “Causal claims… should not emerge from meta-analyses as such.” He further argued that University of California, Berkeley investigators had “performed a meta-analysis and concluded that it represented new evidence that benzene causes NHL” in spite of “a lack of consistency (i.e., significant heterogeneity), weak associations, no evidence of dose-response, no effort to provide an assessment of biological plausibility, and no new epidemiological evidence.” He used this as a case study for critical discussion of the use and misuse of meta-analysis and causal inference in occupational epidemiology. Steinmaus et al. (2011) replied in defense of their interpretation of meta-analysis of correlations as evidence of causation. They responded that “We have been teaching this for many years in our course at the University of California, Berkeley, School of Public Health, titled ‘Causal inference and meta-analysis’.” Indeed, much of the literature postulating or asserting increased health risks at relatively low levels of benzene concentration in the past two decades has flowed from this same research group at the University of California, Berkeley, School of Public Health (Steinmaus et al., 2008 and 2011; Lan et al., 2004; Rappaport et al., 2009, 2010, 2013; McHale et al., 2011; Thomas et al., 2014).
Turning from possible health effects to possible causes, Rappaport et al. (2010) postulated a model with “a hitherto unrecognized high-affinity enzyme that was responsible for an estimated 73 percent of total urinary metabolites [sum of phenol (PH), hydroquinone (HQ), catechol (CA), E,E-muconic acid (MA), and S-phenylmercapturic acid (SPMA)] in nonsmoking females exposed to benzene at sub-saturating (ppb) air concentrations.” If true, this hypothesis would imply that an unidentified enzyme, which the authors estimated to be responsible for more than half of all benzene metabolism at 1 ppm and close to 20% at 10 ppm, had been overlooked during the past half century of research and quantitative modeling of benzene metabolism, including during the development of validated physiologically-based pharmacokinetic (PBPK) models of benzene metabolism in humans and rodents that have successfully fit and predicted observed data without postulating any such unidentified enzyme (e.g., Knutsen et al., 2013). The evidence offered to support the hypothesis consists of fitting two parametric regression curves to data, one to represent a one-enzyme model and the other to represent a two-enzyme model (Rappaport et al., 2010). The authors did not claim that either model had been validated by showing that it correctly described benzene metabolism in humans, nor did either model represent measurement errors in its right-hand side variables (estimated concentration of benzene in air and estimated background levels of each metabolite). Thus, although the authors interpreted the results of this curve-fitting exercise as providing “strong statistical evidence favoring two metabolizing enzymes and indicated that the higher-affinity enzyme was responsible for about 73% of all benzene metabolism at non-saturating (ppb) air concentrations,” the evidence only shows that one curve-fitting model provides a worse fit to the data than the other. 
This does not imply that either of them is realistic or correct. It does not justify a conclusion that the comparison provides “extremely strong evidence favoring the better model as a depiction of the true metabolism of benzene to a particular metabolite,” or that the results “provide extremely strong statistical evidence that benzene is metabolized to PH and MA via two enzymes rather than one enzyme, and that the putative high-affinity enzyme is active primarily below 1 ppm” (Rappaport et al., 2010).
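One- and two-enzyme descriptions of this kind are commonly written in Michaelis-Menten form. The generic sketch below uses the following notation:
- C: the air concentration of benzene
- M: the metabolite level above background
- V: maximal rates
- K: affinity constants (a smaller K means higher affinity)

The parameterizations actually fitted by Rappaport et al. (2010) may differ in detail, for example in how background levels and scales are handled.

```latex
\begin{aligned}
M_1(C) &= \frac{V C}{K + C}, \\
M_2(C) &= \frac{V_1 C}{K_1 + C} + \frac{V_2 C}{K_2 + C}, \qquad K_1 < K_2, \\
f_1(C) &= \frac{V_1/(K_1 + C)}{V_1/(K_1 + C) + V_2/(K_2 + C)}
  \;\longrightarrow\; \frac{V_1/K_1}{V_1/K_1 + V_2/K_2} \quad (C \to 0).
\end{aligned}
```

Both forms become linear as C → 0: M₁ ≈ (V/K)C and M₂ ≈ (V₁/K₁ + V₂/K₂)C. Because K₁ < K₂, the share f₁ attributed to the high-affinity enzyme falls as C rises. This matches the pattern in the percentages quoted above: more than half at 1 ppm, close to 20% at 10 ppm.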
Earlier work by the same group had concluded that benzene oxide-albumin adducts from Chinese workers “indicate that deviations from linear metabolism began at or below benzene exposures of 10 ppm and that pronounced saturation was apparent at 40-50 ppm” (Rappaport et al., 2002), and that “Mean trends of dose-specific levels (micromol/L/ppm benzene) of E,E-muconic acid, phenol, hydroquinone, and catechol all decreased with increasing benzene exposure, with an overall 9-fold reduction of total metabolites. …[indicating] that benzene metabolism is highly nonlinear with increasing benzene exposure above 0.03 ppm, and that current human toxicokinetic models do not accurately predict benzene metabolism below 3 ppm” (Kim et al., 2006b). Similarly, summarizing these findings, Rappaport et al. (2009) note that “Intriguingly, the exposure-specific production of major metabolites (phenol, muconic acid, hydroquinone, and catechol, in micromolar per parts per million benzene) decreased continuously with estimated exposure levels over the range of 0.03–88.9 ppm, with the most pronounced decreases occurring at benzene concentrations < 1 ppm.”
These and related articles by the Berkeley group suggest that low levels of exposure to benzene are disproportionately hazardous compared to higher levels. As expressed by Rappaport et al. (2009), “Because regulatory risk assessments have assumed nonsaturating metabolism of benzene in persons exposed to air concentrations well above 10 ppm, our findings suggest that the true leukemia risks could be substantially greater than currently thought at ambient levels of exposure – about 3-fold higher among nonsmoking females in the general population.” This line of reasoning, which projects leukemia risks directly from modeled levels of metabolites, has proved influential with regulators: “In justifying its decision to lower the benzene content of gasoline, the U.S. EPA cited studies pointing to supralinear (greater-than-proportional) production of benzene-related protein adducts at air concentrations < 1 ppm (Rappaport et al. 2002, 2005). … Because the U.S. EPA had previously assumed that human benzene metabolism proceeded according to nonsaturating (first-order) kinetics at exposure concentrations well above 10 ppm, saturation of metabolism below 1 ppm ‘could lead to substantial underestimation of leukemia risks’ in the general population (U.S. EPA 2007).” (ibid)
Against this interpretation, Price et al. (2012) reexamined the Tianjin data and the modeling by Kim et al. (2006). They evaluated “the impacts of technical issues in the corrections for background levels of metabolites, accounting for biases in the regression modeling, and the uncertainties introduced by the use of a calibration model to estimate benzene air levels for certain workers.” They concluded that “the Tianjin data appear to be too uncertain to support any conclusions of a change in the efficiency of benzene metabolism with variations in exposure.” In particular, Price et al. highlighted that “Defining background levels as either the levels in all workers with no occupational exposures or in workers with predicted air levels of < 0.03 p.p.m. results in estimates of 2.4 fold [< 0.1–15] and 3.3 fold [< 0.1–19] increases, respectively.” In other words, these analyses found no significant departure from linear metabolism at low exposure concentrations, since all confidence intervals include 1.
Rappaport et al. (2013) responded vigorously. They confirmed that these alternative definitions of background metabolite levels do indeed change the DSM ratios, i.e., metabolite concentrations in urine per unit benzene concentration in air. Under these definitions the ratios no longer decrease, and even increase, with inhaled benzene concentrations below about 0.1 ppm (ibid, Figure 4.5, panels B and C). Rappaport et al. nonetheless construed this finding as unhelpful. They acknowledged that “Indeed, our analyses indicate that [these redefinitions] effectively precluded any attempt at elucidating DSM of benzene in the range of 0.03 p.p.m.” This is similar to the conclusion of Price et al. that “The new analysis indicates that findings of increased production are… highly uncertain.”

Thus, both sets of authors appear to agree on how the result depends on the choice of control group:
- If all workers without occupational exposures to benzene are taken as the control group, there is no evidence of decreasing DSM ratios at low levels of benzene exposure.
- If the control group is defined as the subset of workers with no occupational exposure to benzene and with the 60 lowest levels of urinary benzene, DSM ratios do decrease with benzene concentration.

Measured values of benzene exposures were missing for all concentrations below 0.2 ppm, the lower detection limit. This made two choices decisive for deriving or refuting the conclusion that DSM ratios decrease with benzene exposure concentrations at these low levels: the definition of the “control group,” and the assumptions about the benzene levels to which its members were exposed. A further exchange of correspondence between Price et al. and Rappaport et al. debated the issues and the propriety of ad hominem comments, but did not change the conclusions on either side (Price et al., 2013; Rappaport et al., 2013).
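Simple arithmetic shows why the background definition is so consequential. Suppose, purely hypothetically, that metabolism is exactly linear with no noise, M = b + k·C. Suppose also that the DSM ratio is computed as (M − b̂)/C using an estimated background b̂. A background error δ = b − b̂ then adds δ/C to every ratio. That term is negligible at high C but dominates at low C. Underestimating the background makes the ratios fall with C; overestimating it makes them rise. The concentration grid, slope, and background values below are illustrative assumptions, not Tianjin estimates, and `dsm_ratios` is a hypothetical helper.

```python
def dsm_ratios(concs, k, background, background_hat):
    """DSM ratios (M - background_hat) / C for a noiseless linear model
    M = background + k * C (all inputs hypothetical)."""
    return [((background + k * c) - background_hat) / c for c in concs]


concs = [0.03, 0.1, 0.3, 1.0, 3.0, 10.0]  # ppm, illustrative grid
k, b = 1.0, 0.5                           # hypothetical slope and true background

for label, b_hat in [("background exact", 0.50),
                     ("background underestimated", 0.49),
                     ("background overestimated", 0.51)]:
    ratios = dsm_ratios(concs, k, b, b_hat)
    print(f"{label:26s}", [round(r, 3) for r in ratios])
```

A background error of only 0.01 units shifts the ratio at 0.03 ppm by about a third, yet changes it by just 0.001 at 10 ppm. This is why the lowest concentrations, where exposure was not directly measured, are where the choice of control group matters most.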
With this somewhat contentious background, and acknowledging funding from CONCAWE, a European organization representing petroleum companies, the following sections undertake a fresh examination of how benzene metabolites vary with benzene concentrations at low concentrations (3 ppm or less) of benzene in air, using data from the Tianjin, China factory workers. In principle, this might seem a relatively simple matter to resolve. Both benzene levels below 1 ppm and corresponding metabolite concentrations have been measured in these workers, allowing direct inspection of the relationships between them. That is the approach emphasized in the following sections. But considerable debate about benzene toxicology and interpretation of data at low doses appears to have been caused by the use of alternative modeling assumptions and interpretations of data, as just described. The following sections therefore follow the constructive advice of Thomas et al. (2014), that “The use of non-parametric approaches is particularly relevant here and in epidemiological studies in general because it is impossible to know the exact functional relationships among the variables such as gene expression, dose from exposure, age, gender and smoking status of the subject, cell counts etc. Non-parametric approaches make minimal assumptions about these functional relationships and let the observed data guide the choice of the best models using rigorous statistical criteria (e.g., cross-validation). The implication of making parametric assumptions is that if these assumptions are untrue (which is almost certainly the case), the results produced can be difficult to interpret.” Accordingly, we will apply non-parametric methods to the Tianjin data. We will also comment on the extent to which the data are consistent with a linear (parametric) relationship between benzene exposure concentrations and metabolite concentrations at low doses.