Batz, M B, Hoffmann, S & Morris, JG, Jr. 2012. Ranking the disease burden of 14 pathogens in food sources in the United States using attribution data from outbreak investigations and expert elicitation. J Food Prot, 75(7), 1278-1291.
Cook, AJC, Holliman, R, Gilbert, RE, Buffolano, W, Zufferey, J, Petersen, E, Jenum, PA, Foulon, W, Semprini, AE & Dunn, DT. 2000. Sources of toxoplasma infection in pregnant women: European multicentre case-control study. Commentary: Congenital toxoplasmosis—further thought for food. BMJ, 321(7254), 142-147.
Cox, LA, Jr. & Popken, DA. 2014. Quantitative risk assessment of human MRSA risks from swine. Risk Analysis (forthcoming).
Cox, LA, Jr. & Popken, DA. 2010. Assessing potential human health hazards and benefits from subtherapeutic antibiotics in the United States: tetracyclines as a case study. Risk Analysis, 30(3), 432-457.
Cox, LA, Jr., Popken, DA, & Mathers, J. 2009. Human health risk assessment of penicillin/aminopenicillin resistance in enterococci due to penicillin use in food animals. Risk Analysis, 29(6), 796-805.
Cressey, P & Lake, R. 2005. Ranking food safety risks: Development of NZFSA policy 2004-2005. Institute of Environmental Science and Research (online report). Available at http://www.foodsafety.govt.nz/elibrary/industry/Ranking_Food_Safety_Risks-Science_Research.pdf . Accessed 2/13/2013.
Davies, PR. 2011. Intensive swine production and pork safety. Foodborne Pathog Dis, 8(2), 189-201.
Davies, PR, Morrow, WEM, Gamble, HR, Deen, J & Patton, S. 1998. Prevalence of antibodies to Toxoplasma gondii and Trichinella spiralis in finishing swine raised in different production systems in North Carolina, USA. Prev. Vet. Med., 36(1), 67-76.
Davis, R. 2008. Teaching Project Simulation in Excel Using PERT-Beta Distributions. INFORMS Transactions on Education, 8(3), 139-148.
Dubey, JP. 2009. Toxoplasmosis in pigs—The last 20 years. Veterinary Parasitology, 164(2–4), 89-103.
Dubey, JP. 2013. "Swine Toxoplasmosis." Veterinary Division - Animal Health Programs (website). Available at http://www.ncagr.gov/vet/FactSheets/Toxoplasmosis.htm. Accessed 2/25/2013.
Dubey, JP, Gamble, HR, Hill, D, Sreekumar, C, Romand, S & Thuilliez, P. 2002. High prevalence of viable Toxoplasma gondii infection in market weight pigs from a farm in Massachusetts. J. Parasitol. 88, 1234–1238.
Dubey, JP, Hill, DE, Rozeboom, DW, Rajendran, C, Choudhary, S, Ferreira, LR, Kwok, OCH. & Su, C. 2012. High prevalence and genotypes of Toxoplasma gondii isolated from organic pigs in northern USA. Veterinary Parasitology, 188(1–2), 14-18.
Dubey, JP, Hill, DE, Sundar, N, Velmurugan, GV, Bandini, LA, Kwok, OCH, Pierce, V, Kelly, K, Dulin, M, Thulliez, P, Iwueke, C & Su, C. 2008. Endemic toxoplasmosis in pigs on a farm in Maryland: isolation and genetic characterization of Toxoplasma gondii. J. Parasitol. 94, 36–41.
Dubey, JP, Leighty, JC, Beal, VC, Anderson, WR, Andrews, CD & Thulliez, P. 1991. National seroprevalence of Toxoplasma gondii in pigs. J Parasitol, 77(4), 517-521.
Gebreyes, WA, Bahnson, PB, Funk, JA, McKean, J & Patchanee, P. 2008. Seroprevalence of Trichinella, Toxoplasma, and Salmonella in antimicrobial-free and conventional swine production systems. Foodborne Pathog Dis, 5(2), 199-203.
Guerina, NG, Hsu, H-W, Meissner, HC, Maguire, JH, Lynfield, R, Stechenberg, B, Abroms, I, Pasternack, MS, Hoff, R, Eaton, RB & Grady, GF. 1994. Neonatal Serologic Screening and Early Treatment for Congenital Toxoplasma gondii Infection. New England Journal of Medicine, 330(26), 1858-1863.
Haas C.N., J.B. Rose, and C.P. Gerba. 1999. Quantitative Microbial Risk Assessment, John Wiley & Sons, New York.
Hamilton, B. & Sutton, PD. 2012. Recent Trends in Births and Fertility Rates Through June 2012 (website). Available at http://www.cdc.gov/nchs/data/hestat/births_fertility_june_2012/births_june_2012.pdf. Accessed 1/3/2013.
Havelaar, AH, Galindo, AV, Kurowicka, D & Cooke, RM. 2008. Attribution of foodborne pathogens using structured expert elicitation. Foodborne Pathogens and Disease, 5(5), 649+.
Hays, SM. 1996. The Cat/Pig Toxoplasmosis Connection. Agricultural Research, 44(2), 8-9.
Hill, DE, Baroch, J, Swafford, SR, Fournet, VM, Pyburn, DG, Schmit, BB, Gamble, HR, Feidas, H & Theodoropoulos, G. In press. Surveillance of feral swine for Trichinella spp. and Toxoplasma gondii in the US and host-related factors associated with infection. J. Wildl. Dis.
Hill, DE, Haley, C, Wagner, B, Gamble, HR & Dubey, JP. 2010. Seroprevalence of and Risk Factors for Toxoplasma gondii in the US Swine Herd Using Sera Collected During the National Animal Health Monitoring Survey (Swine 2006). Zoonoses and Public Health, 57(1), 53-59.
Hoffmann, S, Batz, MB & Morris, JG, Jr. 2012. Annual cost of illness and quality-adjusted life year losses in the United States due to 14 foodborne pathogens. J Food Prot, 75(7), 1292-1302.
Jolie, R, Backstrom, L, Pinckney, R & Olson, L. Ascarid infection and respiratory health in feeder pigs raised on pasture or in confinement. Swine Health and Production, 6(3), 115-120.
Jones, JL, Kruszon-Moran, D, Sanders-Lewis, K & Wilson, M. 2007. Toxoplasma gondii Infection in the United States, 1999–2004, Decline from the Prior Decade. The American Journal of Tropical Medicine and Hygiene, 77(3), 405-410.
Jones, JL, Kruszon-Moran, D, Wilson, M, McQuillan, G, Navin, T & McAuley, JB. 2001. Toxoplasma gondii Infection in the United States: Seroprevalence and Risk Factors. American Journal of Epidemiology, 154(4), 357-365.
Kijlstra, A, Meerburg, BB & Bos, P. 2009. Food safety in free-range and organic livestock systems: risk management and responsibility. Journal of Food Protection, 72(12), 2629-2637.
Kijlstra, A, Eissen, OA, Cornelissen, J, Munniksma, K, Eijck, I & Kortbeek, T. 2004. Toxoplasma gondii infection in animal-friendly pig production systems. Invest Ophthalmol Vis Sci, 45(9), 3165-3169.
McKean, J, O'Connor, A, Pyburn, D & Beary, J. 2009. Survey of market swine to determine prevalence of Toxoplasma in meat juice samples from selected abattoirs. In 8th International Symposium: Epidemiology and Control of Foodborne Pathogens in Pork, Quebec City, Canada.
Mead, PS, Slutsker, L, Dietz, V, McCaig, LF, Bresee, JS, Shapiro, C, Griffin, PM & Tauxe, R. V. 1999. Food-related illness and death in the United States. Emerg Infect Dis, 5(5), 607-625.
O'Brien, AM, Hanson, BM, Farina, SA, Wu, JY, Simmering, JE, Wardyn, SE, Forshey, BM, Kulick, ME, Wallinga, DB & Smith, TC. 2012. MRSA in Conventional and Alternative Retail Pork Products. PLoS One, 7(1), e30092.
Patton, S, Faulkner, C, Anderson, A, Smedley, K & Buch, E. 2002. Toxoplasma gondii Infection in sows and market-weight pigs in the United States and its potential impact on consumer demand for pork. National Pork Board Report NPB# 00-130 (online report). Available at http://www.pork.org/FileLibrary/ResearchDocuments/00-130%20-PATTONUofTenn.pdf. Accessed 2/18/2014.
Patton, S, Zimmerman, J, Roberts, T, Faulkner, C, Diderrich, V, Assadi-Rad, A, Davies, P & Kliebenstein, J. 1996. Seroprevalence of Toxoplasma gondii in hogs in the National Animal Health Monitoring System (NAHMS). J Eukaryot Microbiol, 43(5), 121S.
Roepstorff, A, Mejer, H, Nejsum, P, & Thamsborg, S. 2011. Helminth parasites in pigs: new challenges in pig production and current research highlights. Vet Parasitol, 180(1-2), 72-81.
Scallan, E, Hoekstra, RM, Angulo, FJ, Tauxe, RV, Widdowson, MA, Roy, SL, Jones, JL & Griffin, PM. 2011. Foodborne illness acquired in the United States--major pathogens. Emerg Infect Dis, 17(1), 7-15. Also available at: http://wwwnc.cdc.gov/eid/article/17/1/p1-1101_article.htm. Accessed 2/26/2014.
Tenter, A, Heckeroth, A & Weiss, L. 2000. Toxoplasma gondii: from animals to humans. Int J Parasitol, 30(12-13), 1217-1258.
Thomas, MK, Murray, R, Flockhart, L, Pintar, K, Pollari, F, Fazil, A, Nesbitt, A & Marshall, B. 2013. Foodborne Pathogens and Disease, 10(7), 639-648.
USDA-APHIS-VS-CEAH. 2011. Seroprevalence of Trichinella and Toxoplasma in US Grower/Finisher Pigs, 2006. USDA-APHIS (online report). Available at http://www.aphis.usda.gov/animal_health/nahms/swine/downloads/swine2006/Swine2006_is_trich.pdf. Accessed 1/31/2013.
USDA-APHIS. 2008. Biosecurity on U.S. Swine Sites. USDA-APHIS (online report). Available at http://www.aphis.usda.gov/animal_health/nahms/swine/downloads/swine2006/Swine2006_is_biosecurity.pdf. Accessed 1/2/2013.
USDA-ERS. 2012. Table 2. U.S. certified organic farmland acreage, livestock numbers, and farm operations. US Dept of Agriculture, Economic Research Service (online file). Available at http://www.ers.usda.gov/datafiles/Organic_Production/National_Tables_/Farmlandlivestockandfarm.xls. Accessed 2/23/2014.
USDA-NASS. 2012. Meat Animals Production, Disposition, and Income: 2011 Summary. USDA-NASS (online database). Available at http://www.ers.usda.gov/data-products/agricultural-baseline-database.aspx. Accessed 2/22/2014.
Vaillant, V, de Valk, H, Ancelle, T, Colin, P, Delmas, MC, Dufour, B, Pouillot, R, Le Strat, Y, Weinbreck, P, Jougla, E & Desenclos, JC. 2005. Foodborne Infections in France. Foodborne Pathog Dis, 2(3), 221-232.
van der Giessen, J, Fonville, M, Bouwknegt, M, Langelaar, M & Vollema, A. 2007. Seroprevalence of Trichinella spiralis and Toxoplasma gondii in pigs from different housing systems in The Netherlands. Veterinary Parasitology, 148(3–4), 371-374.
Weese, J, Zwambag, A, Rosendal, T, Reid-Smith, R & Friendship, R. 2011. Longitudinal investigation of methicillin-resistant Staphylococcus aureus in piglets. Zoonoses Public Health, 58(4), 238-243.
Table 7.1. Distribution and Parameter Summary

Prevalence in Total Confinement Swine
  Published surveys (see Table 1 text)
  0, .003, .027 (Min, Mean, Max)

Prevalence in Open/Free Range Swine
  Published surveys (see Table 1 text)
  .027, .244, .901 (Min, Mean, Max)

Covers known estimates (see text), assuming high uncertainty.
  0.30, 0.50, 0.70 (Min, Mean, Max)

Proportion from pork
  (Batz et al., 2012)
  (Mean, Std. Dev.)

Human Baseline Prevalence
  Estimated prevalence for persons aged 40-49 years (Jones et al., 2007) was assumed to be the cumulative result of 45 years of constant incidence. Upper/lower limits correspond to the published 95% CI.
  0.1137, 0.157, 0.177 (Min, Mode, Max)

1-(1-Prevalence)^(1/45), applied to the non-infant fraction (.9872 per 2010 composition) of the 2013 population (approx. 315M) = 310.98M
  0.00268, 0.00379, 0.00432 (Min, Mode, Max)

The symptomatic fraction was estimated at 15%. Uncertainty was based on a 50% relative increase/decrease from .15 on a log-odds scale.

Estimates based on Scallan et al. (2011) are shown in the center delineated block of rows. Modifications to their values are shaded grey.

Table 2. Beta Distribution Parameters for Average T. gondii Antibody Prevalence in Pigs
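The annual-incidence row of Table 7.1 can be checked directly: if seroprevalence P in the 40-49 age group is the cumulative result of 45 years of constant annual incidence p, then P = 1-(1-p)^45, so p = 1-(1-P)^(1/45). A minimal sketch of this back-calculation (the population figures are the approximations quoted in the table):

```python
def annual_incidence(cum_prev, years=45):
    """Back out a constant annual infection probability p from cumulative
    seroprevalence P, using P = 1 - (1 - p)**years."""
    return 1 - (1 - cum_prev) ** (1 / years)

# Min, mode, and max seroprevalence for ages 40-49 (Jones et al., 2007)
for p in (0.1137, 0.157, 0.177):
    print(round(annual_incidence(p), 5))   # 0.00268, 0.00379, 0.00432

# At-risk population: non-infant fraction (0.9872) of the ~315M 2013 population
at_risk = 315e6 * 0.9872   # ~311M; the table reports 310.98M from unrounded inputs
```

The three rounded values reproduce the (Min, Mode, Max) entries of 0.00268, 0.00379, 0.00432 in the table.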
Table 4. Estimates of Current Pork-Attributable Human Toxoplasmosis Health Outcomes Per Year
  Columns: Total Cases (3); Congenital Cases (6); Total QALYs (7)

Table 5. Excess QALYs Lost (Lower, Mean, and Upper Estimates) Per Year Versus Production Shift Fraction
  Columns: Shift Fraction ΔC; Mean QALYs Lost/Yr; 5% Confidence Level; 95% Confidence Level

Figure 1. Conceptual Diagram of the Toxoplasmosis Simulation Model.
Figure 2. Frequency Distributions for T. gondii Prevalence in Confined Versus Open/Free Range Systems
Figure 3. Distribution of Current Total QALYs Lost per Year Due to Pork-Attributable T. gondii
Figure 4. Distributions of the Excess Risk Factor (left panel) and QALYs Lost per Year (right panel) at 0.001 Production Shift
How Well Can High-Throughput Screening Test Results Predict Whether Chemicals Cause Cancer in Mice and Rats?

Introduction

Over the past half century, an enduring intellectual and technical challenge for risk analysts, statisticians, toxicologists, and experts in artificial intelligence, machine learning, and bioinformatics has been to predict in vivo biological responses to realistic exposures, with demonstrably useful accuracy and confidence, from in vitro and chemical structure data. The common goal of many applied research efforts has been to devise and validate algorithms that give trustworthy predictions of whether and by how much realistic exposures to chemicals change probabilities of adverse health responses. This chapter examines recent, promising results suggesting that high-throughput screening (HTS) assay data can be used to predict in vivo classifications of rodent carcinogenicity for certain pesticides. Anticipating the focus on evaluation analytics for assessing the performance of systems, policies, and interventions in Chapters 9 and 10, it also undertakes an independent reanalysis of the underlying data to determine how well this encouraging claim can be replicated and supported when the same data are analyzed using slightly different methods.
In principle, every student of statistics or bioinformatics with access to relevant data is equipped to participate in the challenge of constructing useful predictive models. All that is required is to define one or more dependent variables indicating in vivo responses to chemical exposures in one or more test species of interest (e.g., rodents); to link data on these responses for a database of chemicals to results of one or more in vitro assays (e.g., for genotoxicity, gene mutations, or chromosomal damage in bacteria and in mammalian cell cultures) and/or chemical structure features, to be used as predictors; and then to apply one's favorite predictive analytic techniques to see how well they can predict in vivo responses from the selected predictors. Investigators with an interest in systems biology may use envisioned possible causal pathways and mechanisms or modes of action to guide or rationalize their selection of predictors, while others may prefer pure black-box statistical methods that simply seek the most predictive patterns, whether or not they conform to any biological theory or model. Over the decades, many predictive analytics techniques have been tried, from clustering and regression modeling to expert systems (largely in the 1970s to 1990s) to artificial neural networks to current machine-learning methods such as Random Forest, ensembles of Bayesian Networks, or support vector machines. For any predictive technique, important questions of training and test set design, model validation, sensitivity and specificity of predictions, and generalizability of results arise. Yet, the intrinsic interest and importance of the topic and the comparative ease of addressing it by applying predictive analytics software has generated a large literature, replete with comparisons among methods for various training and test data sets.
Despite these many efforts, the results of decades of predictive modeling remain, at best, very mixed. For example, in the 1990s, several quantitative structure-activity relationship (QSAR) computer programs were developed to screen inventories of chemicals for mutagenicity and carcinogenicity. Some of the most commonly used systems (Deductive Estimation of Risk from Existing Knowledge (DEREK), Toxicity Prediction by Computer-Assisted Technology (TOPKAT), and Multiple Computer Automated Structure Evaluation (MCASE)) were promoted as being valuable for predicting these endpoints (e.g., Patlewicz et al., 2003), and were used accordingly by regulators and companies to screen and prioritize chemicals for risk assessment. EPA continues to develop, endorse, and apply such models, claiming that they have "demonstrated reasonable predictive abilities" (EPA, 2011). However, when they were applied to a set of chemicals of great practical interest (a panel of 394 marketed pharmaceuticals), all turned out to have less than 52% sensitivity for predicting positive Ames assays, and even worse accuracy for predicting other genotoxic assay results (Snyder et al., 2004): "20% of the 124 non-carcinogens were positive in at least one genotoxicity assay while two-thirds of the 77 rodent carcinogens lacked activity in the genotoxicity tests employed" (Guyton et al., 2009).
Similarly, even the most successful systems in an experiment that tested how well the best predictive algorithms could predict rodent carcinogenicity for 30 chemicals had only about 60%-65% accuracy (Benigni and Zito, 2004). Valerio et al. (2007) reported 97% sensitivity but only 53% specificity for software used by the Food and Drug Administration (FDA) to predict rodent carcinogenicity of naturally occurring molecules found in human diets (i.e., few false negatives but many false positives), and Valerio et al. (2010) found that two such software programs "both exhibited poor performance in predicting [rodent] carcinogens" when evaluated in an external validation study of 43 phytochemicals. Walmsley and Billinton (2011) note that even the reported high sensitivity for some current in vitro test batteries for predicting rodent carcinogenicity (e.g., about 9 of 10 rodent carcinogens correctly classified as such based on combinations of bacterial and mammalian cell tests) (Kirkland et al., 2005) is less encouraging than it seems, insofar as the same test batteries also misclassify as many as 9 out of 10 non-carcinogens as being carcinogens. In other words, the predictive power of the in vitro test batteries is not much better than would be achieved by simply assuming that all chemicals are rodent carcinogens, thus creating excellent sensitivity (no false negatives) but poor specificity (many false positives). The authors note that many potential pharmaceutical compounds now classified as probable carcinogens based on genotoxicity results in bacterial and mammalian cells may not be carcinogens at all.
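This point can be made concrete with a small confusion-matrix calculation. The counts below are hypothetical, constructed only to mimic a battery that flags 9 of 10 carcinogens but also 9 of 10 non-carcinogens as positive:

```python
def confusion_stats(tp, fp, fn, tn):
    """Sensitivity, specificity, and overall accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)      # fraction of true carcinogens flagged
    specificity = tn / (tn + fp)      # fraction of non-carcinogens cleared
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# Hypothetical battery: 100 carcinogens and 100 non-carcinogens,
# with 9 of 10 of each group called "positive"
sens, spec, acc = confusion_stats(tp=90, fp=90, fn=10, tn=10)
print(sens, spec, acc)   # 0.9 0.1 0.5
```

With these counts, sensitivity is an impressive 90%, yet overall accuracy is 50%: exactly what random guessing (or calling everything a carcinogen on a balanced test set) would achieve.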
Finally, a long-standing literature notes that even a system that could accurately predict rodent carcinogenicity might have little value for predicting human carcinogenicity. For example, some have argued that EPA classifies far more chemicals as carcinogenic to humans than do other authorities, such as the International Agency for Research on Cancer (IARC), due largely to over-reliance on animal data and to the limitation that "the true predictivity for human carcinogenicity of animal data is even poorer than is indicated by EPA figures alone" (Knight et al., 2006). Differences across species in gross anatomy (e.g., no Harderian, Zymbal, or preputial glands in humans), pharmacokinetics and pharmacodynamics, and species-specific modes of action (e.g., protein droplet nephropathy in male rat kidneys) can all make rodent carcinogenicity of uncertain relevance to human carcinogenicity. This concern, though important, is beyond the scope of this chapter, which focuses on the narrower question of how well rodent carcinogenicity can be predicted from in vitro assay data and chemical structure (QSAR) information.
In light of this history of limited predictive performance, it is worth asking the following three methodological questions.
1. How good (i.e., how predictively accurate, e.g., as indicated by rates of false positives and false negatives in external validation test sets) are current rodent carcinogen prediction systems?

2. Are there fundamental limitations on the ability of computer algorithms to predict accurately (with high sensitivity and specificity) the carcinogenicity in rodents of a broad range of chemicals? For example, are there limits on the kinds of concepts or patterns that a computer algorithm can learn from data, and is there reason to believe that a concept such as "is a rodent carcinogen" can be learned accurately from data on chemicals and assay results? If not (if the desired classification rules are not learnable from examples, even with the help of guidance based on biological knowledge), then such a fundamental limitation cannot necessarily be removed by simply collecting more data, or by obtaining more knowledge. Rather, it may simply be impossible to learn accurate predictive classification rules or procedures from available data. To our knowledge, potential fundamental limitations on the learnability of accurate prediction rules for classifying chemicals as rodent carcinogens, or for predicting cancer dose-response relations based on chemical and in vitro data, have not previously been carefully studied. Appropriate methods for addressing fundamental limitations of what is learnable are available in the machine-learning literature (e.g., using the Probably Approximately Correct learning framework and alternatives), but these methods have not as yet had much impact on QSAR or systems biology research used in carcinogenicity prediction systems.

3. What practical constraints limit predictive performance, e.g., due to incomplete knowledge of relevant biological mechanisms, or limited sizes and diversity of training and test data sets? Unlike fundamental limitations, such practical limitations might be removed by further scientific research. This chapter and its appendix touch briefly on practical aspects of training set design, classification of chemicals with respect to rodent carcinogenicity, external test set design and validation, and extrapolation of risk predictions from tested to untested chemicals.
The following sections focus mainly on the first of these questions, critically evaluating the performance of a predictive scoring system and of the underlying HTS assays for rodent carcinogenicity of pesticides studied by Kleinstreuer et al. (2013). Questions 2 and 3 (fundamental limitations to predictability, and practical issues in developing and testing predictive scores or classifications) are also touched on as needed to understand the strengths and limitations of this predictive model.
For as long as there has been interest in developing algorithms to screen chemicals for likely in vivo activities based on relatively inexpensive QSAR and in vitro assay results, there have also been claims of encouragingly accurate performance of the systems in current use. However, as discussed in the preceding references, these optimistic appraisals have generally not been followed by equally good performance on external validation test sets, when predictions must be made in advance of knowing the correct answers (e.g., Benigni and Zito, 2004). Claims of accurate prediction therefore deserve to be scrutinized. We do so using the recently published National Center for Computational Toxicology (NCCT) research article by Kleinstreuer et al. (2013) as a case study. The article, promisingly entitled "In vitro perturbations of targets in cancer hallmark process predict rodent chemical carcinogenesis," is an exciting piece of work. It applies contemporary knowledge of the biological "hallmarks of carcinogenesis" framework to inform selection and combination of high-throughput screening (HTS) assay results that might indicate the activation of different causal pathways involved in causation of cancer. Kleinstreuer et al. conclude that "A simple scoring function... applied to an external test set of 33 compounds with carcinogenicity classifications from the EPA's Office of Pesticide Programs… successfully (p = 0.024) differentiated between chemicals classified as 'possible'/'probable'/'likely' carcinogens and those designated as 'not likely' or with 'evidence of noncarcinogenicity.' This model represents a chemical carcinogenicity prioritization tool supporting targeted testing and functional validation of cancer pathways." The following sections re-examine these encouraging findings in detail, and seek to independently reproduce them.
It concludes that, despite the promising and plausible research direction embodied in the proposed scoring approach, the claimed predictive accuracy appears to be an artifact of errors in classification of chemicals and in selection and use of statistical methods. It is not clear, based on the data analyzed and the associations reported between model predictions and chemical carcinogenicity classifications, whether the system's true performance is better than random guessing. However, a different analysis that examines how rodent carcinogenicity classification counts vary across chemicals with different predictive scores suggests that the simple scoring model does indeed have useful predictive power.

Case Study: Reassessing the Accuracy and Robustness of a Rodent Carcinogenicity Prediction System

Purpose, Scope, and Interpretation of the Original Study

The following abstract from the article by Kleinstreuer et al. (2013) succinctly expresses the motivation, ambitions, rationale, and hoped-for results from important current efforts to use biological knowledge to help improve prediction of the in vivo rodent carcinogenicity of chemicals from relatively inexpensive high-throughput screening (HTS) in vitro assay results. (See Guyton et al., 2009, for a similar approach.)
“Thousands of untested chemicals in the environment require efficient characterization of carcinogenic potential in humans. A proposed solution is rapid testing of chemicals using in vitro high-throughput screening (HTS) assays for targets in pathways linked to disease processes to build models for priority setting and further testing. We describe a model for predicting rodent carcinogenicity based on HTS data from 292 chemicals tested in 672 assays mapping to 455 genes. All data come from the EPA ToxCast project. The model was trained on a subset of 232 chemicals with in vivo rodent carcinogenicity data in the Toxicity Reference Database (ToxRefDB). Individual HTS assays strongly associated with rodent cancers in ToxRefDB were linked to genes, pathways, and hallmark processes documented to be involved in tumor biology and cancer progression. …A simple scoring function was generated to identify chemicals with significant in vitro evidence that was predictive of in vivo carcinogenicity in different rat tissues and organs. 
This scoring function was applied to an external test set of 33 compounds with carcinogenicity classifications from the EPA’s Office of Pesticide Programs and successfully (p = 0.024) differentiated between chemicals classified as “possible”/“probable”/“likely” carcinogens and those designated as “not likely” or with “evidence of noncarcinogenicity.” This model represents a chemical carcinogenicity prioritization tool supporting targeted testing and functional validation of cancer pathways.” The chemicals of primary interest in this case study are food crop pesticides that are believed to operate through non-genotoxic mechanisms to induce one or more of the currently recognized hallmarks of carcinogenesis, i.e., sustained proliferative signaling, evasion of growth suppression signals, evasion of immune detection and of destruction of compromised cells, acquisition of replicative immortality, tumor-promoting inflammation, active invasion and metastasis, induction of neoangiogenesis, increased genome instability and mutation, evasion of apoptosis, and deregulation of cellular energetics (Hanahan and Weinberg, 2011).
Kleinstreuer et al. interpreted their work as testing the following hypothesis:
H1: “Chemicals that perturb certain cancer-linked targets or processes in human in vitro HTS assays will have a significantly higher likelihood of being carcinogens, as evidenced by carcinogenicity in the 2-year chronic assays in rodents.” The corresponding null hypothesis is:
H0: Chemicals identified as perturbing relevant pathways based on the in vitro HTS assay results are no more likely than other chemicals to exhibit carcinogenicity in the 2-year chronic assays in rodents. They concluded that their data support hypothesis H1 by allowing confident rejection of H0. They also propose a prioritization method that scores the possible carcinogenic potentials of chemicals by counting the number of cancer-associated endpoints that are identified as “significantly perturbed” in assay screening.
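The proposed prioritization method is, at heart, a count. A minimal sketch of such a count-based score follows; the assay and endpoint names are invented for illustration and do not come from the article:

```python
# Hypothetical sketch of a count-based prioritization score: a chemical's
# score is the number of cancer-associated endpoints for which it has at
# least one "significantly perturbed" assay hit.
def cancer_hazard_score(assay_hits, endpoint_assays):
    """assay_hits: set of assays the chemical significantly perturbed.
    endpoint_assays: dict mapping each cancer endpoint to its set of
    cancer-linked assays."""
    return sum(1 for assays in endpoint_assays.values() if assay_hits & assays)

# Invented endpoint-to-assay mapping for illustration only
endpoint_assays = {
    "rat_liver":   {"assayA", "assayB"},
    "rat_thyroid": {"assayC"},
    "rat_kidney":  {"assayD", "assayE"},
}
print(cancer_hazard_score({"assayA", "assayC"}, endpoint_assays))  # 2
```

Chemicals with higher counts would be prioritized for targeted testing; the actual mapping from assays to endpoints in the article is derived from the odds-ratio screening described next.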
Original Data, and Replication Process and Results

Kleinstreuer et al. (2013) began their analysis by searching for assays that predict rodent carcinogenicity, based on finding significantly increased univariate odds ratios (ORs) for a chemical being classified as a rodent carcinogen if the assay is positive compared to if it is not. ORs were assessed in a training set of 292 chemicals for which both in vitro assay results and in vivo 2-year rodent chronic assay results were available. These 292 chemicals were ToxCast Phase I chemicals for which 2-year chronic cancer bioassay data are available from the EPA Toxicity Reference Database (ToxRefDB, http://actor.epa.gov/toxrefdb/). This database classifies each chemical as positive or negative for preneoplastic or neoplastic lesions in rats (232 chemicals) and mice (223 chemicals, 200 of which overlap with those for rats). Only the most common cancer endpoints were included for each species. The most common endpoints were defined as those that were positive for at least 20 chemicals. These endpoints were described as liver preneoplastic or neoplastic, lung preneoplastic, and spleen preneoplastic for mice; and as kidney preneoplastic, liver preneoplastic or neoplastic, testes preneoplastic or neoplastic, and thyroid preneoplastic or neoplastic for rats. The 292 chemicals used by Kleinstreuer et al. are listed in their Supplementary Table 1.
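For readers unfamiliar with the univariate screen, each assay-endpoint pair reduces to a 2x2 table. A sketch of the OR computation, with a normal approximation for the confidence interval (the article's exact interval method may differ), on made-up counts:

```python
import math

def odds_ratio(a, b, c, d):
    """Univariate OR from a 2x2 table:
      a: assay positive & carcinogen,  b: assay positive & non-carcinogen
      c: assay negative & carcinogen,  d: assay negative & non-carcinogen
    Returns (OR, 95% lower CI, 95% upper CI) via the log-OR normal
    approximation."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo, hi = (math.exp(math.log(or_) + z * se) for z in (-1.96, 1.96))
    return or_, lo, hi

# Illustrative (made-up) counts for one assay-endpoint combination
print(odds_ratio(20, 10, 30, 60)[0])   # 4.0
```

An assay would be flagged as predictive for an endpoint when the OR is significantly elevated, e.g., when the lower confidence limit exceeds 1.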
To independently replicate and validate this approach and quantify the predictivity of the HTS data for rodent carcinogenicity, we obtained data and software from the article publication site, the study authors, and the National Pesticide Information Center. Data files used in the study were obtained from the journal article publication site (http://toxsci.oxfordjournals.org/content/131/1/40/suppl/DC1). The files listed in the article as Supplementary Tables 1-6 mapped to the following file names:
Table 8.1. Data Files from Journal Website

File as named in Kleinstreuer et al. (2013): File as named at website
Supplementary Table 1: chemical codes/names and indicator flag of use in study
Supplementary Table 2: table of hits by chemical and in-vivo endpoint
Supplementary Table 3: AC50 values by chemical and assay
Supplementary Table 4: gene mappings and process counts by assay
Supplementary Table 5: Figure 3 diagram
Supplementary Table 6: research article database
We converted the PDF files File007, File008, and File010 to data files via a combination of OCR software and manual correction.
Upon request, the corresponding author, Dr. Richard Judson, sent the software used in the study and the full set of input and output tables in electronic form. Table 8.2 lists several of these files, which proved key to validating the results.
Table 8.2. Additional Key Files Sent by NCCT on Request
R software used to generate odds ratios, identify significant variables, and compute predictive scores.
Odds ratios, LCIs, UCIs, and 2x2 contingency table values for individual assay-endpoint combinations
Odds ratios, LCIs, and UCIs for “permuted” chemicals, one set for each endpoint.
Cancer prediction scores for all 292 chemicals (similar to Table 1, but expanded to cover training chemicals)
For their training set, as described in the article, Kleinstreuer et al. (2013) drew on a database of 309 unique chemicals, 292 of which were flagged for usage. Of the 292 chemicals used, 60 had no rat-related in-vivo endpoint data (232 remaining), while 69 had no mouse-related in-vivo endpoint data (223 remaining). The database provided AC50 data values for 664 unique assays for each of the 309 chemicals. There were 673 assay names in the gene mapping file. (Six of the 664 assay columns in the AC50 data file had names that were not listed in the gene mapping file. All AC50 values for these 6 were zero, however, as was also true of many other assays. This did not impact the results.)
For their test data set, Kleinstreuer et al. (2013) first identified 60 chemicals, listed in their Table 1 ("Summary of cancer hazard model for chemicals not included in the training set for rat endpoints"), that had not been used in constructing their risk prediction model but that had in vitro assay results. From these, they selected as their final external validation test set a subset of 33 chemicals that had EPA Office of Pesticide Programs (OPP) human carcinogenicity classifications (shown in the last column of their Table 1). They note that "these 'human' classifications are in reality a summary of data from [largely] rodent studies and so are comparable with the data used in developing the model." For purposes of testing and validating model predictions, Kleinstreuer et al. assigned a value of 1 to any of the 60 test set chemicals whose OPP assessment of carcinogenic potential contained the words "Likely", "Probable", or "Possible", and a value of 0 to those containing "Not Likely" or "Evidence of noncarcinogenicity". They excluded all other chemicals, e.g., those with classifications such as "Not classifiable", "Suggestive evidence", or "Insufficient data."
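The wording-based labeling rule just described can be sketched as follows; this is an approximation of the authors' actual procedure, reconstructed from the text above, with example phrases chosen for illustration:

```python
# Sketch of the binary labeling rule for OPP carcinogenic-potential phrases.
def opp_binary_label(classification):
    """Return 1 (carcinogen), 0 (non-carcinogen), or None (excluded)."""
    text = classification.lower()
    if any(word in text for word in ("likely", "probable", "possible")):
        # "not likely" must be handled first, since it also contains "likely"
        if "not likely" in text:
            return 0
        return 1
    if "evidence of noncarcinogenicity" in text:
        return 0
    return None  # e.g., "Not classifiable", "Suggestive evidence"

print(opp_binary_label("Likely to be carcinogenic to humans"))      # 1
print(opp_binary_label("Not likely to be carcinogenic to humans"))  # 0
print(opp_binary_label("Suggestive evidence of carcinogenicity"))   # None
```

The "not likely" check illustrates why such keyword rules are fragile: naive substring matching on "likely" alone would mislabel "Not likely" chemicals as carcinogens, a pitfall directly relevant to the classification discrepancies discussed below.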
The data provided to us by NCCT did not include an electronic version of their Table 1, the external validation test data set. We therefore independently obtained a pdf document from the National Pesticide Information Center at http://npic.orst.edu/chemicals_evaluated.pdf, “Chemicals Evaluated for Carcinogenic Potential” (Nov 2012), published by the EPA Office of Pesticide Programs (OPP), which contains the cancer classification for most of the chemicals used in this study. We converted the pdf file to data via OCR software to support automated computations. (The URL in the article for the 2010 version of this document links to a website stating that the paper file can only be obtained by phone or mail.) We then manually checked each entry in Table 1 of Kleinstreuer et al. (2013) to verify the accuracy of the provided classifications of carcinogenic potency.
Appendix A presents details of our chemical-by-chemical review and replication effort. Although we were able to independently confirm most (58 out of 60) of the carcinogenicity classifications in Kleinstreuer et al.’s Table 1 as matching those provided by OPP, we found discrepancies for 2 of the 60 chemicals. For these chemicals, the classifications reported by Kleinstreuer et al. and attributed to OPP differ from those in the EPA/OPP published data that we retrieved. These two cases are as follows.
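Schematically, this cross-check amounts to comparing two lookup tables keyed by chemical name: one transcribed from Kleinstreuer et al.’s Table 1 and one OCR’d from the EPA/OPP pdf. The dictionaries below are toy stand-ins holding only the two discrepant chemicals plus one hypothetical matching entry, with abbreviated classification strings:

```python
# Abbreviated stand-ins for the two classification tables; "chem_x" is a
# hypothetical placeholder for any of the 58 chemicals that matched.
table1 = {
    "MITC": "Likely to be carcinogenic - based on metam sodium data",
    "Etridiazole": "No data",
    "chem_x": "Not Likely",
}
opp_pdf = {
    "MITC": "Insufficient data",
    "Etridiazole": "Group B - Probable Human Carcinogen",
    "chem_x": "Not Likely",
}

# Chemicals whose Table 1 classification differs from the OPP pdf:
discrepancies = {c for c in table1 if table1[c] != opp_pdf.get(c)}
print(sorted(discrepancies))  # ['Etridiazole', 'MITC']
```

In practice our check was done manually, entry by entry, since the classification strings in the two sources are free text rather than exact-match codes; the sketch only illustrates the logical structure of the comparison.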
Methylene bis(thiocyanate) [CASRN 6317-18-6] (MITC) is shown in Table 1 of Kleinstreuer et al. with an EPA carcinogenic potential classification of “Likely to be carcinogenic – based on metam sodium data”. However, the EPA/OPP document states: “There are insufficient data to characterize the cancer risk of MITC,” with a report date of 2009. The computed cancer hazard score for MITC is relatively high, at 16. It is unclear whether the stated rationale, “…based on metam sodium data”, was inserted by Kleinstreuer et al. Although MITC is a breakdown product of metam sodium, it is not formed quickly, and the use of the metam sodium risk information as a surrogate for it is scientifically questionable (www.epa.gov/oppsrrd1/REDs/3082red.pdf). Indeed, separate EPA documentation explicitly states that “it is not appropriate to quantify MITC cancer potential using the metam sodium cancer slope factor….” (www.epa.gov/pesticides/chem_search/cleared_reviews/csr_PC-068103_13-May-04_a.pdf). In any case, the classification of MITC in Table 1 of Kleinstreuer et al. does not match that provided by OPP.
Etridiazole [also called Terrazole, CASRN 2593-15-9] is shown in Table 1 of the article as having “No data.” However, the EPA/OPP document described above classifies it as “Group B – Probable Human Carcinogen”, with a report date of 1999. In more detail, "Etridiazole was classified by the Agency's Health Effects Division Cancer Peer Review Committee (CPRC) as a Probable Human Carcinogen. This classification is based on the following factors: (i) occurrence of multiple tumor types in male and female rats (tumor sites noted were the liver, bile duct, mammary gland, thyroid, and testes), including the induction of a rare bile duct tumor (cholangiocarcinoma); (ii) non-neoplastic lesions observed in similar target organs that lend support to the association of etridiazole exposure with the induction of tumors: increased absolute and relative liver weight (males); hepatocytomegaly (males); clear, basophilic, and eosinophilic cellular alterations (males and females); cholangiectasis (females); centrilobular pigmentation (females); spongiosis hepatis of the liver (males); and testicular interstitial cell hyperplasia (males); and (iii) positive mutagenicity data. The carcinogenicity study in mice was determined to be unacceptable and not adequate for assessment of the carcinogenic potential of etridiazole in this species." (http://www.epa.gov/oppsrrd1/REDs/0009red.pdf) The cancer hazard score for this chemical computed by Kleinstreuer et al.’s model was zero, so the fact that it was misclassified as having “No data” prevented an important discrepancy between this prediction and the multiple tumor types observed in rats from being taken into account.
We obtained matching classifications for all of the remaining 58 chemicals in Table 1 of Kleinstreuer et al. (or reached a similar conclusion of “no data” where that was shown). However, the discrepancies for MITC and etridiazole have a significant impact on the study results, as discussed later.
Appendix A to this chapter provides additional details of our findings for all 60 chemicals, including several cases in which current knowledge might lead to changes in the EPA/OPP classifications used by Kleinstreuer et al. However, as our main goal is simply to verify whether the claimed accuracy can be confirmed using, as far as possible, the same data and methods as the original authors, we do not attempt to update or correct their Table 1 except for MITC and etridiazole, where it appears that the stated methodology was not followed correctly.
Original Methods, and Replication Process and Results

In addition to independently reproducing the data in the training data set and external validation data set (with the exceptions just discussed), we also sought to reproduce the methodology used by Kleinstreuer et al. (2013) as far as possible, based on the documentation provided and the additional data files and software obtained from the authors. This section describes how we replicated these methods.