Descriptive Analytics in Public and Occupational Health
Descriptive Analytics for Public Health: Socioeconomic and Air Pollution Correlates of Adult Asthma, Heart Attack, and Stroke Risks
Introduction
This is the first of four chapters emphasizing the application of descriptive analytics to characterize public and occupational health risks. Much of risk analysis addresses basic descriptive information: how big is a risk now, how is it changing over time or with age, how does it differ for people or situations with different characteristics, on what factors does it depend, with what other risks or characteristics does it cluster? Such questions arise not only for public and occupational health and safety risks, but also for risks of failures or degraded performance in engineering infrastructure or technological systems, financial systems, political systems, or other “systems of systems” (Guo and Haimes, 2016). Simply knowing how large a risk is now and whether it is increasing, staying steady, or decreasing may be enough to decide whether a proposed costly intervention to reduce it is worth considering further. This chapter shows how to use basic tools of descriptive analytics, especially interaction plots (showing the conditional expected value of one variable at different levels of one or more other variables), together with more advanced methods from Chapter 2, such as regression trees, partial dependence plots, and Bayesian networks (BNs), to describe risks and how they vary with other factors. A brief discussion and motivation of these methods is given for readers who have skipped Chapter 2. Chapter 4 introduces additional descriptive techniques, including plots that use non-parametric regression to pass smooth curves or surfaces through data clouds. It shows how they can be used, together with simple mathematical analysis, to resolve a puzzle that has occasioned some debate among toxicologists: some studies have concluded that workers form disproportionately high levels of benzene metabolites at very low occupational exposure concentrations compared to higher concentrations, while other studies conclude that metabolism of benzene at low concentrations is approximately linear, and proportional to concentrations in inhaled air. Chapter 5 emphasizes the value of descriptive plots, upper-bounding analyses, and qualitative assumptions, as well as more quantitative risk assessment modeling, in bounding the size of human health risks from use of antibiotics in food animals. Chapter 6 calculates plausible bounds on the sizes of the quantitative risks to human health of infection with a drug-resistant “super-bug” from swine farming operations. Together, these chapters illustrate how descriptive analytics can be used to obtain and present useful quantitative characterizations of human health risks despite realistic scientific uncertainties about the details of relevant causal processes.
Asthma in the United States is an important public health issue. Many physicians, regulators, and scientists have expressed concern that exposures to criteria air pollutants have contributed to a rising tide of asthma cases and symptoms. The following sections describe associations between self-reported asthma experiences and various socioeconomic factors in survey data, as well as pollution data from other sources. Interaction plots are used to investigate and visualize statistical associations among variables. We then apply Bayesian network learning algorithms and other non-parametric machine-learning algorithms to further describe these statistical dependencies and to clarify possible causal interpretations. Associations with self-reported heart attack and stroke experience confirm that well-established relations between smoking and heart attack or stroke risks are seen in this data set (Shah and Cole, 2010; Oliveira et al., 2007).
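As a rough illustration of what an interaction plot displays, the following Python sketch computes the conditional mean of a 0/1 outcome within each combination of two factors. These are the numbers that an interaction plot would draw as one line per level of the second factor. The function name `conditional_means`, the variable names, and the toy rows are hypothetical and are not taken from the survey data; the sketch only assumes that outcomes are coded 0/1 and that missing values are skipped.

```python
from collections import defaultdict

def conditional_means(rows, outcome, factors):
    """Mean of `outcome` within each combination of `factors`.

    Rows with a missing (None) outcome or factor value are skipped.
    Returns a dict mapping a tuple of factor levels to the mean outcome.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        y = row.get(outcome)
        key = tuple(row.get(f) for f in factors)
        if y is None or any(level is None for level in key):
            continue
        sums[key] += y
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in counts}

# Hypothetical toy rows (illustration only, not BRFSS data).
# sex: 1 = male, 0 = female; smoker: 1 = ever-smoker, 0 = never-smoker.
rows = [
    {"asthma": 1, "sex": 0, "smoker": 1},
    {"asthma": 0, "sex": 0, "smoker": 1},
    {"asthma": 0, "sex": 0, "smoker": 0},
    {"asthma": 0, "sex": 0, "smoker": 0},
    {"asthma": 1, "sex": 1, "smoker": 1},
    {"asthma": 0, "sex": 1, "smoker": 0},
    {"asthma": None, "sex": 1, "smoker": 0},  # missing outcome is skipped
]

means = conditional_means(rows, "asthma", ["sex", "smoker"])
for (sex, smoker), value in sorted(means.items()):
    print(f"sex={sex} smoker={smoker} mean asthma={value:.2f}")
```

Because the outcome is coded 0/1, each conditional mean is simply the fraction of respondents in that cell who reported the condition.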
Readers with limited interest in asthma, stroke, and heart attack risks may skim the rest of this chapter without impairing understanding of subsequent chapters. However, we recommend looking at the figures, as they illustrate the use of interaction plots and other diagrams to show how risks cluster and how they vary with other factors. A brief summary of the empirical findings is that self-reported heart attack and stroke experience are positively associated with each other and with self-reported asthma risks. Intriguingly, young divorced women with low incomes are at greatest risk of asthma, especially if they are ever-smokers. Income is an important confounder of other relations. (For example, in logistic regression modeling, PM2.5 is positively associated (p < 0.06) with both stroke risk and heart attack risk when these are regressed only against PM2.5, sex, age, and ever-smoking status, but not when they are regressed against these variables and income.) In this data set, PM2.5 is significantly negatively associated with asthma risk in regression models, with a 10 µg/m3 decrease in PM2.5 corresponding to about a 6% increase in the probability of asthma, possibly because of confounding by smoking, which is negatively associated with PM2.5 and positively associated with asthma risk. A variety of non-parametric methods are used to quantify these associations and to explore potential causal interpretations.
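The way a confounder such as income can create an association that disappears after adjustment can be sketched with simple stratification. The Python example below is not the logistic regression analysis used in this chapter; it swaps in a cruder technique (comparing risk differences with and without stratifying on income) and uses entirely hypothetical counts, chosen so that low-income respondents both live more often in high-PM2.5 areas and have higher stroke risk.

```python
# Hypothetical counts (illustration only, not BRFSS results):
# (income level, PM2.5 level) -> (number of respondents, number of stroke cases)
counts = {
    ("low", "high"): (800, 80),
    ("low", "low"): (200, 20),
    ("high", "high"): (200, 8),
    ("high", "low"): (800, 32),
}

def risk(cells):
    """Pooled risk (cases / respondents) over a list of (n, cases) cells."""
    n = sum(cell[0] for cell in cells)
    cases = sum(cell[1] for cell in cells)
    return cases / n

def crude_risk_difference(counts):
    """Risk in high-PM2.5 areas minus risk in low-PM2.5 areas, ignoring income."""
    high = [v for (_, pm), v in counts.items() if pm == "high"]
    low = [v for (_, pm), v in counts.items() if pm == "low"]
    return risk(high) - risk(low)

def stratified_risk_differences(counts):
    """The same risk difference computed separately within each income level."""
    incomes = sorted({income for income, _ in counts})
    return {
        income: risk([counts[(income, "high")]]) - risk([counts[(income, "low")]])
        for income in incomes
    }

crude = crude_risk_difference(counts)
by_income = stratified_risk_differences(counts)
print(f"crude risk difference: {crude:.3f}")
for income, diff in by_income.items():
    print(f"risk difference within {income} income: {diff:.3f}")
```

In these made-up counts the crude comparison suggests higher stroke risk where PM2.5 is higher, yet within each income level the risk is identical in high- and low-PM2.5 areas, which is the qualitative pattern described above for the regressions with and without income.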
To investigate the association between air pollutants (O3 and PM2.5) and self-reported adult asthma, stroke, and heart attack risks, we merged the following data sources: (a) The most recent 5 years of available survey response data from a survey of over 228,000 individuals from 15 states, retrieved from the Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System (BRFSS) (www.cdc.gov/brfss/questionnaires/state2013.htm); and (b) Environmental Protection Agency (EPA) data on O3 and PM2.5 concentrations for the counties in which these individuals lived at the time of the survey, retrieved from the US EPA web site (www.epa.gov/airtrends/pm.html). Counties were used as the common key for merging annual average air pollution levels with individual response data. Table 3.1 summarizes the number of individual responses from each state for each of several questions. These responses are coded so that a response of “Yes” has a value of 1 and a response of “No” has a value of 0. Other responses, or non-responses, are coded as missing data. Thus, for example, 38% of the 8618 respondents from Arizona were male, giving a mean value of 0.38 to the variable “Sex = Male” (henceforth abbreviated as “Sex”), which takes the value 1 for men and 0 for women. As suggested by this example, the respondents in the BRFSS do not constitute a simple random sample of the population. The BRFSS survey supplies county weights for reweighting responses to better reflect the entire population. However, this chapter does not seek to extrapolate relations outside the surveyed population, but focuses on quantifying conditional relations within this sample, e.g., studying how probability of asthma varies by age and sex and other variables, without considering how to adjust for differences between the joint frequency distribution of these variables in the survey population and in the more general population.
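The coding and merging steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used to build the data set: the field names (`county`, `asthma`, `pm25`, `o3`), the county codes, and the pollution values are all hypothetical, and respondents whose county has no pollution record are simply given missing pollution values.

```python
from typing import Optional

def code_response(answer: Optional[str]) -> Optional[int]:
    """Code 'Yes' as 1 and 'No' as 0; any other response or non-response is missing."""
    if answer is None:
        return None
    cleaned = answer.strip().lower()
    if cleaned == "yes":
        return 1
    if cleaned == "no":
        return 0
    return None

def merge_by_county(respondents, county_pollution):
    """Attach county-level annual average pollution to each individual response."""
    merged = []
    for row in respondents:
        pollution = county_pollution.get(row["county"], {})
        merged.append({
            "id": row["id"],
            "county": row["county"],
            "asthma": code_response(row["asthma"]),
            "pm25": pollution.get("pm25"),  # None if the county has no record
            "o3": pollution.get("o3"),
        })
    return merged

def mean_ignoring_missing(values):
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Hypothetical survey rows and county-level pollution averages (illustration only).
respondents = [
    {"id": 1, "county": "04013", "asthma": "Yes"},
    {"id": 2, "county": "04013", "asthma": "No"},
    {"id": 3, "county": "04019", "asthma": "Refused"},
    {"id": 4, "county": "99999", "asthma": "No"},
]
county_pollution = {
    "04013": {"pm25": 9.8, "o3": 0.045},
    "04019": {"pm25": 6.1, "o3": 0.042},
}

merged = merge_by_county(respondents, county_pollution)
asthma_rate = mean_ignoring_missing([row["asthma"] for row in merged])
print(merged)
print(asthma_rate)
```

With this 0/1 coding, the mean of a variable over its non-missing values is the fraction of "Yes" responses, which is how a value such as 0.38 for “Sex” arises.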