The data to be analyzed are from two factories in Tianjin, here called Factories 1 and 2, in which workers were occupationally exposed to benzene, and from a third factory, Factory 3, in which they were not. The data were obtained by e-mailing requests to the lead authors (Rappaport and Price) of the papers and letters just discussed; no response was received from Rappaport, but Price promptly provided a full copy of the data set, which is also available as supplemental information in Price et al. (2012). To our knowledge, the validity of these data is not disputed, and the same data have been used by both teams.
Table 4.1 shows the layout of the raw data, excluding some columns dealing with identifying individual workers and dates. Each row represents measurements for a single worker. Several workers have multiple rows, as measurements were taken for them on several different days. The variables and their units are as follows: Factory = ID of factory (1 and 2 used benzene, 3 did not); Weight = worker’s weight in kg; Height = worker’s height in cm; Gender = 0 for women, 1 for men; Subject = worker’s ID; UB = Urinary benzene (nM); UT = Urinary toluene (nM); SPMA = Urinary SPMA (μM); PH = Urinary phenol (μM); MA = Urinary muconic acid (μM); CA = Urinary catechol (μM); HQ = Urinary hydroquinone (μM); AB = Benzene in air (ppm); AT = Toluene in air (ppm); Creat = Urinary creatinine (mM); Rep and Split indicate multiple samples on different dates from the same individual; Samdate = Date of air/urine samples; BTdate = Date of analysis of UB; PCHMdate = Date of analysis of PH, MA, CA, HQ; Sdate = Date of analysis of SPMA. The full data set can be downloaded from http://cox-associates.com/CAT.htm; it is the data set named “Tianjin” in the software at that web site.
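Because several workers contribute more than one row, it can help to see how such a layout might be handled before any analysis. The following is a minimal Python sketch, not the procedure used by either team: it reads a few rows in the column layout described above and collapses repeated measurements into one record per worker. The rows in SAMPLE_CSV are invented placeholders, not values from the Tianjin data set; the function names load_rows and summarize_by_subject are hypothetical; and averaging repeated measurements is only one possible way to treat them, assumed here for illustration.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows in the column layout described in the text. The values
# are invented for illustration and are NOT taken from the Tianjin data.
SAMPLE_CSV = """Factory,Subject,Gender,Weight,Height,AB,UB
1,101,1,70,172,1.5,20.0
1,101,1,70,172,2.5,30.0
2,202,0,55,160,0.5,8.0
3,303,1,68,170,0.0,1.0
"""

# Per the text: Factories 1 and 2 used benzene, Factory 3 did not.
EXPOSED_FACTORIES = {1, 2}


def load_rows(text):
    """Parse CSV text into dicts, converting fields to numeric types."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "Factory": int(raw["Factory"]),
            "Subject": int(raw["Subject"]),
            "Gender": int(raw["Gender"]),    # 0 = women, 1 = men
            "Weight": float(raw["Weight"]),  # kg
            "Height": float(raw["Height"]),  # cm
            "AB": float(raw["AB"]),          # benzene in air, ppm
            "UB": float(raw["UB"]),          # urinary benzene, nM
        })
    return rows


def summarize_by_subject(rows):
    """Collapse repeated rows into one record per worker (mean AB and UB)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Subject"]].append(row)

    summary = {}
    for subject, recs in grouped.items():
        n = len(recs)
        summary[subject] = {
            "n_samples": n,
            "exposed": recs[0]["Factory"] in EXPOSED_FACTORIES,
            "mean_AB": sum(r["AB"] for r in recs) / n,
            "mean_UB": sum(r["UB"] for r in recs) / n,
        }
    return summary


rows = load_rows(SAMPLE_CSV)
summary = summarize_by_subject(rows)
for subject, stats in sorted(summary.items()):
    print(subject, stats)
```

Grouping on Subject keeps repeated samples from being counted as independent workers; whether to average them, model them as repeated measures, or analyze each row separately is an analytic choice that this sketch does not settle.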