In this chapter we elucidate four main themes. The first is that modern data analyses, including "Big Data" analyses, often rely on data from different sources, which can present challenges in constructing statistical models that can make effective use of all of the data. The second theme is that although data analysis is usually centralized, frequently the final outcome is to provide information or allow decision-making for individuals. Third, data analyses often have multiple uses by design: the outcomes of the analysis are intended to be used by more than one person or group, for more than one purpose. Finally, issues of privacy and confidentiality can cause problems in more subtle ways than are usually considered; we will illustrate this point by discussing a case in which there is substantial and effective political opposition to simply acknowledging the geographic distribution of a health hazard.

A researcher analyzes some data and learns something important. What happens next? What does it take for the results to make a difference in people's lives? In this chapter we tell a story - a true story - about a statistical analysis that should have changed government policy, but didn't. The project was a research success that did not make its way into policy, and we think it provides some useful insights into the interplay between locally-collected data, statistical analysis, and individual decision-making.

7.1

%& Chapter
%0 Journal Article
%J Health Physics
%D 1995
%T Bayesian Prediction of Mean Indoor Radon Concentrations for Minnesota Counties
%A Phillip N. Price
%A Anthony V. Nero
%A Andrew Gelman
%X Past efforts to identify areas with higher than average indoor radon concentrations by examining the statistical relationship between local mean concentrations and physical parameters such as the soil radium concentration have been hampered by the variation in local means caused by the small number of homes monitored in most areas. In this paper, indoor radon data from a survey in Minnesota are analyzed to minimize the effect of finite sample size within counties, to determine the true county-to-county variation of indoor radon concentrations in the state, and to find the extent to which this variation is explained by the variation in surficial radium concentration among counties. The analysis uses hierarchical modeling, in which some parameters of interest (such as county geometric mean radon concentrations) are assumed to be drawn from a single population, for which the distributional parameters are estimated from the data. Extensions of this technique, known as random effects regression and mixed effects regression, are used to determine the relationship between predictive variables and indoor radon concentrations; the results are used to refine the predictions of each county's radon levels, resulting in a great decrease in uncertainty. The true county-to-county variation of geometric mean radon levels is found to be substantially less than the county-to-county variation of the observed geometric means, much of which is due to the small sample size in each county. The variation in the logarithm of surficial radium content is shown to explain approximately 80% of the variation of the logarithm of geometric mean radon concentration among counties. The influences of housing and measurement factors, such as whether the monitored home has a basement and whether the measurement was made in a basement, are also discussed. The statistical method can be used to predict mean radon concentrations, or applied to other geographically distributed environmental parameters.
%B Health Physics
%8 12/1996
%G eng
%12.4
%2 LBNL-35818Rev
%& Chapter
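The abstract above describes hierarchical modeling in which county geometric means are assumed to be drawn from a single population, so that estimates for sparsely sampled counties are pulled toward the statewide value. The following is a minimal sketch of that partial-pooling idea, not the method used in the paper. It assumes a normal model on log radon with a known within-county variance `sigma2`, a known statewide mean `mu`, and a known between-county variance `tau2`; the paper estimates such quantities from the data. The function name `partial_pool` and all numbers are hypothetical and chosen only for illustration.

```python
import math


def partial_pool(county_logs, sigma2, mu, tau2):
    """Shrink one county's mean log radon toward the statewide mean.

    county_logs: log radon measurements in one county (hypothetical data)
    sigma2: assumed within-county variance of log radon
    mu: assumed statewide mean of county log means
    tau2: assumed between-county variance of county log means
    Returns (posterior mean, posterior variance) of the county's true log mean.
    """
    n = len(county_logs)
    ybar = sum(county_logs) / n
    precision_data = n / sigma2      # information from the county's own homes
    precision_prior = 1.0 / tau2     # information from the population of counties
    total = precision_data + precision_prior
    post_mean = (precision_data * ybar + precision_prior * mu) / total
    post_var = 1.0 / total
    return post_mean, post_var


# Two hypothetical counties with the same observed mean but different sample sizes.
small_county = [2.0, 2.0]
large_county = [2.0] * 50

small_mean, small_var = partial_pool(small_county, sigma2=1.0, mu=1.0, tau2=0.5)
large_mean, large_var = partial_pool(large_county, sigma2=1.0, mu=1.0, tau2=0.5)

# Geometric mean estimates on the original concentration scale.
small_gm = math.exp(small_mean)
large_gm = math.exp(large_mean)
```

Under these assumptions, the county with two measurements is pulled halfway toward the statewide mean, while the county with fifty measurements stays close to its own observed mean. This is why the estimated county-to-county variation is smaller than the variation in the raw observed geometric means.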