DIMACS Workshop on Ecologic Inference

November 28 - 30, 2007
DIMACS Center, CoRE Building, Rutgers University

Organizer: Tom Webster, Boston University, twebster@bu.edu

Presented under the auspices of the Special Focus on Computational and Mathematical Epidemiology.

Abstracts:

Rob Eisinga, Radboud University Nijmegen (Netherlands)

Title: Fisher information in complete and incomplete 2 x 2 tables

We discuss the key inferential issue of Fisher information loss in ecologic 2 x 2 tables compared to complete individual-level tables. We obtain expressions for the expected information about the parameters and their covariance in the ecologic data setting. These expressions reflect the additional uncertainty arising from the unobserved individual-level data in ecologic 2 x 2 tables. We also discuss methods to combine ecologic and individual-level 2 x 2 tables. One is simply to pool the two types of tables and to analyze the combined data using either maximum likelihood or Bayesian analysis with uninformative priors. Another is based on the idea that the individual-level data provide auxiliary information about the behavior of the parameters in the ecologic data context. That is, we first analyze the individual-level data and then carry the resulting means and covariance matrix of the parameters over as priors into a Bayesian analysis of the ecologic 2 x 2 tables.

Sebastien Haneuse, Center for Health Studies

Title: The Combination of Ecological and Case-control Data

Ecological studies, in which data are available at the level of the group, rather than at the level of the individual, are susceptible to a range of biases due to their inability to characterize within-group variability in exposures and confounders. In order to overcome these biases, we propose a hybrid design in which ecological data are supplemented with a sample of individual-level case-control data. We develop the likelihood for this design and illustrate its benefits via simulation, both in bias reduction when compared to an ecological study, and in efficiency gains relative to a conventional case-control study. An interesting special case of the proposed design is the situation where ecological data are supplemented with case-only data. The design is illustrated using a dataset of county-specific lung cancer mortality rates in the state of Ohio from 1988.

Kosuke Imai, Princeton University

Title: Combining the individual health survey with the aggregate mortality data to estimate the disability-free life expectancy

In this talk, I show how to combine the individual health survey with the aggregate mortality data to estimate the disability-free life expectancy (DFLE) or healthy life expectancy. Robust estimation of DFLE is essential for examining whether additional years of life are spent in good health and whether life expectancy is increasing faster than the decline of disability rates. I establish the formal statistical properties of the most commonly used method and consider an extension by relaxing the required assumptions. The empirical analysis of the 1907 and 1912 U.S. birth cohorts suggests that while mortality rates remain approximately stationary, disability rates decline during this time period. (The talk will be based on my paper with Samir Soneji, "On the Estimation of Disability-Free Life Expectancy: Sullivan's Method and Its Extension." Journal of the American Statistical Association, Forthcoming.)
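
For orientation, the basic Sullivan-style calculation named in the paper title can be sketched as below. This is only a minimal illustration, assuming an abridged life table (person-years lived per age interval and survivors at the starting age) and survey-based disability prevalence per interval. The function name `sullivan_dfle` and all numbers are hypothetical and are not taken from the talk or the paper.

```python
def sullivan_dfle(person_years, prevalence, survivors_at_start):
    """Disability-free life expectancy at the starting age (Sullivan-style sketch).

    person_years: life-table person-years lived in each age interval
    prevalence: proportion disabled in each age interval (from a survey)
    survivors_at_start: life-table survivors at the starting age
    """
    if len(person_years) != len(prevalence):
        raise ValueError("person_years and prevalence must have the same length")
    # Weight each interval's person-years by the proportion free of disability.
    healthy_years = sum(L * (1.0 - p) for L, p in zip(person_years, prevalence))
    return healthy_years / survivors_at_start


# Hypothetical inputs for three age intervals (not data from the talk).
person_years = [480000.0, 400000.0, 250000.0]
prevalence = [0.10, 0.25, 0.50]
survivors = 100000.0

life_expectancy = sum(person_years) / survivors
dfle = sullivan_dfle(person_years, prevalence, survivors)
print(life_expectancy, dfle)
```

With these made-up inputs, total life expectancy is about 11.3 years and the disability-free part is about 8.57 years. The sketch makes visible why the method needs both data sources: mortality enters through the life table, disability through the survey prevalence.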

Anna Oudin, Lund University (Sweden)

Title: Efficiency of two-phase methods with focus on a planned population-based case-control study on air pollution and stroke and some preliminary empirical results

We plan a study on exposure to air pollution and stroke in Scania, southern Sweden. We have access to individual disease status and group (area) level data on air pollution exposure for a large population sample. A smaller sub-sample will be selected in the second phase for individual-level assessment of exposure and covariates. We simulate a case-control study based on our planned study. We develop a two-phase method for this study and compare its performance with that of other two-phase methods. In the first phase, a sub-sample of the population will be included in the study, for whom we know disease status, a group-level measure of air pollution and basic data such as age and sex. In the second phase, lifetime exposure to air pollutants and potential confounders such as smoking will be estimated with questionnaires. We then apply two-phase methods to estimate the same factors for the subjects who did not participate in the second phase. In the setting described here, our two-phase method gave the largest efficiency gains while adjusting for varying participation rates across areas and using group-level exposure data.

Doug Thompson, University of Southern Maine and Dan Wartenberg, Rutgers University

Title: Additive versus Multiplicative Models in Ecologic Regression

Much research in environmental epidemiology relies on aggregate-level information on exposure to potentially toxic substances and on relevant covariates. We compare the use of additive (linear) and multiplicative (log-linear) regression models for the analysis of such data. We illustrate how both additive and multiplicative models can be fit to aggregate-level data sets in which disease incidence is the dependent variable, and contrast these results with similar models fitted to individual-level data. We find (1) that for aggregate-level data, multiplicative models are more likely than additive models to introduce bias into the estimation of rates, an effect not found with individual-level data; and (2) that under many circumstances multiplicative models reduce the precision of the estimates, an effect also not found in individual-level models. For both additive and multiplicative models of aggregate-level data, we find that, in the presence of a covariate that does not have a direct causal effect on exposure, narrow confidence intervals are obtained only when two or more unmeasured antecedent factors are strongly related to the measured covariate and/or the exposure of primary substantive interest. We conclude that the comparability of fitting sufficiently specified additive or multiplicative models in studies with individual-level binary data does not carry over to studies that analyze aggregate-level information. For aggregate data, we recommend use of additive models.
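
One commonly cited reason why the two model forms behave differently under aggregation, which may or may not be the argument developed in this talk, is that an additive individual-level model stays linear in the group mean exposure, whereas a multiplicative one does not. The sketch below checks this numerically; the coefficients and the exposure values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def additive_risk(x, a=0.01, b=0.02):
    """Individual risk under a linear (additive) model; a and b are made up."""
    return a + b * x


def multiplicative_risk(x, a=math.log(0.01), b=0.5):
    """Individual risk under a log-linear (multiplicative) model; a and b are made up."""
    return math.exp(a + b * x)


# Hypothetical exposures within one group: most unexposed, one highly exposed.
exposures = [0.0, 0.0, 0.0, 4.0]
mean_exposure = mean(exposures)

# Group rate (average of individual risks) minus the model evaluated at the mean exposure.
additive_gap = mean([additive_risk(x) for x in exposures]) - additive_risk(mean_exposure)
multiplicative_gap = (
    mean([multiplicative_risk(x) for x in exposures]) - multiplicative_risk(mean_exposure)
)
print(additive_gap, multiplicative_gap)
```

In this toy group the additive gap is zero up to rounding, while the multiplicative gap is positive: the group rate depends on the within-group spread of exposure, which aggregate data do not record.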

Tyler VanderWeele, University of Chicago

Title: Direct and Indirect Effects for Neighborhood-level Interventions

Definitions of direct and indirect effects are given for settings in which individuals are clustered in groups or neighborhoods and in which treatments are administered at the group level. A particular intervention may affect individual outcomes both through its effect on the individual and by changing the group or neighborhood itself. Identifiability conditions are given for controlled direct effects and for natural direct and indirect effects. The interpretation of these identifiability conditions is discussed within the context of neighborhood research and multilevel modeling. The definition of direct and indirect effects requires certain stability conditions; some discussion is given as to how these stability conditions can be relaxed.

Dan Wartenberg, Rutgers University

Title: Additive versus Multiplicative Models in Ecologic Regression

Tom Webster, Boston University School of Public Health

Title: Overview: Studies that combine individual and group-level data

This talk will review ecologic bias and then provide an overview of several different types of studies that combine individual and ecologic data. In an individual-level study, one collects and analyzes information on exposure (x), outcome (y) and covariates (z) for each subject. In traditional ecologic studies, we utilize group-level data for each variable, often aggregated by geography. For example, we might regress the average outcome (Y) against the average exposure (X) and average covariates (Z) across counties. It has long been known that the information loss that occurs from using fully aggregated data (XYZ) can result in severe bias relative to results obtained on the individual level (xyz). Of growing interest are studies that combine individual and ecologic data, the focus of this workshop. 1) Hybrid designs have been proposed that supplement aggregate data with samples of individual-level data. 2) Epidemiologists who study environmental and occupational exposures often use ecologic measures of exposure. Are such partially ecologic (Xyz) studies subject to ecologic bias and, if so, to what degree? 3) There is growing interest in contextual studies (xyZ) in which a group-level variable has an effect that is not captured by individual-level variables.
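
The contrast between a fully aggregated regression and an individual-level analysis can be made concrete with a small constructed example. The sketch below is a toy illustration only: the three groups, their values, and the helper names are hypothetical, and the data are built so that the outcome rises by 1 per unit of exposure within every group while the group means trend the other way.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with intercept)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data: three groups (e.g. counties), three subjects each,
# stored as (exposures x, outcomes y).
groups = {
    "A": ([1, 2, 3], [7, 8, 9]),
    "B": ([3, 4, 5], [5, 6, 7]),
    "C": ([5, 6, 7], [3, 4, 5]),
}

# Fully ecologic analysis: regress group mean Y on group mean X.
group_mean_x = [mean(xs) for xs, _ in groups.values()]
group_mean_y = [mean(ys) for _, ys in groups.values()]
ecologic_slope = ols_slope(group_mean_x, group_mean_y)

# Individual-level analysis adjusted for group: center within group, then pool.
centered_x, centered_y = [], []
for xs, ys in groups.values():
    centered_x += [x - mean(xs) for x in xs]
    centered_y += [y - mean(ys) for y in ys]
within_slope = ols_slope(centered_x, centered_y)

print(ecologic_slope, within_slope)
```

In this construction the ecologic slope is -1 while the within-group slope is +1, so the aggregated analysis reverses the sign of the individual-level relationship. Real data are rarely this extreme; the point is only that group means alone cannot reveal the within-group pattern.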

Tom Webster, Boston University School of Public Health

Title: Individual studies with ecologic measures of exposure

Epidemiologists who study environmental and occupational exposures often use ecologic measures of exposure. Are such partially ecologic (Xyz) studies subject to ecologic bias and, if so, to what degree? Studies employing ecologic exposure variables can often be viewed as individual-level studies with exposure measurement error, but this does not prevent at least some types of bias seen in purely ecologic studies. One explanation for this apparent paradox is that the exposure measurement error is special (a type of Berkson error), typically reducing exposure variance; in crude studies, this causes "bias magnification" similar to that occurring in fully ecologic studies. More generally, problems arise from loss of information regarding the joint distributions of outcome, exposure and covariates. Nevertheless, partially ecologic studies will often perform better than purely ecologic studies. Recognition of these properties can help in both the design of studies and sensitivity analysis of results.
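
A small numerical sketch may help fix ideas about the crude, partially ecologic case. It is a constructed toy example, not an analysis from the talk: nine hypothetical subjects in three groups, with the outcome rising by 1 per unit of exposure within each group by design, and group membership acting as an unadjusted confounder. Each subject's own exposure is then replaced by the group mean, as in an Xyz study.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with intercept)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical individual-level data; within each group y rises by 1 per unit x.
x = [1, 2, 3, 3, 4, 5, 5, 6, 7]
y = [7, 8, 9, 5, 6, 7, 3, 4, 5]
group = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

# Ecologic exposure measure: every subject is assigned the group mean exposure.
group_mean = {
    g: mean([xi for xi, gi in zip(x, group) if gi == g]) for g in set(group)
}
assigned_x = [group_mean[g] for g in group]

# Crude (unadjusted) slopes using true versus assigned exposure.
crude_individual_slope = ols_slope(x, y)
partially_ecologic_slope = ols_slope(assigned_x, y)

# Assigning group means removes the within-group part of the exposure variance.
var_individual = variance(x)
var_assigned = variance(assigned_x)

print(crude_individual_slope, partially_ecologic_slope, var_individual, var_assigned)
```

Here the crude slope with individual exposure is -0.6, already biased away from the within-group value of +1, and it moves further to -1 once the group mean exposure is substituted, while the exposure variance drops. With equal group sizes, as in this toy, the crude partially ecologic slope coincides with the fully ecologic one.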

Document last modified on November 14, 2007.