DIMACS Working Group on Adverse Event/Disease Reporting, Surveillance, and Analysis II

Second Meeting: February 19 - 20, 2004
DIMACS Center, CoRE Building, Rutgers University

Donald Hoover, Rutgers University, Statistics, drhoover@stat.rutgers.edu
David Madigan, Rutgers University, Statistics, madigan@stat.rutgers.edu
Henry Rolka, CDC, hrr2@cdc.gov
Presented under the auspices of the Special Focus on Computational and Mathematical Epidemiology.


David Buckeridge, Stanford University

Title: Simulation of outbreak signals for evaluation of multiple data source detectors

Evaluation of detector performance requires test data with accurately labelled outbreak intervals. Creating test data by expert mark-up of authentic data is time-consuming and becomes increasingly difficult as the dimensionality of the data increases. Simulating outbreak signals is an alternative way to generate test data. Identifying the appropriate level of detail for an outbreak simulation, and deciding how to model that detail, is a difficult problem. We discuss issues in generating outbreak signals for "injection" into multiple authentic data sources.
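As a minimal sketch of the injection idea, assuming a daily count series and a hypothetical triangular outbreak shape with Poisson-distributed extra cases (neither choice is taken from the talk), the function below adds simulated cases to a copy of an authentic baseline and returns a per-day label marking the injected interval. The names `inject_outbreak` and `poisson` are illustrative.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    if lam <= 0:
        return 0
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def inject_outbreak(baseline, start, duration, peak, seed=0):
    """Add a simulated outbreak to a copy of an authentic daily count series.

    Hypothetical shape: expected extra cases rise linearly to `peak` at the
    middle of the interval and fall again. Returns (injected series, labels),
    where labels[t] == 1 marks days inside the injected outbreak.
    """
    rng = random.Random(seed)
    series = list(baseline)          # never modify the authentic data
    labels = [0] * len(series)
    mid = (duration - 1) / 2
    for d in range(duration):
        t = start + d
        if t >= len(series):
            break
        mean = peak * (1 - abs(d - mid) / (mid + 1))
        series[t] += poisson(rng, mean)
        labels[t] = 1
    return series, labels

# Usage: inject a 7-day outbreak into a flat baseline of 10 cases per day.
baseline = [10] * 30
series, labels = inject_outbreak(baseline, start=10, duration=7, peak=8.0)
print(sum(labels))  # 7
```

Because the labels come from the injection itself, a detector's sensitivity and timeliness can be scored against them exactly, without expert mark-up.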

Howard Burkom, Jeffrey Lin, Andrew Feldman, The Johns Hopkins University Applied Physics Laboratory and
Yevgeniy Elbert, Walter Reed Army Institute of Research

Title: Monitoring of Multivariate Data in ESSENCE Biosurveillance Systems

The Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) has been enhanced by the Department of Defense Global Emerging Infection System and the Johns Hopkins University Applied Physics Laboratory to monitor both military and civilian data streams for the onset of disease outbreaks. This system monitors time series of multiple data sources stratified by covariates such as patient location and diagnosis type. Several factors drive the need to manage multiple time series: multiple, disparate data sources are available; time series for a data source may be divided among political regions or treatment facilities; and it is often preferable to stratify the available data to produce separate series for each syndrome group, product group, etc. These circumstances are increasingly intertwined in ESSENCE systems as the surveillance areas and the number of available data sources grow. Detection algorithms flag anomalies that are represented as alerts for follow-up screening by epidemiologists. The screening occurs on multiple jurisdictional levels, both military and civilian. As the complexity of these systems grows, it is important to retain sensitivity while avoiding excessive alerts resulting from multiple testing and from the mismatch between epidemiological and statistical significance.


For alerting based on purely temporal data streams, we have implemented combinations of regression modeling and statistical process control to address the alerting performance issues. Multivariate, multiple univariate, and hybrid strategies are used. Modifications of the multivariate methods avoid oversensitivity to irrelevant changes in covariance among the data streams. Several strategies have been applied to fuse the outputs of algorithms applied to separate data streams. For the detection of significant spatiotemporal clusters, we have generalized the scan statistic approach widely used in Martin Kulldorff's SaTScan software to treat multiple data sources so as to avoid masking of a signal in one data source by another source with much larger variance.
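One generic way to fuse the outputs of algorithms run on separate streams is Fisher's method for combining p-values. It is shown here only as an illustration and is not claimed to be one of the ESSENCE strategies. The sketch assumes each stream's algorithm yields a p-value and that the p-values are independent under the null hypothesis, an assumption that correlated surveillance streams may violate.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom under H0, assuming the k p-values are independent."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # For even df 2k the chi-square survival function has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Usage: two streams, each only marginally significant on its own.
print(round(fisher_combined_p([0.05, 0.05]), 4))  # 0.0175
```

Two streams that are each at p = 0.05 combine to roughly p = 0.017, which illustrates how fusing streams can retain sensitivity that separate per-stream thresholds would lose.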


Multivariate and multiple univariate alerting strategies proved sensitive and robust in a 5-city retrospective exercise including several authentic data streams. Both approaches gave timely alerts for a set of outbreaks identified by a group of medical epidemiologists. The application of a Bayes belief net to the separate algorithm outputs improved the alerting timeliness in some contexts.


Biosurveillance in this complex, stratified data environment may be managed with the required sensitivity using algorithms adapted to characteristics and interrelationships of the chosen data streams. As these systems grow, a hybrid suite of process control algorithms in a Bayes net framework is a viable model for the versatility that will be needed.

Luiz Duczmal, Harvard Medical School and Harvard Pilgrim Health Care and Universidade Federal de Minas Gerais, Brazil
Martin Kulldorff, Harvard Medical School
Farzad Mostashari, New York City Department of Health

Title: Power Evaluation of the Spatial Scan Statistic for Multiple Data Streams

Multiple data streams of possibly different magnitudes can be combined in a syndromic surveillance system for the early detection of disease outbreaks. In this context it is important to evaluate the power of spatial cluster detection methods. We present simulation results for several possible schemes using the spatial scan statistic and show that power is significantly increased compared with single-data-stream systems. The gain arises because the signals from two separate sources are combined, which increases the probability of detection. The LLR adding scheme, which sums the log-likelihood ratios (LLRs) of the individual streams, appears to be the best option in most simulations, followed by the maximum LLR scheme. The same ideas are currently being developed for the space-time scan statistic.
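The two combination schemes can be sketched with Kulldorff's Poisson log-likelihood ratio. Under stated assumptions (each stream gives case counts per region, expected counts are proportional to an at-risk population shared by all streams, and candidate zones are supplied as lists of region indices), each zone's LLR is computed separately in every stream and then either summed (LLR adding) or maximized (maximum LLR). The function names are hypothetical, and significance testing, which SaTScan performs with Monte Carlo replications, is omitted.

```python
import math

def poisson_llr(c, e, total):
    """Kulldorff's Poisson LLR for one zone in one stream.

    c: observed cases inside the zone; e: expected cases inside under H0;
    total: all cases in the stream. Only excesses (c > e) are scored."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term is 0 when every case lies inside the zone
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr

def combined_zone_scores(zones, counts, population, combine):
    """Score each candidate zone across several data streams.

    counts[s][r]: cases in region r for stream s;
    population[r]: at-risk population of region r;
    combine: sum for the LLR adding scheme, max for the maximum LLR scheme."""
    pop_total = sum(population)
    scores = []
    for zone in zones:
        pop_in = sum(population[r] for r in zone)
        per_stream = []
        for stream in counts:
            total = sum(stream)
            c = sum(stream[r] for r in zone)
            e = total * pop_in / pop_total  # expected count with no clustering
            per_stream.append(poisson_llr(c, e, total))
        scores.append(combine(per_stream))
    return scores

# Usage: two streams over four equal-population regions.
zones = [[0], [1], [0, 1], [2, 3]]
counts = [[20, 10, 10, 10], [15, 10, 10, 10]]
population = [100, 100, 100, 100]
added = combined_zone_scores(zones, counts, population, sum)
maximum = combined_zone_scores(zones, counts, population, max)
print(round(added[0], 2), round(maximum[0], 2))  # 3.49 2.71
```

The scan statistic is then the largest combined score over all zones. Summing lets moderate excesses in the same zone reinforce each other across streams, whereas the maximum scores a zone by its single strongest stream.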


Burkom, H. S., Biosurveillance Applying Scan Statistics with Multiple, Disparate Data Sources, Journal of Urban Health, 80(2), Suppl. 1, 2003

Kulldorff, M., A Spatial Scan Statistic, Comm. Statist. Theory Meth., 26(6), 1481-1496, 1997

Chris Jennison, University of Bath, England

Title: Multiplicity and sequential testing

Group sequential methods are designed to deal with the problems caused by repeated analyses of accumulating data. Further multiplicities arise when there are several endpoints or three or more treatments are under investigation.
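The problem caused by repeated analyses can be illustrated by simulation. The sketch below, assuming normally distributed data with known variance and five equally spaced looks, estimates the overall type I error when a two-sided test is repeated at each look. Using the fixed-sample critical value 1.96 at every look inflates the error rate to roughly 0.14, whereas the tabulated Pocock constant for five looks, 2.413, holds it near 0.05. This is a standard textbook illustration, not the methodology presented in the talk; `type_one_error` is an illustrative name.

```python
import math
import random

def type_one_error(boundary, looks=5, sims=20000, seed=1):
    """Monte Carlo probability of rejecting H0 at any of `looks` equally spaced
    analyses when H0 is true. Each look adds one unit of information, so the
    running sum S_k of k i.i.d. N(0, 1) increments gives Z_k = S_k / sqrt(k)."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(sims):
        s = 0.0
        for k in range(1, looks + 1):
            s += rng.gauss(0.0, 1.0)
            if abs(s) / math.sqrt(k) > boundary:
                rejected += 1
                break  # the trial stops at the first crossing
    return rejected / sims

naive = type_one_error(1.96)    # same critical value at every look
pocock = type_one_error(2.413)  # Pocock constant for 5 looks, alpha = 0.05
print(naive, pocock)
```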

In the case of multiple endpoints, the initial concern is to avoid over-interpreting results on selected outcomes. However, there is potential for efficiency gains when mutually supportive results are combined from separate endpoints. The requirements may differ in other settings, such as the combined analysis of efficacy and safety endpoints in a clinical trial: a treatment must do well by both criteria and there is little scope for trading success on one criterion against poor performance on the other.

I shall describe methodology for monitoring multiple endpoints developed in the context of clinical trials. I shall try to identify aspects of these methods which are peculiar to clinical trials and to highlight features which are likely to be of value generically in other applications.

Karen Kafadar, University of Colorado

Title: Borrowing methodology from industry: Methods of Statistical Process Control for Public Health Surveillance

Industry routinely monitors production processes to detect possible "out-of-control" situations. The most common control chart techniques are the Shewhart, CUSUM (Cumulative Sum), and EWMA (Exponentially Weighted Moving Average) charts. One chart may be more powerful than the others, depending upon the nature of the aberration (e.g., a single, large outlier; an abrupt change in level; a slow but steady trend). This presentation will include a discussion of the use and power of these conventional charts, as well as examples of them and their multivariate analogs. A statistical graphical display of disease incidence data from CDC's National Notifiable Disease Surveillance System (NNDSS) will also be presented.
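A minimal sketch of the three univariate charts follows, assuming the in-control mean `mu` and standard deviation `sigma` are known from baseline data. The parameter defaults (CUSUM reference value k = 0.5 and decision interval h = 4; EWMA weight λ = 0.2 and limit multiplier L = 3) are common textbook choices, not values from the talk.

```python
import math

def shewhart(x, mu, sigma, k=3.0):
    """Shewhart chart: flag any single observation more than k sigma from mu."""
    return [t for t, v in enumerate(x) if abs(v - mu) > k * sigma]

def cusum(x, mu, sigma, k=0.5, h=4.0):
    """Upper one-sided tabular CUSUM on standardized values.

    k is the reference value and h the decision interval, both in sigma
    units; the sum restarts at zero after each alarm."""
    s, alarms = 0.0, []
    for t, v in enumerate(x):
        s = max(0.0, s + (v - mu) / sigma - k)
        if s > h:
            alarms.append(t)
            s = 0.0
    return alarms

def ewma(x, mu, sigma, lam=0.2, L=3.0):
    """Two-sided EWMA chart with exact, time-varying control limits."""
    z, alarms = mu, []
    for t, v in enumerate(x):
        z = lam * v + (1 - lam) * z
        var = sigma ** 2 * lam / (2 - lam) * (1 - (1 - lam) ** (2 * (t + 1)))
        if abs(z - mu) > L * math.sqrt(var):
            alarms.append(t)
    return alarms

# Usage: a single 4-sigma outlier at t = 10, then a sustained 1.5-sigma shift from t = 20.
x = [0.0] * 10 + [4.0] + [0.0] * 9 + [1.5] * 20
print(shewhart(x, 0.0, 1.0))  # [10]
print(cusum(x, 0.0, 1.0))     # [24, 29, 34, 39]
print(ewma(x, 0.0, 1.0)[0])   # 24
```

On this series the Shewhart chart flags only the isolated outlier, while CUSUM and EWMA ignore the outlier and first signal the sustained shift at t = 24, matching the point that the best chart depends on the type of aberration.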

Jeff Solka, George Mason University

Title: Visualization Frameworks to Detect the Presence of Unusual Structure in High-Dimensional Data

This talk will explore various visualization methods to facilitate the detection of outliers, clusters, inter-class relationships, and intra-class relationships. Some of the methods to be discussed include data images of the inter-point distance matrix for outlier detection and cluster detection, parallel coordinates for cluster assessment, and minimal spanning tree methods for the identification of inter-class and intra-class relationships. Some of the methods will be illustrated with data taken from text classification and clustering studies.
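A small sketch of the minimal spanning tree idea, assuming points in Euclidean space and Prim's algorithm run on a dense inter-point distance matrix (the same matrix that would be rendered as a data image). The function names are illustrative. Long MST edges separate clusters, and a point attached to the tree only through a very long edge is an outlier candidate; this single-linkage reading is a common heuristic, not necessarily the one used in the talk.

```python
import math

def distance_matrix(points):
    """Full matrix of Euclidean inter-point distances."""
    return [[math.dist(p, q) for q in points] for p in points]

def minimum_spanning_tree(d):
    """Prim's algorithm on a dense distance matrix; returns (i, j, weight) edges."""
    n = len(d)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, d[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u
    return edges

# Usage: two tight clusters plus one distant point (index 6).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (30, 30)]
edges = minimum_spanning_tree(distance_matrix(points))
longest = max(edges, key=lambda e: e[2])
print(longest)  # the edge attaching point 6, length sqrt(761), about 27.59
```

Cutting the longest edge isolates the distant point, and cutting the next longest separates the two clusters.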

Document last modified on February 5, 2004.