DIMACS Working Group on Adverse Event/Disease Reporting, Surveillance, and Analysis I
DIMACS Subgroup on Adverse Event/Disease Reporting, Surveillance, and Analysis
Title: Simulation of outbreak signals for evaluation of multiple data source detectors
Evaluation of detector performance requires test data with accurately labelled outbreak intervals. Creation of test data by expert mark-up of authentic data is time-consuming and becomes increasingly difficult as the dimensionality of the data increases. Simulation of outbreak signals is an alternative approach to generating test data. Identifying the appropriate level of detail for an outbreak simulation and deciding how to model that detail is a difficult problem. We discuss issues in generating outbreak signals for "injection" into multiple authentic data sources.
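As a minimal illustration of "injection", the sketch below adds a simulated outbreak to a copy of an authentic daily count series and records the labelled outbreak interval alongside it. The lognormal epidemic-curve shape, the Poisson noise on daily case counts, and every parameter name (`total_cases`, `median_day`, `sigma`, `duration`) are assumptions chosen for illustration, not the method described in this abstract.

```python
import math
import random


def lognormal_weights(duration, median_day, sigma):
    """Fraction of outbreak cases falling on each day, from a lognormal curve (assumed shape)."""
    def cdf(t):
        if t <= 0:
            return 0.0
        return 0.5 * (1.0 + math.erf((math.log(t) - math.log(median_day)) / (sigma * math.sqrt(2.0))))

    raw = [cdf(d + 1) - cdf(d) for d in range(duration)]
    total = sum(raw)
    # Renormalize so the truncated curve still sums to 1
    return [w / total for w in raw]


def poisson_draw(rng, lam):
    """Knuth's multiplicative Poisson sampler; adequate for small daily means."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def inject_outbreak(baseline, start, total_cases, duration=10,
                    median_day=3.0, sigma=0.5, seed=0):
    """Add simulated outbreak cases to a copy of an authentic daily count series.

    Returns the injected series and a per-day label marking the outbreak interval.
    """
    rng = random.Random(seed)
    injected = list(baseline)
    labels = [False] * len(baseline)
    for d, w in enumerate(lognormal_weights(duration, median_day, sigma)):
        t = start + d
        if t >= len(baseline):
            break  # outbreak runs past the end of the data
        injected[t] += poisson_draw(rng, total_cases * w)
        labels[t] = True
    return injected, labels


baseline = [20] * 30
series, truth = inject_outbreak(baseline, start=10, total_cases=60)
```

Because the labels are produced by the simulation itself, the outbreak interval is known exactly, which is the property that expert mark-up of authentic data struggles to provide. Varying the seed gives different stochastic realizations of the same underlying curve.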
Title: Monitoring of Multivariate Data in ESSENCE Biosurveillance Systems
The Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) has been enhanced by the Department of Defense Global Emerging Infection System and the Johns Hopkins University Applied Physics Laboratory to monitor both military and civilian data streams for the onset of disease outbreaks. This system monitors time series of multiple data sources stratified by covariates such as patient location and diagnosis type. Several factors drive the need to manage multiple time series: multiple, disparate data sources are available; time series for a data source may be divided among political regions or treatment facilities; and it is often preferable to stratify the available data to produce separate series for each syndrome group, product group, etc. These circumstances are increasingly intertwined in ESSENCE systems as the surveillance areas and number of available data sources grow. Detection algorithms flag anomalies that are represented as alerts for follow-up screening by epidemiologists. The screening occurs on multiple jurisdictional levels, both military and civilian. As the complexity of these systems grows, it is important to retain sensitivity while avoiding excessive alerts resulting from multiple testing and from the mismatch between epidemiological and statistical significance.
For alerting based on purely temporal data streams, we have implemented combinations of regression modeling and statistical process control to address the alerting performance issues. Multivariate, multiple univariate, and hybrid strategies are used. Modifications of the multivariate methods avoid oversensitivity to irrelevant changes in covariance among the data streams. Several strategies have been applied to fuse the outputs of algorithms applied to separate data streams. For the detection of significant spatiotemporal clusters, we have generalized the scan statistic approach widely used in Martin Kulldorff's SaTScan software to treat multiple data sources so as to avoid masking of a signal in one data source by another source with much larger variance.
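To make the contrast between multivariate and multiple univariate alerting concrete, here is a sketch of the textbook Hotelling T² statistic for a vector of current stream values against a baseline window, next to a "largest per-stream z-score" alternative. This is the plain, unmodified T²; the ESSENCE modifications that reduce sensitivity to irrelevant covariance changes are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def hotelling_t2(baseline, today):
    """Multivariate distance of today's stream vector from a baseline window.

    baseline: (n_days, k_streams) array of past counts, k_streams >= 2
    today:    (k_streams,) vector of current counts
    """
    baseline = np.asarray(baseline, dtype=float)
    diff = np.asarray(today, dtype=float) - baseline.mean(axis=0)
    cov = np.cov(baseline, rowvar=False)  # k x k sample covariance
    # Solve cov @ x = diff rather than forming the explicit inverse
    return float(diff @ np.linalg.solve(cov, diff))


def max_univariate_z(baseline, today):
    """Multiple-univariate alternative: largest per-stream z-score."""
    baseline = np.asarray(baseline, dtype=float)
    mean = baseline.mean(axis=0)
    z = (np.asarray(today, dtype=float) - mean) / baseline.std(axis=0, ddof=1)
    return float(z.max())


# Usage: 56 days of synthetic baseline for 3 streams, then a spike in stream 0
rng = np.random.default_rng(1)
past = rng.normal(100, 10, size=(56, 3))
spike = past.mean(axis=0) + np.array([40.0, 0.0, 0.0])
print(hotelling_t2(past, spike), max_univariate_z(past, spike))
```

The T² statistic reacts to any departure from the joint baseline pattern, including shifts in the relationship between streams, which is why a raw T² can be oversensitive to covariance changes that carry no epidemiological meaning.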
Multivariate and multiple univariate alerting strategies proved sensitive and robust in a 5-city retrospective exercise including several authentic data streams. Both approaches gave timely alerts for a set of outbreaks identified by a group of medical epidemiologists. The application of a Bayes belief net to the separate algorithm outputs improved the alerting timeliness in some contexts.
Biosurveillance in this complex, stratified data environment may be managed with the required sensitivity using algorithms adapted to characteristics and interrelationships of the chosen data streams. As these systems grow, a hybrid suite of process control algorithms in a Bayes net framework is a viable model for the versatility that will be needed.
Title: Power Evaluation of the Spatial Scan Statistic for Multiple Data Streams
Multiple data streams of possibly different magnitudes can be combined in a syndromic surveillance system for the early detection of disease outbreaks. In this context it is important to evaluate the power of spatial cluster detection methods. We present the results of simulations for several possible schemes using the spatial scan statistic and show that power is significantly increased compared to single-data-stream systems. The gain arises because the signals from two separate sources are combined, which increases the probability of detection. The LLR-adding scheme appears to be the best option in most simulations, followed by the maximum-LLR scheme. The same ideas are currently being developed for the space-time scan statistic.
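The two combination schemes can be sketched with Kulldorff's Poisson log-likelihood ratio: for each candidate zone, compute the LLR separately in every stream, then either add the LLRs or take their maximum, and keep the zone with the largest combined score. The candidate zones are supplied as a hypothetical list of region-index tuples, and the Monte Carlo replication needed to attach a p-value to the maximum is omitted.

```python
import math


def poisson_llr(c, e, C):
    """Kulldorff Poisson LLR for a zone with c observed, e expected, C total cases.

    Only excesses (c > e) count; 0 * log(0) is treated as 0.
    """
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if C - c > 0:
        llr += (C - c) * math.log((C - c) / (C - e))
    return llr


def scan_multiple_streams(counts_by_stream, population, zones, combine="sum"):
    """Return (best_zone, best_score) combining per-stream LLRs by 'sum' or 'max'."""
    total_pop = sum(population)
    best_zone, best_score = None, 0.0
    for zone in zones:
        pop_z = sum(population[i] for i in zone)
        llrs = []
        for counts in counts_by_stream:
            C = sum(counts)
            c = sum(counts[i] for i in zone)
            e = C * pop_z / total_pop  # expected cases under the null, by population share
            llrs.append(poisson_llr(c, e, C))
        score = sum(llrs) if combine == "sum" else max(llrs)
        if score > best_score:
            best_zone, best_score = zone, score
    return best_zone, best_score


pop = [100] * 4
streams = [[10, 10, 10, 30], [5, 5, 5, 15]]  # two streams, both elevated in region 3
zones = [(0,), (1,), (2,), (3,), (2, 3)]
print(scan_multiple_streams(streams, pop, zones, combine="sum"))
print(scan_multiple_streams(streams, pop, zones, combine="max"))
```

Adding LLRs lets a modest excess that appears in both streams reinforce itself, whereas the maximum scheme scores a zone only by its single strongest stream.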
Burkom, H. S., Biosurveillance Applying Scan Statistics with Multiple, Disparate Data Sources, Journal of Urban Health, Vol. 80, No. 2, Supplement 1, 2003
Kulldorff, M., A Spatial Scan Statistic, Communications in Statistics: Theory and Methods, Vol. 26, No. 6, pp. 1481-1496, 1997
Title: Multiplicity and sequential testing
Group sequential methods are designed to deal with the problems caused by repeated analyses of accumulating data. Further multiplicities arise when there are several endpoints or three or more treatments are under investigation.
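The problem that group sequential methods address can be shown with a short simulation: if a trial with no true effect is tested at the nominal two-sided 5% level (|Z| > 1.96) at each of five equally spaced analyses, the chance of crossing at least once is roughly 0.14, not 0.05. Raising the constant boundary to about 2.41 (Pocock's value for five looks) restores an overall level near 0.05. The function below is an illustrative Monte Carlo sketch, not a boundary-computation routine.

```python
import math
import random


def overall_type1(n_looks, critical, n_sims=20000, seed=0):
    """Monte Carlo rate at which a null trial crosses |Z| > critical at any interim look."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_sims):
        s = 0.0
        for k in range(1, n_looks + 1):
            s += rng.gauss(0.0, 1.0)              # one equal-sized group of new data
            if abs(s / math.sqrt(k)) > critical:  # standardized cumulative statistic
                crossings += 1
                break
    return crossings / n_sims


print(overall_type1(5, 1.96))   # repeated nominal testing
print(overall_type1(5, 2.413))  # Pocock-type constant boundary
```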
In the case of multiple endpoints, the initial concern is to avoid over-interpreting results on selected outcomes. However, there is potential for efficiency gains when mutually supportive results are combined from separate endpoints. The requirements may differ in other settings, such as the combined analysis of efficacy and safety endpoints in a clinical trial: a treatment must do well by both criteria and there is little scope for trading success on one criterion against poor performance on the other.
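The two situations in the paragraph above correspond to different decision rules. For mutually supportive endpoints, one simple option (assumed here for illustration) is an equally weighted sum of the two z-statistics, rescaled by their correlation rho so that it is standard normal under the null. For efficacy and safety, an intersection-union rule claims success only when each endpoint passes on its own, which needs no multiplicity adjustment but also permits no trading between criteria. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def combined_z(z1, z2, rho):
    """Equally weighted sum of two endpoint z-statistics with correlation rho.

    Under the null, (z1 + z2) / sqrt(2 + 2*rho) is standard normal.
    """
    return (z1 + z2) / math.sqrt(2.0 + 2.0 * rho)


def both_significant(z1, z2, alpha=0.025):
    """Intersection-union rule: succeed only if each endpoint passes its one-sided test."""
    crit = NormalDist().inv_cdf(1.0 - alpha)
    return min(z1, z2) > crit


# One strong and one weak but supportive endpoint
print(combined_z(2.5, 1.0, rho=0.3))  # exceeds 1.96
print(both_significant(2.5, 1.0))     # False: the second endpoint fails alone
```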
I shall describe methodology for monitoring multiple endpoints developed in the context of clinical trials. I shall try to identify aspects of these methods which are peculiar to clinical trials and to highlight features which are likely to be of value generically in other applications.
Title: Borrowing methodology from industry: Methods of Statistical Process Control for Public Health Surveillance
Industry routinely monitors production processes to detect possible "out-of-control" situations. The most common control chart techniques are the Shewhart, CUSUM (Cumulative Sum), and EWMA (Exponentially Weighted Moving Average) charts. One chart may be more powerful than the others, depending on the nature of the aberration (e.g., a single large outlier, an abrupt change in level, or a slow but steady trend). This presentation will include a discussion of the use and power of these conventional charts, as well as examples of them and their multivariate analogs. A statistical graphical display of disease incidence data from CDC's National Notifiable Diseases Surveillance System (NNDSS) will also be presented.
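The difference in power across aberration types can be seen with the three conventional charts in their simplest one-sided (upper) forms, assuming a known in-control mean `mu` and standard deviation `sigma`. The settings k = 0.5, h = 4 for CUSUM and lambda = 0.2, L = 3 for EWMA are common textbook choices assumed here; the multivariate analogs are not shown.

```python
import math


def shewhart_alarms(x, mu, sigma, L=3.0):
    """Indices where a single observation exceeds the upper limit mu + L*sigma."""
    return [i for i, v in enumerate(x) if v > mu + L * sigma]


def cusum_alarms(x, mu, sigma, k=0.5, h=4.0):
    """Upper tabular CUSUM on standardized values; alarm when S_t > h (no reset)."""
    s, alarms = 0.0, []
    for i, v in enumerate(x):
        s = max(0.0, s + (v - mu) / sigma - k)
        if s > h:
            alarms.append(i)
    return alarms


def ewma_alarms(x, mu, sigma, lam=0.2, L=3.0):
    """Upper EWMA chart with the exact time-varying control limit."""
    z, alarms = mu, []
    for i, v in enumerate(x):
        z = lam * v + (1.0 - lam) * z
        t = i + 1
        limit = mu + L * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        if z > limit:
            alarms.append(i)
    return alarms


# Sustained shift of 1.5 sigma starting at index 10
series = [0.0] * 10 + [1.5] * 20
print(shewhart_alarms(series, 0.0, 1.0))    # no alarm: no single point exceeds 3 sigma
print(cusum_alarms(series, 0.0, 1.0)[0])    # 14
print(ewma_alarms(series, 0.0, 1.0)[0])     # 14
```

In this example the Shewhart chart never fires, while CUSUM and EWMA both accumulate the moderate sustained shift and alarm a few days after it begins. A single large outlier would favour the Shewhart chart instead.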
Title: Visualization Frameworks to Detect the Presence of Unusual Structure in High-Dimensional Data
This talk will explore various visualization methods to facilitate the detection of outliers, clusters, inter-class relationships, and intra-class relationships. Some of the methods to be discussed include data images of the inter-point distance matrix for outlier detection and cluster detection, parallel coordinates for cluster assessment, and minimal spanning tree methods for the identification of inter-class and intra-class relationships. Some of the methods will be illustrated with data taken from text classification and clustering studies.
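The quantities behind two of these displays can be computed directly: the inter-point distance matrix (the matrix that a data image renders as pixels) and a minimal spanning tree over it, whose unusually long edges point to outliers or gaps between clusters. This sketch uses Prim's algorithm on a dense matrix and a nearest-neighbour distance score; function names are hypothetical and no plotting is included.

```python
import numpy as np


def distance_matrix(X):
    """Euclidean inter-point distance matrix for the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def nearest_neighbor_distance(dist):
    """Distance from each point to its nearest other point (simple outlier score)."""
    masked = dist + np.diag(np.full(dist.shape[0], np.inf))
    return masked.min(axis=1)


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns (parent, child, length) edges."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()             # cheapest known link from the tree to each point
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges


X = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10]]  # a tight cluster plus one distant point
D = distance_matrix(X)
print(max(minimum_spanning_tree(D), key=lambda e: e[2]))  # longest edge reaches point 4
print(nearest_neighbor_distance(D))
```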