J. Abello and G. Cormode, editors. Discrete Methods in Epidemiology, volume 70 of DIMACS. AMS, 2006.

In general terms, epidemiology deals with populations rather than individuals. One of its goals is to study the frequency of health-related events. It has a major but not exclusive concern with the causes and determinants of disease patterns in populations. The premise is that a systematic investigation of different populations can identify causal and preventive factors. Epidemiology is an observational rather than an experimental science. Sample questions take the form of:

We have observed that occurrence measures, causal inference and study designs play prominent roles in the daily endeavors of a typical epidemiologist. Descriptive and analytical epidemiology are two overlapping flavors of this discipline.

Descriptive epidemiology attempts to describe patterns of disease according to spatial and temporal information about the members of a population. These patterns are described by tabulations or summaries of surveys and polls, or by parametric or non-parametric population models. Models are in general global descriptions of the major part of a data set. Patterns, on the other hand, are local features of the data that can be described by association rules, modes or gaps in density functions, outliers, inflection points in regressions, symptom clusters, geographic hot spots, etc. Some epidemiologists appear more interested in local patterns than in global structure. This raises questions of how “realistic” certain patterns are.

Analytical epidemiology attempts to explain and predict the state of a population's health. A typical goal is to summarize the relationship between exposure and disease incidence by comparing two measures of disease frequency. These comparisons may be affected by chance, by bias and by the presence or absence of a real effect. This naturally explains why statistical methods play a major role in epidemiology, since bias is a central preoccupation of its practitioners. Bias is a systematic error that results in an incorrect or invalid estimate of the measure of association; it can create or mask associations. Selection bias and information bias are two of the main types. In particular, selection must be independent of exposure if the purpose of the study is to explain the relationship between exposure and disease occurrence. In summary, one of the central themes in analytical epidemiology is to understand the roles of bias, chance and real effect in population health.
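As an illustration of comparing two measures of disease frequency, the following minimal sketch computes the risk in an exposed and an unexposed group and summarizes them as a risk ratio and a risk difference. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the volume, and the sketch ignores chance and bias entirely.

```python
def risk(cases: int, total: int) -> float:
    """Proportion of a group that developed the disease."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return cases / total


def compare_risks(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> dict:
    """Compare disease frequency between exposed and unexposed groups."""
    r_exposed = risk(exposed_cases, exposed_total)
    r_unexposed = risk(unexposed_cases, unexposed_total)
    if r_unexposed == 0:
        raise ValueError("risk ratio is undefined when unexposed risk is zero")
    return {
        "risk_exposed": r_exposed,
        "risk_unexposed": r_unexposed,
        "risk_ratio": r_exposed / r_unexposed,        # relative measure
        "risk_difference": r_exposed - r_unexposed,   # absolute measure
    }


# Hypothetical counts: 20 cases among 100 exposed, 10 among 100 unexposed.
result = compare_risks(20, 100, 10, 100)
print({name: round(value, 3) for name, value in result.items()})
```

A risk ratio above 1 in such a table suggests an association between exposure and disease, but whether it reflects a real effect rather than chance or bias is exactly the question the surrounding text raises.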

To evaluate the role of chance, statistical hypothesis testing and estimation appear to be the tools of choice. On the other hand, generative models offer a way to describe infectious disease dynamics. Since disease patterns are of primary interest, data mining algorithms and the detection of rules for pattern formation have a lot to offer. Classification and taxonomies are useful tools for developing predictive models. In general, we believe that some questions addressed by epidemiologists benefit from being viewed in a mathematical and algorithmic context. This volume is a first attempt to bridge the gap between the two communities. Its main emphasis is on discrete methods that have successfully addressed some epidemiological question. We begin by providing introductory chapters: on some of the key methods from discrete data mining, by a selection of researchers in this area, and on descriptive epidemiology, by Schneider. These collect, in a digested form, what we believe are among the most potentially useful concepts in data mining and epidemiology.

Next there are two chapters reporting work in epidemiology that suggest a discrete, analytical approach: Shannon on challenges in molecular data analysis, and Hirschman and Damianos on a system for monitoring news wires for indications of disease outbreaks. The remainder of the volume draws out further some of the key areas in the intersection between epidemiology and discrete methods. The technique of formal concept analysis, and the amazing depth of mathematical structure that arises from it, are explored in chapters by Ozonoff, Pogel and Hannan, and Abello and Pogel. The dynamics of disease transmission can be modeled in a variety of ways, but this often involves setting up systems of differential equations to model the ebb and flow of infection, as demonstrated by Desai, Boily, Mâsse and Anderson, and Vázquez, in the context of quite different problems. Eubank, Kumar, Marathe, Srinivasan and Wang study massive interaction graphs and give results by a combination of combinatorial methods and simulation; Abello and Capalbo focus on properties of graphs generated by an appropriate random model; while Hartke takes a combinatorial model of disease spread on tree graphs. Finally, we see two applications of Support Vector Machines to epidemiological data sets, from Li, Muchnik and Schneider (using breast cancer data from the SEER database) and from Fradkin, Muchnik, Hermans and Morgan (using data on disease in chickens). Other potential areas of interest that we have not touched on in this collection relate to patient confidentiality, coding and cryptography, and multiscale inference.
