DIMACS Workshop on Integration of Diverse Biological Data

June 21 - 22, 2001
DIMACS Center, Rutgers University, Piscataway, NJ

Andrea Califano (co-chair), First Genetic Trust, acalifano@firstgenetic.net
Conrad Gilliam (co-chair), Columbia University, tcg1@columbia.edu
Fred S. Roberts, Rutgers University, froberts@dimacs.rutgers.edu
Presented under the auspices of the Special Focus on Computational Molecular Biology.

This workshop will focus on combinatorial algorithms and probabilistic models for the analysis and cross-annotation of biological data in diverse databases. The rapid accumulation of biological data has produced one of the largest, and still growing, collections of unstructured, loosely related databases in history. Most of the useful relationships in this haphazard collection of data are the direct result of laborious manual annotation and experimentation.

Given the pace at which new biological information becomes available, the ability to automatically cluster, classify, and annotate it across the traditional boundaries of individual databases is becoming increasingly critical. (See Macauley, Wang, and Goodman [1998].) For instance, DNA sequences for promoter and enhancer regions, structural motifs in transcription factors, metabolic pathway databases, and gene expression analysis are all tightly bound and interconnected. However, they tend to be studied in isolation. When this happens, clustering techniques are often less effective because, in the absence of additional constraints, they have to deal directly with the high dimensionality of the solution space. Functional clustering of protein sequences, for instance, can help reduce the complexity of structural clustering, and vice versa. Analogously, functional clustering in the gene expression domain has been shown to significantly reduce the complexity of promoter region analysis.

We will consider high-dimensional combinatorial algorithms and probabilistic models central to DM for the analysis, clustering, and classification of complex data patterns that can be used to integrate diverse information from many biological sources. Motivating approaches include the work of Roth, Hughes, Estep, and Church [1998] tying together mRNA monitoring and transcription factors; of Bystroff and Baker [1998] using 1D database information to improve 3D fold recognition; of Eckman et al. [1998] on large-scale diverse data; and of Karp and Paley [1996] integrating genomic data and metabolic pathway data.

Document last modified on April 5, 2001.