DIMACS Seminar on Math and CS in Biology


Learning-based Algorithms for Protein Motif Recognition


Mona Singh
Princeton University


DIMACS Seminar Room 431
CoRE Building
Rutgers University


11:00 AM
Monday, November 20, 1995


One of the most important problems in computational biology is that of predicting how a protein will fold in three dimensions when we only have access to its one-dimensional amino acid sequence. An important first step in tackling the protein folding problem is a solution to the structural motif recognition problem: given a known local three-dimensional structure, or motif, determine whether this motif occurs in a given amino acid sequence, and if so, in what positions.

We present a learning algorithm that improves on existing methods for recognizing protein structural motifs. It is an iterative method that exploits randomness and statistical techniques to obtain good performance. The algorithm is particularly effective when only a small number of examples of the motif are known, which are precisely the situations that pose significant difficulties for previously known methods.

We have implemented our algorithm and demonstrate its performance on the coiled coil motif. We test our program, Learn-Coil, on the domain of 3-stranded coiled coils and on subclasses of 2-stranded coiled coils. We show empirically that, for these motifs, our method overcomes the problem of limited data.

(Joint work with Bonnie Berger.)

11/28: Fred Hughson, Princeton, Chemistry,
       On protein structure.

12/5:  Doug Deutschman, Cornell, Ecology,
       Maximum likelihood models of forest ecology.

12/12: Alex Schaffer, NIH

Document last modified on November 8, 1995