DIMACS Seminar on Math and CS in Biology


On the Complexity of String Folding


Teresa Przytycka
University of Maryland


CoRE Building, Conference Room 433
Busch Campus, Rutgers University


11:00 AM
Tuesday, April 2, 1996


The motivation for the string-folding problems considered during this talk lies in computational biology. Prediction of the three-dimensional structure of a protein from its known linear sequence of amino acids is an important practical open problem, which seems to be extremely challenging. A natural approach is to look for a spatial configuration achieving a minimum free energy level.

While most people would expect that finding a minimum energy configuration would be computationally intractable, previous results to this effect are very limited.

We prove an NP-completeness result for a much-simplified model, in which we attempt to capture rather more of the essential character of the protein-folding problem. The protein molecule is represented by a string of symbols, a bond can be made only between a pair of identical symbols, and we seek an embedding of the given string in a grid so as to maximise the number of pairs of matching symbols at adjacent grid points. This version of the folding problem involves a mixture of combinatorial, geometric and topological considerations.

Hart and Istrail have given an approximation algorithm for the ``hydrophobic-hydrophilic'' model over the same grids. This corresponds to our model, but with a binary alphabet in which the only matches counted are between adjacent 1's.

This is joint work with Mike Paterson.

Document last modified on March 28, 1996