DIMACS Seminar on Math and CS in Biology


Improving the Space Efficiency of the MSA Program for Multiple Sequence Alignment


Alejandro A. Schaffer
National Center for Human Genome Research, NIH and
Department of Computer Science, Rice Uiversity


DIMACS Seminar Room 431
CoRE Building
Rutgers University


1:00 PM
Tuesday, December 12, 1995


Finding similar DNA or protein sequences is an important paradigm in the studies of evolution and gene function. I will describe a basic formulation of the multiple sequence alignment problem. This formulation is equivalent to a special case of finding single-source shortest paths on a weighted graph. The MSA program is one of the few existing programs that attempts to find exactly optimal alignments of multiple sequences of proteins or DNA. MSA was written and distributed in 1989 and it implements a variant of Dijkstra's algorithm. Two implementation difficulties are the introduction of gap costs in the alignment and the fact that the underlying graph has exponential size. We have made substantial improvements in the space and time usage of MSA. The improvements make feasible a variety of problem instances that were not computationally feasible previously. The new MSA is available directly from an ftp site at Rice University in Houston and indirectly through a WWW service at Washington University in St. Louis.

(This talk represents joint work with Sandeep K. Gupta and is based on earlier work of John D. Kececioglu)

If you would like to meet with the speaker, please contact Dannie Durand (durand@cs.princeton.edu)

Document last modified on December 5, 1995