DIMACS Seminar on Math and CS in Biology
Title:
Improving the Space Efficiency of the MSA Program for Multiple Sequence Alignment
Speaker:
- Alejandro A. Schaffer
- National Center for Human Genome Research, NIH and
- Department of Computer Science, Rice Uiversity
Place:
- DIMACS Seminar Room 431
- CoRE Building
- Rutgers University
Time:
- 1:00 PM
- Tuesday, December 12, 1995
Abstract:
Finding similar DNA or protein sequences is an important paradigm in
the studies of evolution and gene function. I will describe a basic
formulation of the multiple sequence alignment problem. This
formulation is equivalent to a special case of finding single-source
shortest paths on a weighted graph. The MSA program is one of the few
existing programs that attempts to find exactly optimal alignments of
multiple sequences of proteins or DNA. MSA was written and
distributed in 1989 and it implements a variant of Dijkstra's
algorithm. Two implementation difficulties are the introduction of
gap costs in the alignment and the fact that the underlying graph has
exponential size. We have made substantial improvements in the space
and time usage of MSA. The improvements make feasible a variety of
problem instances that were not computationally feasible previously.
The new MSA is available directly from an ftp site at Rice University
in Houston and indirectly through a WWW service at Washington
University in St. Louis.
(This talk represents joint work with Sandeep K. Gupta and is based
on earlier work of John D. Kececioglu)
If you would like to meet with the speaker, please contact Dannie Durand
(durand@cs.princeton.edu)
Document last modified on December 5, 1995