DIMACS TR: 96-53

On the Design of Optimization Criteria for Multiple Sequence Alignment

Authors: Dannie Durand and Martin Farach


Multiple sequence alignment (MSA) is important in functional, structural and evolutionary studies of sequence data. Much research has focussed on posing MSA as an optimization problem, and several optimization criteria have been explored. In this paper, we discuss biological and mathematical problems that arise in cost function design for the multiple sequence alignment problem. In particular, we focus on tree alignment, which is often viewed as the most "biological" of the rigorous approaches to MSA. We point out several important pitfalls in current optimization approaches to MSA and identify characteristics of good cost function design. We also address additional design issues specific to approximation algorithms. We hope these ideas will lead to future research on a biologically realistic and mathematically rigorous approach to MSA.

