DIMACS Seminar on Math and CS in Biology


On the Complexity and Approximation of Syntenic Distance


Bhaskar DasGupta
Rutgers University, Camden


Seminar Room 431, CoRE Building
Rutgers University.


2:00 p.m.
Thursday, November 7, 1996

(Joint work with T. Jiang, M. Li, S. Kannan and Z. Sweedyk.)

The paper studies the computational complexity and approximation algorithms for a new evolutionary distance between multi-chromosomal genomes introduced recently by Ferretti, Nadeau and Sankoff. Here, a chromosome is represented as a set of genes and a genome is a collections of chromosomes. The syntenic distance between two genomes is defined as the minimum number of translocations, fusions and fissions required to transform one genome into the other. We prove that computing the syntenic distance is NP-hard and give a simple approximation algorithm with ratio $2$. The question of how to improve the approximation ratio is also considered, and a tight connection between finding large ``balanced'' independent sets in bipartite graphs and such an improvement is shown. For the case when an upper bound $d$ on the syntenic distance is known, we show that an an optimal syntenic sequence can be found in $O(n^2\cdot 2^{O(d^2)})$ time. Next, we show that if the set of operations for transforming a genome is significantly restricted, we can nevertheless find a solution that performs at most $O( \log d)$ additional moves, where $d$ is the number of moves performed by the unrestricted optimum. This result should help in the design of approximation algorithms. Finally, we investigate the median problem: Given three genomes, construct a genome minimizing the total syntenic distance to the three given genomes. The problem has application in the inference of phylogenies based on the syntenic distance. We prove that the problem is NP-hard and design an efficient polynomial time approximation algorithm with ratio $4+\epsilon$ for any constant $\epsilon>0$.

Document last modified on October 30, 1996