DIMACS TR: 97-18

On the Linear-Cost Subtree-Transfer Distance between Phylogenetic Trees

Authors: Bhaskar DasGupta, Xin He, Tao Jiang, Ming Li and John Tromp


Different phylogenetic trees for the same group of species are often produced in the study of molecular evolution, either by procedures that use diverse optimality criteria [14] or from different genes [10]. Comparing these trees to find their similarities and dissimilarities (i.e., their distance) is thus an important issue in computational molecular biology. Several distance metrics, including the nearest neighbor interchange (nni) distance and the subtree-transfer distance, have been proposed and extensively studied in the literature. This article considers a natural extension of the subtree-transfer distance, called the linear-cost subtree-transfer distance, and studies its computational complexity, efficient approximation algorithms for computing it, and its relationship to the nni distance. In some applications the linear-cost subtree-transfer model seems more logical than the (unit-cost) subtree-transfer model. Our results are as follows.

1. The linear-cost subtree-transfer distance is in fact identical to the nni distance on unweighted phylogenies.

2. There is an algorithm to compute an optimal linear-cost subtree-transfer sequence between unweighted phylogenies in O(n^2 log n + n 2^{O(d)}) time, where d denotes the linear-cost subtree-transfer distance. Such an algorithm is useful when d is small.

3. Computing the linear-cost subtree-transfer distance between two weighted phylogenetic trees is NP-hard, provided we allow multiple leaves of a tree to share the same label (i.e. the trees are not necessarily uniquely labeled).

4. There is an efficient approximation algorithm for computing the linear-cost subtree-transfer distance between weighted phylogenies with performance ratio 2.
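To make result 1 concrete, here is a brute-force sketch that computes the nni distance (and hence, by result 1, the linear-cost subtree-transfer distance) between two small unweighted phylogenies by breadth-first search over the graph of trees connected by single nni moves. This is not the O(n^2 log n + n 2^{O(d)}) algorithm of result 2; its running time grows with the number of trees, so it is only practical for a handful of leaves. The encoding is an assumption made for illustration: an unrooted binary tree is rooted at one fixed leaf, which is left out of the encoding, and every internal node is an unordered pair (a frozenset) of subtrees. The function names tree, nni_neighbors and nni_distance are hypothetical.

```python
from typing import Iterator, Union

# Hypothetical encoding: a leaf label (str) or an unordered pair of subtrees.
# The unrooted tree is rooted at a fixed leaf that is omitted from the encoding,
# so both trees being compared must omit the same leaf.
Tree = Union[str, frozenset]


def tree(x) -> Tree:
    """Convert nested 2-tuples of leaf labels into the canonical frozenset form."""
    if isinstance(x, str):
        return x
    left, right = x
    return frozenset({tree(left), tree(right)})


def nni_neighbors(t: Tree) -> Iterator[Tree]:
    """Yield every tree one nni move away from t.

    Each edge from an internal node to an internal child is an internal edge of
    the unrooted tree. For parent {c, s} with c = {c1, c2}, the two nni moves
    across edge (parent, c) give {c1, {s, c2}} and {c2, {s, c1}}.
    """
    if isinstance(t, str):
        return
    a, b = tuple(t)
    # nni moves across the edges joining this node to its internal children.
    for c, s in ((a, b), (b, a)):
        if isinstance(c, frozenset):
            c1, c2 = tuple(c)
            yield frozenset({c1, frozenset({s, c2})})
            yield frozenset({c2, frozenset({s, c1})})
    # nni moves strictly inside either subtree.
    for n in nni_neighbors(a):
        yield frozenset({n, b})
    for n in nni_neighbors(b):
        yield frozenset({a, n})


def nni_distance(t1: Tree, t2: Tree) -> int:
    """Minimum number of nni moves turning t1 into t2 (exhaustive BFS)."""
    if t1 == t2:
        return 0
    seen = {t1}
    frontier = [t1]
    d = 0
    while frontier:
        d += 1
        next_frontier = []
        for t in frontier:
            for n in nni_neighbors(t):
                if n == t2:
                    return d
                if n not in seen:
                    seen.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    raise ValueError("trees do not have the same leaf set")


# Five leaves a..e, with e as the omitted rooting leaf.
t1 = tree((("a", "b"), ("c", "d")))
print(nni_distance(t1, tree(("a", ("b", ("c", "d"))))))  # 1
print(nni_distance(t1, tree((("a", "c"), ("b", "d")))))  # 3
```

The second pair shares no nontrivial split, and no tree is adjacent to both, so at least three moves are needed. Since there are (2n-5)!! unrooted binary trees on n labeled leaves, the search space explodes quickly, which is why result 2 matters when d is small.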

Paper available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1997/97-18.ps.gz
