## DIMACS TR: 97-18

## On the Linear-Cost Subtree-Transfer Distance between Phylogenetic
Trees

### Authors: Bhaskar DasGupta, Xin He, Tao Jiang, Ming Li and John Tromp

**
ABSTRACT
**

Different phylogenetic trees for the same group of species are often produced
either by procedures that use diverse optimality criteria [14]
or from different genes [10] in the study of molecular evolution.
Comparing these trees to find their similarities and dissimilarities
(i.e. distance) is thus an important issue in computational
molecular biology. Several distance metrics including the
nearest neighbor interchange (nni) distance and the subtree-transfer
distance have been proposed and extensively studied in the literature.
This article considers a natural extension of the subtree-transfer distance,
called the linear-cost subtree-transfer distance, and studies the complexity
and efficient approximation algorithms for this distance as well as its
relationship to the nni distance. The linear-cost subtree-transfer model
seems more logical than the (unit-cost) subtree-transfer model in some
applications. The following is a list of our results.
1. The linear-cost subtree-transfer distance is in fact
identical to the nni distance on unweighted phylogenies.

2. There is an algorithm to compute an optimal linear-cost
subtree-transfer sequence between unweighted phylogenies in
O(n^2 log n + n 2^{O(d)}) time, where d denotes the linear-cost
subtree-transfer distance. Such an algorithm is usual when d is small.

3. Computing the linear-cost subtree-transfer distance between two weighted
phylogenetic trees is NP-hard, provided we allow multiple leaves of a tree
to share the same label (i.e. the trees are not necessarily
uniquely labeled).

4. There is an efficient approximation algorithm for computing the
linear-cost subtree-transfer distance between weighted phylogenies
with performance ratio 2.

Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1997/97-18.ps.gz

DIMACS Home Page