97-72: A few logs suffice to build (almost) all trees (II)

DIMACS TR: 97-72

A few logs suffice to build (almost) all trees (II)

Authors: P. L. Erdos, M. A. Steel, L. A. Szekely, and T. J. Warnow

ABSTRACT

Inferring evolutionary trees is an interesting and important problem in biology that is very difficult from a computational point of view as most associated optimization problems are NP-hard. Although it is known that many methods are provably statistically consistent (i.e. the probability of recovering the correct tree converges on 1 as the sequence length increases), the actual rate of convergence for different methods has not been well understood. In a recent paper we introduced a new method for reconstructing evolutionary trees called the Dyadic Closure Method (DCM), and we showed that DCM has a very fast convergence rate. DCM runs in O(n^5 log n) time, where n is the number of sequences, so although it is polynomial it has computational requirements that are potentially too large to be of use in practice. In this paper we present another tree reconstruction method, the Witness-Antiwitness Method, or WAM. WAM is significantly faster than DCM, especially on random trees, and converges at the same rate as DCM. We also compare WAM to other methods used to reconstruct trees, including Neighbor Joining (possibly the most popular method among molecular biologists), and new methods introduced in the computer science literature.

This researched started when the authors enjoyed the hospitality of DIMACS in 1995.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1997/97-72.ps.gz

DIMACS Home Page