## DIMACS TR: 96-19

## The number of nucleotide sites needed to accurately reconstruct
large evolutionary trees

### Authors: Mike Steel, Laszlo A. Szekely, Peter L. Erdos

**
ABSTRACT
**

Biologists seek to reconstruct evolutionary trees for increasing number of
species, $n$, from aligned genetic sequences. How fast the sequence length
$N$ must grow, as a function of $n$, in order to accurately recover the
underlying tree with probability $1-\epsilon$, if the sequences evolve
according to simple stochastic models of nucleotide substitution? We show
that for a certain model, a reconstruction method exists for which the
sequence length $N$ can grow surprisingly slowly with $n$ (sublinearly for
a wide range of parameters, and even as a power of $\log n$ in a narrow
range, which roughly meets the lower bound from information theory). By
contrast a more traditional technique (maximum compatibility) provably
requires $N$ to grow faster than linearly in $n$. Our approach is based on
a new, and computationally efficient approach for reconstructing phylogenetic
trees from aligned DNA sequences.

Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1996/96-19.ps.gz

DIMACS Home Page