# DIMACS Theory of Computing Seminar

## Title:

On the Approximability of Numberical Taxonomy (Fitting Distances by Tree Metrics)

## Speaker:

- Martin Farach
- Rutgers University

## Place:

- CoRE Building, Room 431
- Busch Campus, Rutgers University

## Time:

- 4:30 PM
- Thursday, October 12, 1995

## Abstract:

One of the most common methods for clustering numeric data involves
fitting the data to a tree metric, which is defined by a
weighted tree spanning the points of the metric, the distance between
two points being the sum of the weights of the edges of the path
between them. Not surprisingly, this problem, the so-called {\em
Numerical Taxonomy} problem, has received a great deal of attention.
Fitting distances by trees is an important problem in many areas. For
example, evolutionary biologists often compare the DNA sequences of
pairs of species, and thus get an estimate of the evolutionary time
which has elapsed since the species separated by a speciation event.
A table of pairwise distances is thus constructed. The problem is
then to reconstruct the underlying evolutionary tree. Dozens of
heuristics for this problem appear in the literature every year.
We consider the problem of fitting an n by n distance matrix D
by a tree metric T. Let p be the distance to the closest tree
metric, that is, p is the minimum over T of the maximum difference
between T and D. First we present a quadratic time
algorithm for finding an additive tree T such that
whose distance from D is at most 3p. Second we show that it is
NP-hard to find a tree T such that is distance (9/8)p from D.
This paper presents the first algorithm for this problem with
a performance guarantee.

Joint work with Richa Agarwala, Vineet Bafna, Babu Narayanan, Mike
Paterson and Mikkel Thorup.

Document last modified on October 4, 1995