### Short Superstrings and the
Structure of Overlapping Strings

**Chris Armen and Clifford Stein **
Department of Computer Science

Dartmouth College, Hanover, NH 03755

Given a collection *S* of strings over some alphabet, a
superstring *A* is a string which contains each of the elements
of *S* as a substring; that is, for each string *s* in
*S*, there is a block of |*s*| consecutive characters in
*A* which match *s* exactly. In the shortest superstring
problem, *A* is further required to be of minimum length.

This problem can be viewed as an abstraction of the sequence assembly
problem and as such has attracted considerable recent attention. It is
NP-hard, and several constant-factor approximation algorithms have
been proposed.

In this talk we describe our 2.75-approximation algorithm for the
superstring problem. This result not only represents a significant
improvement in the best known approximation ratio for the problem, but
also reveals a great deal about the structure of overlapping strings.
While previous algorithms have grown increasingly sophisticated, their
gains have come in a fundamentally graph-theoretic setting. In this
sense they have addressed a more general problem than the one at
hand. We believe that our approach, because it deals specifically with
the structure of strings with large amounts of overlap, may be better
able to address some of the feature of the underlying sequence
assembly problem.

Program

DIMACS Homepage

Contacting the Center

Document last modified on March 28, 2000.