## DIMACS TR: 2001-26

## The String Edit Distance Matching Problem with Moves

### Authors: Graham Cormode and S. Muthukrishnan

**
ABSTRACT
**

The edit distance between two strings S and R
is defined to be the minimum number of character inserts,
deletes and changes needed to convert R to S.
Given a text string t of length n, and a pattern string p
of length m, informally, the string edit distance matching problem is to
compute the smallest edit distance between p and
substrings of t. A well known dynamic programming algorithm
takes time O(nm) to solve this problem, and it is an important
open problem in Combinatorial Pattern Matching to
significantly improve this bound.

We relax the problem so that (a) we allow an additional operation, namely,
substring moves, and (b) we approximate the string edit distance
upto a factor of O(log n log* n).(log*n is the number of times log
function is applied to n to produce a constant.)
Our result is a neat linear time deterministic algorithm for this version
of the problem. This is the first known significantly subquadratic
algorithm for a string edit distance problem in which the distance
involves nontrivial alignments. Our results are obtained by embedding
strings into L_1 vector space using a simplified parsing technique we call
Edit Sensitive Parsing (ESP). This embedding is approximately distance
preserving, and we show many applications of this embedding to string
proximity problems including nearest neighbors, outliers, and streaming
computations with strings.

Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2001/2001-26.ps.gz

DIMACS Home Page