DIMACS TR: 2005-39
An evaluation of the edit-distance-with-moves similarity metric for comparing genetic sequences
Authors: Shiri Azenkot, Tzu-Yi Chen and Graham Cormode
ABSTRACT
We describe the first known implementation of an approximation algorithm for
the string edit distance with moves similarity metric. This is the first
algorithm to consider nontrivial alignment and run in substantially
sub-quadratic time [2]. Extensive experimentation demonstrates that the
algorithm produces a good approximation for the edit distance with moves
metric, especially on strings of length 500B to 10KB. We also found that the
algorithm has high potential for use in computational biology. When comparing
texts of genetic sequences, our algorithm outperforms the q-grams heuristic
in predicting results of the Smith-Waterman algorithm. Finally, we propose
additional application areas for our implementation.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2005/2005-39.ps.gz
DIMACS Home Page