DIMACS TR: 2004-46
Tight Bounds on Deterministic Seed Design
Authors: Martin Farach-Colton, Gad M. Landau, S. Cenk Sahinalp and Dekel Tsur
ABSTRACT
Flitering is a standard technique for fast approximate string matching in
practice. In Filtering, a quick first step is used to rule out almost all
positions of a text as possible starting positiions for a pattern. In the
followup step, a slow method is used to verify or eliminate each remaining
position. The running time of such a method depends largely on the quality
of the filtering step, as measured by the number of false positives it yields.
A spaced seed is a recently introduced type of filter that allows gaps in
the pattern to be searched for. Spaced seeds promise to yield a much lower
false positive rate, and thus have been extensively studied, though heretofore
only heuristically.
In this paper, we show how to design spaced seeds with no false negatives
optimally.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2004/2004-46.ps.gz
DIMACS Home Page