DIMACS TR: 2003-05
Fast Optimal Genome Tiling with Applications to Microarray Design and Homology Search
Authors: Piotr Berman, Paul Bertone, Bhaskar DasGupta, Mark Gerstein, Ming-Yang Kao and Michael Snyder
ABSTRACT
In this paper we consider several variations of the
following basic tiling problem: given a sequence of real numbers
with two size bound parameters, we want to find a set of tiles of
maximum total weight such that each tiles
satisfies the size bounds. A solution to this problem is important to a
number
of computational biology applications such as selecting genomic DNA
fragments for PCR-based amplicon microarrays and performing homology
searches with long sequence queries. Our goal is to design efficient
algorithms
with linear or near-linear time and space in the normal range of
parameter
values for these
problems. For this purpose, we first discuss the solution to a basic
online interval maximum problem via a sliding window approach and show
how to use this solution in a non-trivial manner for many of the tiling
problems introduced. We also discuss NP-hardness results and
approximation
algorithms for generalizing our basic tiling problem to higher
dimensions.
Finally, computational results from applying our tiling algorithms to
genomic sequences of five model eukaryotes are reported.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2003/2003-05.ps.gz
DIMACS Home Page