DIMACS TR: 2003-05

Fast Optimal Genome Tiling with Applications to Microarray Design and Homology Search



Authors: Piotr Berman, Paul Bertone, Bhaskar DasGupta, Mark Gerstein, Ming-Yang Kao and Michael Snyder

ABSTRACT

In this paper we consider several variations of the following basic tiling problem: given a sequence of real numbers with two size bound parameters, we want to find a set of tiles of maximum total weight such that each tiles satisfies the size bounds. A solution to this problem is important to a number of computational biology applications such as selecting genomic DNA fragments for PCR-based amplicon microarrays and performing homology searches with long sequence queries. Our goal is to design efficient algorithms with linear or near-linear time and space in the normal range of parameter values for these problems. For this purpose, we first discuss the solution to a basic online interval maximum problem via a sliding window approach and show how to use this solution in a non-trivial manner for many of the tiling problems introduced. We also discuss NP-hardness results and approximation algorithms for generalizing our basic tiling problem to higher dimensions. Finally, computational results from applying our tiling algorithms to genomic sequences of five model eukaryotes are reported.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2003/2003-05.ps.gz
DIMACS Home Page