DIMACS Seminar on Math and CS in Biology


Detecting sequence homology with $k$-tuple matches: Exploiting a probabilistic model of mutational divergence


Gary Benson
Mt. Sinai School of Medicine


Seminar Room 431, CoRE Building
Rutgers University.


2:00 p.m.
Monday, November 25, 1996 - NOTE SPECIAL DAY

The problem of detecting sequence homology is central to modern biology. When a new biological sequence is discovered, one of the first tasks is to screen it against a database like Genbank in order to find similar sequences. Two popular programs for this purpose are FASTA and BLAST. FASTA uses the sum of matches detected in matching $k$-tuples as the measure of similarity. This method works well for DNA sequences because the alphabet $\Sigma=\{A,C,G,T\}$ is small. Matching $k$-tuples are of course common, and the question naturally arises, "How many matches are required for statistical significance?"

In another application, $k$-tuple matches have been used in a new algorithm for the detection of tandem repeats. Here, we assume that two adjacent copies of a pattern have experienced some mutation and we ask, "How many matches do we expect to see?"

The answers to both these questions hinge on the underlying probabilistic models of the sequences and obtaining the answers require different approaches. In this talk, I will explore a mixture of algorithmic and probabilistic ideas concerning $k$-tuple matches and sequence homology with an emphasis on the detection of tandem repeats.

Document last modified on November 19, 1996