# DIMACS Seminar on Math and CS in Biology

## Title:

Detecting sequence homology with $k$-tuple matches: Exploiting a
probabilistic model of mutational divergence

## Speaker:

- Gary Benson
- Mt. Sinai School of Medicine

## Place:

- Seminar Room 431, CoRE Building
- Rutgers University.

## Time:

- 2:00 p.m.
- Monday, November 25, 1996 -
**NOTE SPECIAL DAY**

Abstract:
The problem of detecting sequence homology is central to modern
biology. When a new biological sequence is discovered, one of the
first tasks is to screen it against a database like Genbank in order
to find similar sequences. Two popular programs for this purpose are
FASTA and BLAST. FASTA uses the sum of matches detected in matching
$k$-tuples as the measure of similarity. This method works well for
DNA sequences because the alphabet $\Sigma=\{A,C,G,T\}$ is
small. Matching $k$-tuples are of course common, and the question
naturally arises, "How many matches are required for statistical
significance?"

In another application, $k$-tuple matches have been used in a new
algorithm for the detection of tandem repeats. Here, we assume that
two adjacent copies of a pattern have experienced some mutation and we
ask, "How many matches do we expect to see?"

The answers to both these questions hinge on the underlying
probabilistic models of the sequences and obtaining the answers
require different approaches. In this talk, I will explore a mixture
of algorithmic and probabilistic ideas concerning $k$-tuple matches
and sequence homology with an emphasis on the detection of tandem
repeats.

Document last modified on November 19, 1996