## DIMACS TR: 97-65

## Longest Common Consecutive Substring in Two Random Strings

### Author: Sarmad Abbasi

**
ABSTRACT
**

Let $\Sigma$ be a finite alphabet with $C$ letters. For any two strings
$x$ and $y$ of length $n$, we let $S(x,y)$ denote the size of the longest
common consecutive substring between $x$ and $y$; that is, $S(x,y)$
is the largest $k$ such that,
$$ x_i \cdots x_{i+k} = y_j \cdots y_{j+k}$$
for some $i$ and $j$. We show that for $x$ and $y$ chosen
uniformly among all possible strings of length $n$, $S(x,y)$ is highly
concentrated around $2 \log_C n$. More precisely, for any $a \geq 1$
$$ \Pr [ |S(x,y) - 2 \log_C n | > a] \leq {18 C^{-a} \over n^2}.$$
This enables us to show that the expected value of $S(x,y)$ is with
a constant additive factor of $2 \log n$.

Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1997/97-65.ps.gz

DIMACS Home Page