97-65: Longest Common Consecutive Substring in Two Random Strings

DIMACS TR: 97-65

Longest Common Consecutive Substring in Two Random Strings

Author: Sarmad Abbasi

ABSTRACT

Let $\Sigma$ be a finite alphabet with $C$ letters. For any two strings $x$ and $y$ of length $n$, we let $S(x,y)$ denote the size of the longest common consecutive substring between $x$ and $y$; that is, $S(x,y)$ is the largest $k$ such that, $$ x_i \cdots x_{i+k} = y_j \cdots y_{j+k}$$ for some $i$ and $j$. We show that for $x$ and $y$ chosen uniformly among all possible strings of length $n$, $S(x,y)$ is highly concentrated around $2 \log_C n$. More precisely, for any $a \geq 1$ $$ \Pr [ |S(x,y) - 2 \log_C n | > a] \leq {18 C^{-a} \over n^2}.$$ This enables us to show that the expected value of $S(x,y)$ is with a constant additive factor of $2 \log n$.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1997/97-65.ps.gz

DIMACS Home Page