DIMACS TR: 99-34
Low Discrepancy Sets Yield Approximate Min-Wise Independent
Permutation Families
Authors: Michael Saks, Aravind Srinivasan, Shiyu Zhou and David Zuckerman
ABSTRACT
Motivated by a problem of filtering near-duplicate
Web documents, Broder, Charikar, Frieze \& Mitzenmacher
defined the following notion of {\em $\epsilon$-approximate
min-wise independent permutation families}. A
multiset ${\cal F}$ of permutations of $\{0,1,\ldots,n-1\}$
is such a family if for all $K \subseteq \{0,1,\ldots,n-1\}$
and any $x \in K$, a permutation $\pi$ chosen uniformly at
random from ${\cal F}$ satisfies
$$|\ \Pr[\min\{\pi(K)\}=\pi(x)]-{1\over |K|}\ |\le{\eps\over |K|}.$$
We show connections of such families with
{\em low discrepancy sets for geometric rectangles},
and give explicit constructions of such families ${\cal F}$
of size $n^{O(\sqrt{\log n})}$ for $\eps=1/n^{\Theta(1)}$,
improving upon
the previously best-known bound of Indyk.
We also present polynomial-size constructions
when the min-wise condition is required only for
$|K| \le 2^{O(\log^{2/3} n)}$, with $\epsilon \geq 2^{-O(\log^{2/3} n)}$.
Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1999/99-34.ps.gz
DIMACS Home Page