## DIMACS TR: 97-25

## Linear Hashing

### Authors: Noga Alon, Martin Dietzfelbinger, Peter Bro Miltersen,
Erez Petrank and Gábor Tardos

**
ABSTRACT
**

Consider the set ${\cal H}$ of all linear (or affine) transf
ormations
between two vector spaces over a finite field $F$. We study
how good $\cal H$
is as a class of hash functions, namely we consider hashing
a set $S$ of size $n$ into a range having the same cardinali
ty $n$ by a
randomly chosen function from ${\cal H}$ and look
at the expected size of the largest hash bucket.
$\cal H$ is a universal class of hash functions for any fini
te field, but
with respect to our measure different fields behave differen
tly.

If the finite field $F$ has $n$ elements then there is a bad set
$S\subset F^2$ of size $n$ with expected maximal bucket size
$\Omega(n^{1/3})$. If $n$ is a perfect square then there is even a bad
set with largest bucket size {\em always} at least $\sqrt n$. (This is
worst possible, since with respect to a universal class of hash functions
every set of size $n$ has expected largest bucket size below $\sqrt n+1/2$.)

If, however, we consider the field of two elements then we get much better
bounds. The best previously known upper bound on the expected size of
the largest bucket for this class was
$O( 2^{\sqrt{\log n}})$. We reduce this upper bound
to $O(\log n\log\log n)$. Note that this is not far
from the guarantee for a random function. There,
the average largest bucket would be $\Theta(\log n/\log \log n)$.

In the course of our proof we develop a tool which may be of
independent interest.
Suppose we have a subset $S$ of a vector space $D$ over ${\bf Z}_2$, and
consider a random linear mapping of $D$ to a smaller vector
space $R$. If the cardinality
of $S$ is larger than $c_\e|R|\log|R|$
then with probability $1-\e$,
the image of $S$ will cover
all elements in the range.

Paper Available at:
ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/1997/97-25.ps.gz

DIMACS Home Page