Performance of algorithms using hashing is often related to the largest number of keys mapped by the hash function to the same value. This motivates the study of the expected size of the largest bucket for the worst case set of keys of a given size. In this work, we study the performance of linear transformations as a family of hash functions by this measure.
Our main result is the linear mappings over GF are excellent hash functions. They almost match the performance of the best possbile family of hash functions. (But, of course, are much more efficient.)
A surprising additional result we have implies that linear mappings over big finite fields (say modulo a prime) are the other way around. They are almost as bad as possible for any universal family of hash functions.
The main tool in proving the main result may be of independent interest. We show that a random linear mapping applied on a set of cardinality $c n\log n$ covers all the element of a size $n$ range with probability at least $1-\epsilon_c$.
This is joint work with Noga Alon, Martin Ditzfelbinger, Peter Bro Miltersen, and Erez Petrank.