DIMACS Theoretical Computer Science Seminar

Title: Optimal lower bounds for locality sensitive hashing (except when q is tiny)

Speaker: Ryan O'Donnell, CMU

Date: Wednesday, October 13, 2010 11:00-12:00pm

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ


Locality Sensitive Hashing (LSH) is a widely used algorithmic tool, combining hashing with geometry, with applications to nearest-neighbor search. For points in {0,1}^d, one wants a family H of functions such that if dist(x,y) <= r then Pr[h(x) = h(y)] >= p (when h is chosen randomly from H), whereas if dist(x,y) >= cr then Pr[h(x) = h(y)] <= q. For a fixed c, the quality of the LSH family is determined by the smallness of its "rho parameter", namely rho = ln(1/p)/ln(1/q). In their seminal 1998 work, Indyk and Motwani gave a simple LSH family with rho <= 1/c. The only known lower bound, [Motwani-Naor-Panigrahy'07], was that rho must be at least roughly 0.5/c.

In this work, we give a simple proof of the optimal lower bound, rho >= 1/c. In one sentence, it follows from the fact that the noise-stability of a boolean function at "time" t is a log-convex function of t.

We will also discuss the fact that all results mentioned here rely on the underpublicized assumption that q is not too small.

This is joint work with Yi Wu of IBM Research and Yuan Zhou of Carnegie Mellon.