« search calendars« DIMACS Workshop on Randomized Numerical Linear Algebra, Statistics, and Optimization

« Statistical Estimations from Locality Sensitive Hashing (LSH): Adaptive Sampling at the Cost of Random Sampling

Statistical Estimations from Locality Sensitive Hashing (LSH): Adaptive Sampling at the Cost of Random Sampling

September 18, 2019, 2:40 PM - 3:20 PM


Center Hall

Rutgers University

Busch Campus Student Center

604 Bartholomew Rd

Piscataway NJ

Click here for map.

Anshumali Shrivastava, Rice University

Sampling is one of the fundamental hammers in machine learning (ML) for reducing the size and scale of the problem at hand. Many ML applications demand adaptive sampling for faster convergence. However, the cost of adaptive sampling itself is prohibitive when the sampling weights are changing. This creates a fundamental barrier. 

In this talk, I will discuss some of my recent and surprising findings on the use of hashing algorithms for large-scale estimations. Locality Sensitive Hashing (LSH) is a hugely popular algorithm for sub-linear near neighbor search. However, it turns out that fundamentally LSH is a constant time (amortized) adaptive sampler from which efficient near-neighbor search is one of the many possibilities. LSH offers a unique capability to do smart sampling and statistical estimations at the cost of few hash lookups. Our observation bridges data structures (probabilistic hash tables) with efficient unbiased statistical estimations. I will demonstrate how this dynamic and efficient sampling beak the computational barriers in adaptive estimations, where it is possible that we pay roughly the cost of uniform sampling but get the benefits of adaptive sampling. I will demonstrate the power of a straightforward idea for a variety of problems 1) Adaptive Gradient Estimations for efficient SGD, 2) Efficient Deep Learning, 3) Anomaly Detection, and 4) The first possibility of sub-linear sketches for near-neighbor queries.