Latent Simplex Learning in Input-sparsity-efficient Time

September 17, 2019, 2:00 PM - 2:40 PM


Center Hall

Rutgers University

Busch Campus Student Center

604 Bartholomew Rd

Piscataway NJ


Ravi Kannan, Microsoft Research

We formulate a geometric problem called Latent Simplex Learning (LSL), which includes as special cases the core problem in several Unsupervised Learning settings, such as Topic Modeling and Mixed Membership Stochastic Block models. LSL is simply stated: find a latent k-simplex K in R^d, given n data points, each obtained by perturbing a latent point in K. The perturbations are large, and most data points lie (far) outside K.
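The generative model described above can be sketched with a minimal synthetic example; the Dirichlet weights, Gaussian vertices, and all parameter values below are illustrative assumptions, not part of the formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, n = 3, 10, 1000          # number of vertices, ambient dimension, data points

# Latent k-simplex K: its k vertices are the rows of V.
V = rng.normal(size=(k, d))

# Each latent point is a convex combination of the vertices;
# here the weights are drawn from a Dirichlet distribution.
W = rng.dirichlet(np.ones(k), size=n)      # shape (n, k), rows sum to 1
latent = W @ V                              # shape (n, d), points inside K

# Observed data: latent points plus large perturbations,
# so most observations lie well outside K.
sigma = 2.0
A = latent + sigma * rng.normal(size=(n, d))
```

The task is to recover (an approximation to) the rows of V from A alone.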
Under natural assumptions, which we show hold in the applications, we devise an algorithm to solve LSL that runs in time O*(k nnz), matching the best running time of algorithms in the special cases. We use a new technique we call "subset smoothing," which optimizes k carefully chosen linear functions over the convex hull of the (n choose δn) points obtained by averaging all subsets of δn data points. We prove that each optimization yields an approximation to a new vertex of K. The proof is non-trivial and uses existing and new tools from Numerical Analysis, especially angles between singular spaces of nearby matrices.
Joint work with C. Bhattacharyya.