Friend Center 006

Princeton University, Princeton NJ

**Organizers:****Sanjeev Arora**, Princeton University, arora at cs.princeton.edu**Moses Charikar**, Princeton University, moses at cs.princeton.edu**Bill Johnson**, Texas A&M, johnson at math.tamu.edu**Assaf Naor**, NYU, naor at cims.nyu.edu

Title: Hardness of Nearest Neighbor under L_infinity

Recent years have seen a significant increase in our understanding of high-dimensional nearest neighbor search (NNS) for distances like the L_1 and L_2 norms. By contrast, our understanding of the L_infinity norm is now where it was 10 years ago.

In FOCS'98, Indyk proved the following unorthodox result: there is a data structure (in fact, a decision tree) of polynomial size, which achieves O(log log d) approximation for NNS in the d-dimensional L_infinity metric. More generally, the data structure has space O(n^r), for any r>1, for approximation O(log_r log d).

Our lower bound indicates that Indyk's unconventional bound might in fact be optimal. Specifically, we show a lower bound for the asymmetric communication complexity of NNS under L_infinity, which proves that this space/approximation trade-off is optimal for decision trees and for data structures with constant cell-probe complexity.

In the talk, I will start out with an intuitive "information" view of Indyk's result, which will also help us understand where the lower bound comes from.

This is joint work with Dorian Croitoru (MIT) and Mihai Patrascu (MIT).

Title: Open problems in learning low dimensional representations

In the past decade, one of the most exciting and fruitful research directions in machine learning has been finding ways to exploit "intrinsic low dimensionality" of data that are superficially high dimensional. Numerous models and algorithms have been proposed and empirically shown to hold promise, but the underlying theory remains largely underdeveloped.

There are several broad types of "intrinsic low dimensionality" that repeatedly arise in learning scenarios. I will give examples of these and explain why they are not adequately captured by existing popular dimensionality notions. I'll then describe core algorithmic/statistical tasks on such data, along with some of failings of the heuristics by which they are currently handled.

Explicit construction of a small epsilon-net for linear threshold functions

Title: Learning Convex Bodies is Hard.

The problem we are really interested in is whether the volume of a convex body in dim n be determined from polynomially many independent uniformly random points from the body. [Note that we do not have the power of membership queries, which is used crucially in the well-known volume estimation algorithms.] We are only concerned with the number of samples, and not the complexity issue of computing the volume from these samples. We show that it is not possible to learn convex bodies in this model using polynomially many random samples. This shows that one cannot compute the volume by first learning the body. Our construction of hard distribution on convex bodies is quite simple, and may be of independent interest.

Joint work with Luis Rademacher.

Title: Expander codes over reals and explicit Euclidean sections

Classical results from the 1970's state that a random subspace of R^N of dimension N/2 has "distortion" O(1) w.h.p where the distortion is measured by sqrt{N} times the largest ratio of the L_2 to L_1 norms of a vector in the subspace. Similar randomized geometric constructions arise in areas like dimensionality reduction and compressed sensing. Explicit constructions of such objects, however, seem very hard to come by.

Low distortion subspaces are of interest as they lead to good compressed sensing matrices and error-correcting codes over reals. The talk will describe constructions inspired by expander codes that can be used to produce subspaces with low distortion either explicitly or with sublinear randomness. The talk will also touch upon the connection to compressed sensing, and a near-linear time iterative signal recovery algorithm for our expander based construction.

Based on joint works with James Lee, Alexander Razborov, and Avi Wigderson

Title: Graph norms and Sidorenko's conjecture.

I will prove some results in the direction of answering a question of Lovasz about the norms defined by certain combinatorial structures: Inspired by the similarity of the definitions of $L_p$, Schatten, and Gowers norms, we introduce and study a wide class of norms containing these, as well as many other norms. It will be proven that every norm in this class must satisfy a Holder type inequality. Then using this condition we will verify the Hanner inequality with certain parameters for this class of norms. This will give a unified proof for the result of Hanner and a result of Tomczak-Jaegermann, and more importantly generalizes them to a wider class of norms which includes Gowers norms. In a different direction we will show an application of the above mentioned Holder type inequality to a conjecture of Sidorenko about subgraph densities. I will discuss many related open problems.

Title: Can cubic tiles be sphere-like?

The unit cube tiles R^d by Z^d, in the sense that its translations by vectors from Z^d cover R^d. It is natural to ask what is the minimal surface area of a body that tiles R^d by Z^d. The volume of any such body should clearly be at least 1, and therefore its surface area must be at least that of a unit volume ball, which of order sqrt(d). The surface area of the cube, however, is of order d, and no better tiling was known. In this work we use a random construction to show that the optimal surface area is indeed of order sqrt(d), namely there exist bodies that tile R^d as a cube would, but have sphere-like surface areas.

Tiling problems were considered for well over a century, but this particular tiling problem was also recently considered in computer science because of its relations with the Unique Games conjecture and with the Parallel Repetition problem. Indeed, our current result follows from the same idea that was used recently by Raz in his counter example for the strong Parallel Repetition conjecture.

Joint work with Ryan O'Donnell, Anup Rao, and Avi Wigderson.

Title: Metric Embeddings As Computational Primitives

Metric embedding has become an indispensable algorithmic tool, and in fact may be viewed as a computational primitive. This talk is intended to explore the boundaries of this approach, by considering richer host spaces or extending the computational model. I will survey recent progress in these directions, geared towards describing open questions.

[Based on joint work with Alex Andoni.]

Title: On the geometry of graphs with forbidden minors

I will give a structure theorem for the geometries induced on families of graphs that forbid a fixed minor, and discuss implications for the L_1 embedding/multi-flow conjecture.

Title: Online embedding of metrics

In the online setting, one has to produce an embedding of an input metric that is exposed bit by bit in the fixed host space. No changes to previous decisions are allowed. The objective is to minimize the multiplicative distortion.

To the best of our knowledge, no systematic study of online metric embeddings were studied so far. We initiate such a study, and show several positive and negative results in this direction.

This is a joint work with Yuri Rabinovich.

**Partha Niyogi**

Title: Manifold Learning: A geometric perspective on learning theory and algorithms

We consider the following learning-theoretic question. Suppose you are given a collection of data points (a point cloud) sampled from a Riemannian submanifold of Euclidean space. What geometric or topological properties of this manifold can you provably learn (estimate) from such randomly sampled points? We show how to learn the laplace operator, the heat kernel, the harmonic forms, and the homology from random samples. The analysis of the algorithmic framework requires us to relate the geometry of a suitable data-derived random graph (or complex) to the geometry of the underlying manifold. We will consider the implications of this point of view for machine learning, statistics, and numerical analysis.

Title: Explicit construction of a small epsilon-net for linear threshold functions

We give explicit constructions of epsilon nets for linear threshold functions on the binary cube and on the unit sphere. The size of the constructed nets is polynomial in the dimension and 1/epsilon. To the best of our knowledge no such constructions were previously known. Our results match, up to the exponent of the polynomial, the bounds that are achieved by probabilistic arguments.

As a corollary we also construct subsets of the binary cube that have size polynomial in the dimension and covering radius of $n/2 - c\sqrt{n\log n}$, for any constant c. This improves upon the well known construction of dual BCH codes that only guarantee covering radius of $\n/2 - c\sqrt{n}$.

Our work is motivated by questions of constructing interesting geometric objects, known to exist via probabilistic arguments. We will discuss some of this motivation. This is joint work with Amir Shpilka.

Title: Expanders via Random Spanning Trees

Motivated by the problem of routing reliably and scalably in a graph, we introduce the notion of a {\em splicer}, the union of spanning trees of a graph. We prove that for any bounded-degree $n$-vertex graph, the union of two {\em random} spanning trees approximates the expansion of every cut of the graph to within a factor of $O(\log n)$. For the random graph $G_{n,p}$, for $p> c\log{n}/n$, two spanning trees give an expander. This is suggested by the case of the complete graph, where we prove that two random spanning trees give an expander. The construction of the splicer is elementary ? each spanning tree can be produced independently using an algorithm by Aldous and Broder: a random walk in the graph with edges leading to previously unvisited vertices included in the tree.

A second important application of splicers is to graph sparsification where the goal is to approximate every cut (and more generally the quadratic form of the Laplacian) using only a small subgraph of the original graph. Benczur-Karger \cite{BK} as well as Spielman-Srivastava \cite{SS} have shown sparsifiers with $O(n \log n/\eps^2)$ edges that achieve approximation within factors $1+\eps$ and $1-\eps$. Their methods, based on independent sampling of edges, need $\Omega(n\log n)$ edges to get any approximation (else the subgraph could be disconnected) and leave open the question of linear-size sparsifiers. Splicers address this question for random graphs by providing sparsifiers of size $O(n)$ that approximate every cut to within a factor of $O(\log n)$.

This is joint work with Navin Goyal and Santosh Vempala.

Title: Exact Low-rank Matrix Completion via Convex Optimization

How can we infer answers from a survey that is only partially filled out? Suppose we ask a large collection of individuals a series of questions. We collect some data but, unfortunately, many questions are left unanswered. Is it possible for us to make an educated guess about what the missing answers should be? How can we make such a guess? In general, with no additional assumptions, this is impossible. However, if we assume that we can arrange all of the answers into a low rank matrix, there is often a unique assignment for the missing entries.

In this talk, I will discuss very general settings in which one can perfectly recover all of the missing entries of a low-rank matrix from a sufficiently large random subset of entries by solving a convex programming problem. Out of all of the matrices whose entries agree with the given subset, this program finds the matrix with the minimum nuclear norm (also known as the Ky-Fan norm or the Schatten 1-norm). I will show that if the number of sampled entries is greater than C n^{1.25} r log n for some positive numerical constant C, then with very high probability, most n x n matrices of rank r are perfectly recovered by the nuclear norm heuristic. The techniques for proving the main results build upon geometric ideas from the the recent literature on compressed sensing together with probabilistic tools such as the powerful techniques of Bourgain and of Rudelson for bounding norms of operators between Banach spaces. These results illustrate a general program for perfectly reconstructing geometric objects from very limited information.

Title: Column subset selection, matrix factorization, and eigenvalue optimization

Given a fixed matrix, the problem of column subset selection requests a column submatrix that has favorable spectral properties. Most research from the algorithms and numerical linear algebra communities focuses on a variant called rank-revealing {\sf QR}, which seeks a well-conditioned collection of columns that spans the (numerical) range of the matrix. The functional analysis literature contains another strand of work on column selection whose algorithmic implications have not been explored. In particular, a celebrated result of Bourgain and Tzafriri demonstrates that each matrix with normalized columns contains a large column submatrix that is exceptionally well conditioned. Unfortunately, standard proofs of this result cannot be regarded as algorithmic.

This paper presents a randomized, polynomial-time algorithm that produces the submatrix promised by Bourgain and Tzafriri. The method involves random sampling of columns, followed by a matrix factorization that exposes the well-conditioned subset of columns. This factorization, which is due to Grothendieck, is regarded as a central tool in modern functional analysis. The primary novelty in this work is an algorithm, based on eigenvalue minimization, for constructing the Grothendieck factorization. These ideas also result in an approximation algorithm for the $(\infty, 1)$ norm of a matrix, which is generally {\sf NP}-hard to compute exactly. As an added bonus, this work reveals a surprising connection between matrix factorization and the famous {\sc maxcut} semidefinite program.

Title: Isotropic Principal Components

We present an extension of standard Principal Component Analysis (PCA) that appears to uncover interesting directions when PCA does not. We apply this to the problem of unraveling a mixture of arbitrary Gaussians, and give an algorithm that succeeds well beyond the known threshold for inter-Gaussian separation. We mention other potential applications.

Previous: Program

Workshop Index

DIMACS Homepage

Contacting the Center

Document last modified on October 29, 2008.