DIMACS Series in
Discrete Mathematics and Theoretical Computer Science

VOLUME Sixty Five
TITLE: "The Random Projection Method"
AUTHOR: Santosh S. Vempala

Ordering Information

This volume may be obtained from the AMS or through bookstores in your area.

To order through AMS contact the AMS Customer Services Department, P.O. Box 6248, Providence, Rhode Island 02940-6248 USA. For Visa, Mastercard, Discover, and American Express orders call 1-800-321-4AMS.

You may also visit the AMS Bookstore and order directly from there. DIMACS does not distribute or sell these books.


Foreword (by Christos H. Papadimitriou)

To project is to throw forward, what a sower does to seeds - an image of deliberate randomness. The Latin root of the word survives in Caesar's famous proclamation on the randomness inherent in life and history, and the decisiveness and skill with which it must be met: "Alea jacta est." Projection is, of course, a plunge onto a lower-dimensional domain (canvas, retina, screen, projective plane). But it need not be an uncomfortable dimensional compromise; a projection can reveal, enhance, evince, illuminate-as long as it is the right projection.

This book is about the radical idea that even a random projection is often useful. Two decades ago, analysts stumbled upon a surprising fact of this sort, the Johnson-Lindenstrauss Lemma, as a crucial tool in their project of extending functions in continuous ways. This result (several variants of which are proved in this book, the first in page 2) says that, if you project n points in some highdimensional space down to a random O(log n)-dimensional plane, the chances are overwhelming that all distances will be preserved-within a small relative error. So, if distance is all you care about, there is no reason to stay in high dimensions!

Over the past ten years, random projection (and its extensions and variants) has been used increasingly often in Computer Science. For example (to mention one application of the technique, due to Luca Trevisan, that is not considered here), the traveling salesman problem in log n dimensions is as hard to approximate as the same problem in any higher dimensional space - that is to say, very hard.

Random projection and high-dimensional geometry lie at the heart of several important approximation algorithms for problems of a nongeometric nature. For example, the best known algorithm for approximating the maximum cut of a graph works by first embedding the graph in a high-dimensional space (the "best" such embedding can be found by semidefinite programming). A random hyperplane through the origin will then cut a near-maximum number of edges (cutting by a random hyperplane is, of course, tantamount to projecting on a random line, in a sense the ultimate random projection). There are several other important graph theoretic problems, including multicommodity flow, that are best approximated through a geometric consideration (though not always by random projection), and these constitute the subject of the first part of the book.

The second part of the book is about learning - and its geometry. How is it, asks Vempala, that we can learn complex concepts by focusing on the right aspects of the world, ignoring thousands of irrelevant attributes? He speculates that the key to the mystery lies in geometry. What is shown here is that, if concepts are geometrically well separated, or if the natural ways to separate them use criteria (hyperplanes) that are highly interdependent algebraically, then certain natural algorithms converge quickly to the right answer. (It would be very intriguing to verify that the genres of concepts in the learning of which we excel have indeed such positive properties.) These arguments involve, among other interesting geometric and algebraic techniques, a simpler variant of random projection that is expressly plausible in the context of learning and networks of neurons.

The last part deals with what the author calls "information retrieval": The nearest neighbor problem, for which random projection works so well, and his own geometric take on clustering and principal components analysis (using yet another variant of random projection that is best adapted to the current application).

This is an elegant monograph, dense in ideas and techniques, diverse in its applications, and above all vibrant with the author's enthusiasm for the area. I believe that randomized projection and related geometric techniques will continue to have more and more applications in Computer Science - and that this book, by making the area accessible to graduate students and researchers (sowing the seeds, so to speak), will contribute importantly in this regard.


Forward                                             vii
Acknowledgements                                     ix

Chapter 1. Random Projection                          1
  1.1. How to project                                 1
  1.2. Basic properties of random projection          2
  1.3. When to project                                4

Part 1. Combinatorial Optimization                    5

Chapter 2. Rounding via Random Projection             7
  2.1. Maximum Cut                                    7
  2.2. Maximum k-cut                                  9
  2.3. Coloring                                      11

Chapter 3. Embedding Metrics in Euclidean Space      15
  3.1. Bourgain's embedding                          16
  3.2. Finding a minimum distortion embedding        19
  3.3. The max-flow min-cut gap for multicommodity
         flow                                        20

Chapter 4. Euclidean Embeddingd: Beyond Distance
             Preservation                            27
  4.1. Metric volume and geometric volume            27
  4.2. Preserving point-subset distance              34
  4.3. Ordering and layout problems                  40
  4.4. The Embed-and-Project rounding algorithm      43

Part 2. Learning Theory                              49

Chapter 5. Robust Concepts                           51
  5.1. A model of cognitive learning                 52
  5.2. Neuron-friendly random projection             52
  5.3. Robust half-spaces                            57

Chapter 6. Intersections of Half-Spaces              61
  6.1. The complexity of an old problem              61
  6.2. A randomized algorithm                        63
  6.3. Its analysis                                  67

Part 3. Information Retrieval                        75

Chapter 7. Nearest Neighbors                         77
  7.1. Euclidean space                               77
  7.2. Random projection for the hypercube           79
  7.3. Back to Euclidean space                       83

Chapter 8. Indexing and Clustering                   87
  8.1. Fast low-rank aproximation                    87
  8.2. Geometric p-median                            92

Bibliography                                         97

Appendix                                            101

Index Index of Volumes
DIMACS Homepage
Contacting the Center
Document last modified on October 7, 2004.