Many tasks in machine learning, statistics, scientific computing, and optimization ultimately boil down to numerical linear algebra. Randomized numerical linear algebra (RandNLA) exploits randomness to improve matrix algorithms for fundamental problems like matrix multiplication and least-squares using techniques such as random sampling and random projection. RandNLA has received a great deal of interdisciplinary interest in recent years, with contributions coming from numerical linear algebra, theoretical computer science, scientific computing, statistics, optimization, data analysis, and machine learning, as well as application areas such as genetics, physics, astronomy, and internet modeling. RandNLA is of great interest from a theoretical perspective, but it has the potential to be a transformative new tool for machine learning, statistics, and data analysis. The workshop aims to:
(1) Present a tutorial overview of RandNLA. A tutorial will provide background on relevant linear algebra, probability theory, numerical methods, and theory of algorithms; illustrate compelling data analysis and machine learning motivations and applications; survey implementations in different computational environments; and summarize basic results from RandNLA. The tutorial will emphasize connections and relationships between different approaches as well as recent foundational links with areas or statistics and optimization.
(2) Present connections between RandNLA and TCS. The workshop will highlight worst-case theoretical aspects of matrix randomized algorithms, including models of data access, pass efficiency, lower bounds, and connections to other algorithms for large-scale machine learning and data analysis, input-sparsity time embeddings, and geometric data analysis methods.
(3) Elucidate the interplay between RandNLA, sketching, data streams, and communication-constrained implementations. Besides input-sparsity time algorithms and terabyte-scale algorithms, a number of algorithms in RandNLA draw inspiration from techniques in the data stream literature, particularly those based on oblivious sketching. For instance, Cauchy embeddings and subsampling data structuresoriginally studied in the context of estimating norms in a data streamnow give the fastest known algorithms for robust regression. TensorSketch, a variant of the CountSketch data structure for finding heavy hitters in a stream, has machine learning applications such as kernel classification and the tensor power method.
(4) Present connections between RandNLA and more traditional approaches to problems in applied mathematics, statistics, and optimization. The workshop will emphasize connections with (convex) optimization, but also consider signal processing, sparsity-based algorithms, and matrix reconstruction. Recent developments in RandNLA with connections to statistics and optimization include both using RandNLA techniques to solve traditional statistics and optimization problems, e.g., ridge regression, Newton methods, etc., as well as characterizing implicit statistics and optimization perspectives on existing RandNLA algorithms.
This workshop aims to build on ideas and collaborations developed during the 2018 Simons Institute program on Foundations of Data Science.