The unprecedented scale of these data sets demands that we revisit traditional notions of efficient algorithms. In many applications, this means that the computational resources used must be small relative to the size of the data. Given the severity of these constraints, one might think there is hope only for problems that can be solved by straightforward statistical sampling techniques.
Surprisingly, researchers in several different areas of mathematics and computer science have recently shown that interesting results can be obtained under such sublinear constraints. Besides techniques from sampling, many of these results incorporate sophisticated algorithmic and combinatorial contributions to solve appropriate approximate variants of the problems. To study this emerging area, we propose to bring together researchers working in several communities, including combinatorial property testing, streaming algorithms, randomized algorithms, and learning theory. Many of the researchers in these areas have experience with the massive data sets that occur in diverse domains such as the web and databases; by bringing these people together, we hope to advance participants' research by identifying the issues common to their fields.
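To make the "straightforward statistical sampling" baseline concrete, here is a minimal Python sketch of sublinear-time estimation: approximating the fraction of items in a huge data set that satisfy a predicate by probing a number of random positions that depends only on the desired accuracy, not on the data size. The sample-size bound via Chebyshev's inequality and all function names and parameters are our own illustrative choices, not taken from the proposal.

    import random

    def estimate_fraction(data, predicate, eps=0.05, delta=0.05):
        """Estimate the fraction of items satisfying `predicate`,
        within +/- eps with probability at least 1 - delta, while
        reading only O(1 / (eps^2 * delta)) items of `data`."""
        # Sample size from Chebyshev's inequality; note it is
        # independent of len(data), hence sublinear.
        k = int(1 / (eps ** 2 * delta)) + 1
        hits = sum(predicate(random.choice(data)) for _ in range(k))
        return hits / k

    if __name__ == "__main__":
        data = [random.randint(0, 9) for _ in range(10 ** 6)]
        # Roughly 8000 probes suffice here, versus 10^6 for an exact count.
        print(estimate_fraction(data, lambda x: x % 2 == 0))

Problems beyond such simple counting (for example, testing combinatorial properties of graphs or functions) require the more sophisticated algorithmic ideas the proposal refers to.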
Limited support may be available; please contact either Ronitt Rubinfeld (ronitt@research.nj.nec.com) or Martin Strauss (mstrauss@research.att.com).