Center for Communications Research (CCR), Princeton, NJ

**Organizers:**

- **Bob Grossman**, chair, University of Illinois and Two Cultures Group, grossman@uic.edu
- **Paul Kantor**, Rutgers University, kantorp@cs.rutgers.edu
- **Muthu Muthukrishnan**, AT&T Labs - Research and Rutgers University, muthu@cs.rutgers.edu

**Workshop Coordinator:**

- Dolores Koch, DJK@idaccr.org

Moses S. Charikar, Princeton University
Title: Algorithmic Techniques for Clustering in the Streaming Data Model

Graham Cormode, University of Warwick
Title: Algorithmic Embedding for Comparing Large Text Streams

Title: Semantic Information Processing of Spoken Language in Dialog

Johannes Gehrke, Cornell University
Title: Distributed Mining and Monitoring

Sudipto Guha, University of Pennsylvania
Title: Learning Mixture of Markov Chains

Geoff Hulten, University of Washington
Title: A General Framework for Mining Very Large Databases and Data Streams

Marco Mazzucco, University of Illinois at Chicago
Title: Merging of High Bandwidth Data Streams

Rafail Ostrovsky, Telcordia Technologies
Title: Content-Sensitive Fingerprinting and its Applications

H. Vincent Poor, Princeton University
Title: Change Detection: A Tutorial Overview

Many problems in data mining involve detecting the sudden onset of anomalous behavior in time-series data. After a brief discussion of several motivating applications in this area, the fundamentals underlying change-detection algorithms will be reviewed.
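A classical instance of this problem is detecting a shift in the mean of a stream. The following minimal CUSUM sketch (illustrative only, not from the talk; the drift parameter `k` and threshold `h` are hypothetical choices) shows the basic idea:

```python
# Minimal one-sided CUSUM detector: flags a sustained upward shift in the
# mean of a stream. Parameters k (allowed drift) and h (alarm threshold)
# are illustrative and would be tuned to the application.
def cusum(stream, target_mean, k=0.5, h=5.0):
    s = 0.0
    for i, x in enumerate(stream):
        # accumulate evidence of an upward shift, clipped at zero
        s = max(0.0, s + (x - target_mean - k))
        if s > h:
            return i  # index at which the change is declared
    return None  # no change detected

data = [0.1, -0.2, 0.0, 0.3, -0.1] + [2.0] * 10  # mean jumps from ~0 to 2
print(cusum(data, target_mean=0.0))  # 8: alarm a few samples after the jump
```

The detection delay (here, three samples after the shift at index 5) is the usual trade-off against false-alarm rate controlled by `h`.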

Title: Graph Mining: Discovery in Large Networks

Rebecca Wright, DIMACSTitle: Protecting Privacy in Data-mining Applications

Data mining is concerned with revealing information contained in some data. However, in some data mining applications, it is also desirable to protect all information other than the computed output. Doing so can protect personal and sensitive information and foster collaboration between different agencies. For example, two governmental agencies may be more willing to compute outputs over their joint data if they need not reveal that data to each other in order to do so. In this talk, I will discuss some recent work in two areas: secure multiparty computation of approximations and privacy-preserving statistics computations.

Secure multiparty approximations. Approximations are often useful to reduce computation and communication costs in a distributed setting where the inputs are held by different parties and are extremely large. Secure multiparty computation of approximations addresses applications in which the parties want to cooperate to compute a function of their inputs without revealing more information than they have to. Secure multiparty computation is a well-studied tool for computing exact functions without revealing unnecessary information, but its definitions cannot be applied directly to approximation functions without a potential loss of privacy. We extend these definitions to apply to secure multiparty approximate computations, and we present an efficient, sublinear-communication, private approximate computation for the Hamming distance; we also give an efficient, polylogarithmic-communication solution for the L2 distance in a relaxed model.
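The sublinear-communication aspect can be illustrated, stripped of all cryptography, by estimating the Hamming distance from a small random sample of positions. In the actual protocol the sampled positions and values would be exchanged privately; this toy sketch keeps both strings local and omits the privacy machinery entirely:

```python
import random

# Sampling-based Hamming-distance estimator. Each of two parties would
# evaluate its string at shared random positions; here both strings are
# local for clarity, and none of the cryptographic protections from the
# talk are modeled. Only the sublinear-sample approximation idea is shown.
def approx_hamming(a, b, num_samples=400, seed=0):
    assert len(a) == len(b)
    rng = random.Random(seed)
    positions = [rng.randrange(len(a)) for _ in range(num_samples)]
    mismatches = sum(1 for i in positions if a[i] != b[i])
    # scale the sampled mismatch rate back up to the full string length
    return mismatches * len(a) / num_samples

x = [0, 1] * 5000  # length 10,000
y = [0, 0] * 5000  # differs at every odd position: true distance 5,000
print(approx_hamming(x, y))  # close to 5000
```

Only 400 positions are communicated rather than all 10,000, at the cost of a standard sampling error of roughly a few hundred on the estimate.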

Privacy-preserving statistics: Suppose a client wishes to compute some aggregate statistics on a privately owned database. The data owner wants to protect the privacy of the personal information in the database, while the client does not want to reveal his selection criteria. Privacy-protecting statistical analysis allows the client and data owner to interact in such a way that the client learns the desired aggregate statistics, but does not learn anything further about the data; the data owner learns nothing about the client's query. Motivated by this application, we consider the more general problem of "selective private function evaluation," in which a client can privately compute an arbitrary function over a database. We present various approaches for constructing efficient selective private function evaluation protocols, both for the general problem and for privacy-protecting statistical analysis.
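The flavor of such protocols can be conveyed with a toy additive-secret-sharing sum: each record's value is split into shares so that neither share alone reveals it, yet shares combine to the true aggregate. Real selective-private-function-evaluation protocols are far more involved; the modulus and record values below are arbitrary illustrative choices:

```python
import random

# Toy additive secret sharing over Z_p: split each value into two shares,
# each uniformly random on its own, whose sum mod p recovers the value.
# Summing shares componentwise yields shares of the aggregate, so the
# individual records are never exposed. Purely illustrative.
P = 2**61 - 1  # a large prime modulus (arbitrary choice here)

def share(value):
    r = random.randrange(P)
    return r, (value - r) % P

def reconstruct(s1, s2):
    return (s1 + s2) % P

salaries = [50, 72, 61]
shares = [share(v) for v in salaries]
# Each "server" sums its own shares locally...
sum1 = sum(s[0] for s in shares) % P
sum2 = sum(s[1] for s in shares) % P
# ...and only the combined sums reveal the aggregate, not any one record.
print(reconstruct(sum1, sum2))  # 183
```

A single share of a record is uniformly distributed and carries no information about the record, which is the privacy property the protocols in the talk establish under much stronger adversarial models.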

(This talk includes joint work with Ran Canetti, Joan Feigenbaum, Yuval Ishai, Ravi Kumar, Tal Malkin, Kobbi Nissim, Michael Reiter, Ronitt Rubinfeld, and Martin Strauss.)