DIMACS/Dept. of Computer Science Colloquium Talk

Title: Privacy-preserving data mining in the fully distributed model

Speaker: Rebecca Wright, Stevens Institute of Technology

Date: Friday, February 3, 2006 11:00am - 12:15pm

Location: DIMACS Center, CoRE Bldg, Room 431, Rutgers University, Busch Campus, Piscataway, NJ


Privacy-preserving data mining seeks to balance the ability to perform useful computations on data held by many parties with the desire to protect sensitive information. In the fully distributed model, each of many users or devices has information which we think of as one record in a virtual database. In this talk, I will describe several privacy-preserving methods for computing on this virtual database. Specifically, I will present our results in two areas: (1) privacy-preserving frequency mining and (2) privacy-preserving k-anonymization. In privacy-preserving frequency mining, we present a way for a data miner to learn the frequencies of combinations of data values without learning the individual data values, and we discuss how these frequencies can enable various classification tasks. In privacy-preserving k-anonymization, we consider the previously proposed method of protecting identities in data through k-anonymization, which modifies data so that each individual is "hidden" among at least k others. Previous algorithms for k-anonymization of data have assumed centralized access to the entire data set. In our work, we show how a data miner or data publisher can learn a k-anonymized version of a fully distributed database without learning the entire data set.
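
As a rough illustration of the k-anonymity property mentioned above, the following minimal sketch checks whether a small table is k-anonymous. It follows the common convention that every combination of quasi-identifier values must be shared by at least k records. It is a centralized check on made-up data, not the distributed protocol presented in the talk; the function name, field names, and records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (assumed convention, see lead-in)."""
    # Count how many records share each combination of quasi-identifier values.
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records (illustrative data only).
records = [
    {"zip": "088**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "088**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "070**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "070**", "age": "30-39", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["zip", "age"], 2))
print(is_k_anonymous(records, ["zip", "age"], 3))
```

The group counts computed here are also a plain, non-private version of the "frequencies of combinations of data values" that the frequency-mining part of the talk is concerned with; in the fully distributed setting, no single party would hold the `records` list.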

(Joint work with Zhiqiang Yang and Sheng Zhong.)