DIMACS/PORTIA Workshop on Privacy-Preserving Data Mining
March 15 - 16, 2004
DIMACS Center, CoRE Building, Rutgers University, Piscataway, NJ
Presented under the auspices of the
Special Focus on Communication Security and Information Privacy, and the PORTIA project.
Organizers:
- Cynthia Dwork, Microsoft, dwork at microsoft.com
- Benny Pinkas, HP Labs, benny.pinkas at hp.com
- Rebecca Wright, Stevens Institute of Technology, rwright at cs.stevens-tech.edu
The workshop is followed by a related working group on March 17, 2004.
This workshop and working group will bring together researchers and
practitioners in cryptography, data mining, and other areas to discuss
privacy-preserving data mining. The workshop sessions on March 15 and
16, 2004, will consist of invited talks and discussion. On March 17, 2004, a
working group of invited participants will identify and explore
approaches that could serve as the basis for more sophisticated
algorithms and implementations than presently exist, and will discuss
directions for further research and collaboration.
Both the workshop and the working group will investigate the construction and exploitation of "private" databases, for example:
- Merging information from multiple data sets in a consistent,
secure, efficient and privacy-preserving manner;
- Sanitizing databases to permit privacy-preserving public study.
In a wide variety of applications it would be useful to be able to gather information from several different data sets. The owners of these data sets may not be willing, or legally able, to share their complete data with each other. The ability to collaborate without revealing information could be instrumental in fostering inter-agency collaboration.
Particular topics of interest include:
- Secure multi-party computation. This is a very general and
well-studied paradigm that unfortunately has not been used in practice so far. We will investigate ways to make it more efficient and encourage its deployment.
- Statistical techniques such as data swapping,
post-randomization, and perturbation.
- Articulation of different notions and aspects of privacy.
- Tradeoffs between privacy and accuracy.
- Architectures that facilitate private queries by a
(semi-trusted) third party.
- Methods for handling different or incompatible formats, and
erroneous data. We will investigate ideas from dimension reduction, clustering, and search strategies.
- Additional issues such as ensuring the accuracy and reliability
of responses, query authentication, logging, auditing, access control and authorization policies.
Document last modified on December 17, 2003.