DIMACS :: Details

Releasing a Differentially Private Password Frequency Corpus from 70 Million Yahoo! Passwords

October 23, 2017, 5:00 PM - 5:30 PM

Location:

DIMACS Center

Rutgers University

CoRE Building

96 Frelinghuysen Road

Piscataway, NJ 08854

Click here for map.

Jeremiah Blocki, Purdue University

Abstract

This talk will describe the design and implementation of a differentially private mechanism for releasing perturbed integer partitions. This mechanism was used to release a password frequency corpus of 70 Million Yahoo! passwords. Given a dataset of user-chosen passwords, the frequency list reveals the frequency of each unique password. The talk will especially focus on some of the design decisions we encountered when deploying our differentially private mechanism.

Our mechanism is based on a novel algorithm for sampling that enables an efficient implementation of the exponential mechanism for differential privacy (naive sampling is exponential time). While we proved that our mechanism introduces L1 error at most $n^{1/2}/epsilon$, the best lower bounds on L1 error are $n^{1/2}/epsilon^{1/2}$. We will present strong empirical evidence for the conjecture that the L1 error actually scales with $n^{1/2}/epsilon^{1/2}$. These empirical results suggest that the algorithm might be usefully applied to reduce the L1 error of differentially private algorithms for releasing degree distributions in a social network with node-privacy.

Slides Video