Balancing Data Confidentiality and Data Quality: A two-day tutorial sponsored by DIMACS and DyDAn
November 8 - 9, 2007
DIMACS Center, CoRE Building, Rutgers University
Organizer:
Larry Cox, CDC, ljtcox at aol.com
Presented under the auspices of the Special Focus on Computational and Mathematical Epidemiology, the Special Focus on Communication Security and Information Privacy, and the Center for Dynamic Data Analysis (DyDAn).
Workshop Program:
This is a preliminary program.
Thursday, November 8, 2007
8:30 - 9:20 Registration and Breakfast
9:20 - 9:30 Welcome and Opening Remarks
Fred Roberts, DIMACS Director
9:30 - 10:45 What is Statistical Disclosure?
Qualitative Issues
Ethical, legal and statistical considerations
CIPSEA and HIPAA
Balancing the right to privacy with the need to know
Administrative solutions
Disclosure checklists
Small geography and domain data
Quantitative Issues
Defining statistical disclosure quantitatively
Illustrative example
10:45 - 11:00 Morning Break
11:00 - 12:30 Statistical Disclosure Limitation (SDL) for Frequency Count Data
Examining and defining the problem
Rounding and perturbation methods and their effects on data quality
Swapping and switching methods and their effects on data quality
12:30 - 1:45 Lunch
1:45 - 3:15 SDL for Aggregate Magnitude Data
Quantifying disclosure: Statistical disclosure rules
Cell bounds and disclosure audit
Complementary cell suppression
Mathematical statement of the cell suppression problem
Why cell suppression is a very difficult problem
Using mathematical networks for complementary cell suppression
Quality effects of cell suppression
Releasing interval data
3:15 - 3:30 Afternoon Break
3:30 - 5:00 SDL for Aggregate Magnitude Data (continued)
Controlled tabular adjustment (CTA)
The CTA method
Quality-preserving controlled tabular adjustment (QP-CTA)
Minimum discrimination information controlled tabular adjustment (MDI-CTA)
Perturbing the underlying microdata
Friday, November 9, 2007
8:00 - 8:30 Continental breakfast
8:30 - 10:00 SDL in Microdata
Defining microdata disclosure
Likelihood of disclosure and risk of disclosure
Censoring, rounding and perturbation
Microaggregation and its effects on data quality
Blank and impute
Synthetic microdata and its effects on data quality
Contextual variables
Research data centers, remote access and remote execution
10:00 - 10:15 Morning Break
10:15 - 11:45 SDL in Microdata (continued)
Small domain data
Effectiveness of SDL methods for microdata
Disclosure risk analysis
Defining disclosure and disclosure risk
Secure multi-party regression
11:45 - 1:00 Lunch
1:00 - 2:15 SDL in Statistical Data Bases
Statistical data base query systems as multi-dimensional tables
Estimating confidential and missing data
Releasing marginal totals or log-linear models and effects on data quality
Secure distributed statistical analysis
2:15 - 2:30 Afternoon Break
2:30 - 3:00 Wrap-Up and Discussion
Brief discussion of the literature
Questions, comments, discussion
3:00 Adjourn
Document last modified on September 5, 2007.