## K. Cai, X. Xiao, and G. Cormode.
PrivLava: Synthesizing relational data with foreign keys under
differential privacy.
In *ACM SIGMOD International Conference on Management of Data
(SIGMOD)*, 2023.

Answering database queries while preserving privacy is an important problem that has attracted considerable research attention in recent years. A canonical approach to this problem is to use *synthetic data*. That is, we replace the input database *R* with a synthetic database *R*^{*} that preserves the characteristics of *R*, and use *R*^{*} to answer queries. Existing solutions for relational data synthesis, however, either fail to provide strong privacy protection, or assume that *R* contains a single relation. In addition, it is challenging to extend the existing single-relation solutions to the case of multiple relations, because they are unable to model the complex correlations induced by the foreign keys. Therefore, multi-relational data synthesis with strong privacy guarantees is an open problem.

In this paper, we address the above open problem by proposing **PrivLava**, the first solution for synthesizing relational data with foreign keys under *differential privacy*, a rigorous privacy framework widely adopted in both academia and industry. The key idea of **PrivLava** is to model the data distribution in *R* using *graphical models*, with *latent variables* included to capture the inter-relational correlations caused by foreign keys. We show that **PrivLava** supports arbitrary foreign key references that form a directed acyclic graph, and is able to tackle the common case when *R* contains a mixture of public and private relations. Extensive experiments on census data sets and the TPC-H benchmark demonstrate that **PrivLava** significantly outperforms its competitors in terms of the accuracy of aggregate queries processed on the synthetic data.

