The meeting will focus on the problem of computing with sensitive data while hiding the sensitive information embedded in it, specifically in the context of health care. One possibly relevant technique is secure computation, where different parties compute some function of their private inputs while hiding any additional information about those inputs. Another technique is publishing "sanitized" versions of sensitive data, in which some elements are perturbed or suppressed in order to hide sensitive properties. Both "offline" and "online" versions of such techniques may be of interest. The meeting will not discuss the security of transferring data between locations/parties, since this problem has rather straightforward solutions.
Topics for discussion:
A. Topics related to secure computation
   A.1 A short introduction to secure computation
   A.2 Identifying functions of interest for healthcare applications
   A.3 Modeling the adversary
B. Topics related to data sanitization
   B.1 A short introduction to data sanitization
   B.2 Definitions of a privacy breach
   B.3 Modeling the adversary
C. Other topics
   C.1 Relation to DRM (Digital Rights Management)
   C.2 Online Query Auditing
   C.3 Post Query Auditing
   C.4 Other Privacy related problems
A detailed description of the topics for discussion:
A. Topics related to secure computation:
A.1. A short introduction to secure computation
A short introduction to the goals and methods of secure computation.
A.2 Identifying functions of interest for healthcare applications: Identify specific functionalities that are used in health-data applications and could benefit from a privacy-preserving solution. In other words, if there were no privacy issues, what functions (of sensitive, possibly distributed data) are commonly computed in healthcare applications? For example, consider a system that obtains information from many sources in order to do real-time detection of epidemics. What types of mathematical functions of the data does the system compute (e.g., linear regression, averages, etc.)? Discuss whether it is plausible to find efficient protocols for secure evaluation of these functions.
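As a concrete illustration for this discussion (not part of the agenda itself), the following minimal Python sketch shows how one such aggregate, a sum of counts held by different hospitals, could be computed with additive secret sharing so that no party sees another party's input. The modulus and the example counts are arbitrary assumptions chosen only for illustration.

    import random

    MODULUS = 2**61 - 1  # illustrative modulus; any sufficiently large modulus works

    def share(value, num_parties):
        """Split a private value into additive shares that sum to it modulo MODULUS."""
        shares = [random.randrange(MODULUS) for _ in range(num_parties - 1)]
        shares.append((value - sum(shares)) % MODULUS)
        return shares

    def secure_sum(private_inputs):
        """Each party shares its input; each party adds the shares it holds,
        and only the grand total is reconstructed."""
        n = len(private_inputs)
        all_shares = [share(v, n) for v in private_inputs]
        # Party j holds one share of every input and publishes only its local sum.
        local_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
        return sum(local_sums) % MODULUS

    # Example: three hospitals compute the total number of flu cases
    # without revealing their individual counts.
    print(secure_sum([120, 87, 45]))  # -> 252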
A.3 Modeling the adversary
The "adversary" is often modeled as being either "semi-honest" or "malicious". It is easier to find solutions which are secure against semi-honest adversaries, but is it reasonable to make the assumption that the adversary is semi-honest? Perhaps we can find a more reasonable model of an adversary that is more powerful than a semi-honest one but less powerful than a malicious adversary?
B. Topics related to data sanitization
B.1 A short introduction to data sanitization.
A short introduction to the goals and methods of data sanitization. Topics include the types of data that need to be sanitized, and different techniques such as cell suppression, perturbation, etc.
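As a rough illustration of the two techniques named above, the sketch below suppresses small cells in a count table and perturbs numeric values with random noise. The suppression threshold and noise scale are invented for illustration and are not recommendations.

    import random

    SUPPRESSION_THRESHOLD = 5   # illustrative: counts below this are hidden
    NOISE_SCALE = 2.0           # illustrative standard deviation for perturbation

    def suppress_small_cells(count_table):
        """Replace counts below the threshold with None (a suppressed cell)."""
        return {cell: (count if count >= SUPPRESSION_THRESHOLD else None)
                for cell, count in count_table.items()}

    def perturb(values):
        """Add zero-mean Gaussian noise to each numeric value."""
        return [v + random.gauss(0, NOISE_SCALE) for v in values]

    counts = {("zip 02139", "diabetes"): 42, ("zip 02139", "rare disease X"): 2}
    print(suppress_small_cells(counts))   # the rare-disease cell is suppressed
    print(perturb([54.0, 61.5, 70.2]))    # noisy versions of the original values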
B.2 Definitions of a privacy breach
What are the common definitions of a "privacy breach" that needs to be prevented? What is the motivation for these definitions? In particular, what kind of privacy do cell suppression methods try to provide? The same question is also relevant with regard to other methods (e.g. counting the number of possible tables given marginals).
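To make the last example concrete, the sketch below counts the 2x2 contingency tables that are consistent with given row and column sums; intuitively, the more tables are consistent with the released marginals, the less an observer learns about the true table. The brute-force approach and the example marginals are purely illustrative.

    def count_tables(row_sums, col_sums):
        """Count nonnegative integer 2x2 tables with the given row and column sums."""
        (r1, r2), (c1, c2) = row_sums, col_sums
        if r1 + r2 != c1 + c2:
            return 0
        count = 0
        # Fixing the top-left cell determines the remaining three cells.
        for a in range(min(r1, c1) + 1):
            b, c = r1 - a, c1 - a
            d = r2 - c
            if min(b, c, d) >= 0:
                count += 1
        return count

    print(count_tables((3, 4), (2, 5)))  # -> 3 tables consistent with these marginals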
B.3 Modeling the adversary
Many methods for protecting data privacy assume that the data set elements are chosen independently, and that no a priori knowledge about the data is taken into account. While this may be a reasonable assumption about a legitimate data analyst, it is questionable whether it is also reasonable for modeling an attacker. Can we find a reasonable model for our adversary? In particular, the adversary might have additional, auxiliary information that is not known to the legitimate analyst. Have there been attempts at modeling this scenario?
C. Other topics
C.1. Relation to DRM (Digital Rights Management)
DRM was developed to control consumers' access to digital content (such as music) that they buy. For example, to enable listening to an album while preventing additional copies from being made. Can we apply DRM to medical data to restrict the way it is used? E.g., send medical files with an enforcement mechanism that enables them to be read only by specific parties, within a certain time period, etc.? Is there any hope of writing a set of rules describing the authorizations for using the data?
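Purely as a strawman for the discussion, such a rule set might resemble a machine-readable policy attached to a record, as in the sketch below. The field names, reader identities, and dates are all invented for illustration.

    from datetime import date

    # A hypothetical, simplified authorization rule attached to a medical record.
    policy = {
        "allowed_readers": {"dr.smith@clinic.example", "dr.jones@clinic.example"},
        "allowed_purposes": {"treatment"},
        "expires": date(2025, 12, 31),
    }

    def may_access(policy, reader, purpose, today):
        """Return True only if the reader, purpose, and date satisfy the policy."""
        return (reader in policy["allowed_readers"]
                and purpose in policy["allowed_purposes"]
                and today <= policy["expires"])

    print(may_access(policy, "dr.smith@clinic.example", "treatment", date(2025, 6, 1)))  # True
    print(may_access(policy, "dr.smith@clinic.example", "research", date(2025, 6, 1)))   # False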
C.2 Online Query Auditing
Online auditing examines queries as they are generated by users, to decide whether the answers to a set of queries generated by the same user would enable that user to learn sensitive data (e.g., it is OK for a user to ask about a patient's date of birth or about his zip code, but not about both). We will consider auditing (a technique suggested in the mid-1970s that has received renewed interest) to demonstrate how subtleties in definitions may affect the effectiveness of privacy-preserving measures.
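A toy version of the date-of-birth/zip-code example might look like the sketch below: the auditor tracks which attributes each user has already queried and refuses any query that would complete a forbidden combination. The forbidden combination and attribute names are invented for illustration.

    # Illustrative forbidden combination: together these attributes may identify a patient.
    FORBIDDEN_COMBINATIONS = [{"date_of_birth", "zip_code"}]

    class OnlineAuditor:
        def __init__(self):
            self.history = {}  # user -> set of attributes already answered

        def allow(self, user, attribute):
            """Answer the query only if it does not complete a forbidden combination."""
            asked = self.history.setdefault(user, set())
            if any(combo <= asked | {attribute} for combo in FORBIDDEN_COMBINATIONS):
                return False  # deny: answering would complete a forbidden combination
            asked.add(attribute)
            return True

    auditor = OnlineAuditor()
    print(auditor.allow("alice", "date_of_birth"))  # True
    print(auditor.allow("alice", "zip_code"))       # False: combination is forbidden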
C.3 Post Query Auditing
An alternative form of auditing records all the queries issued by users, and searches them, at a later time, for suspicious transactions. For example, we could search for medical personnel who usually look at patients of one type but are found to examine the records of patients of a different type who happen to live in the employee's own neighborhood. There are issues of identifying suspicious activity, and of preserving the privacy and integrity of the transactions database.
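A minimal sketch of such a retrospective search, with an invented query-log format and arbitrary thresholds, might flag staff who access record types outside their usual pattern:

    from collections import Counter

    def suspicious_accesses(query_log, min_history=20, rarity_threshold=0.05):
        """Flag (staff, record_type) pairs that are rare for that staff member.
        query_log: list of (staff_id, record_type) tuples; thresholds are illustrative."""
        by_staff = {}
        for staff, record_type in query_log:
            by_staff.setdefault(staff, []).append(record_type)
        flagged = []
        for staff, types in by_staff.items():
            if len(types) < min_history:
                continue  # not enough history to establish a usual pattern
            counts = Counter(types)
            for record_type, n in counts.items():
                if n / len(types) < rarity_threshold:
                    flagged.append((staff, record_type))
        return flagged

    log = [("nurse7", "oncology")] * 50 + [("nurse7", "psychiatry")]
    print(suspicious_accesses(log))  # -> [('nurse7', 'psychiatry')]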
C.4 Other Privacy related problems
Time permitting, we will discuss non-mainstream privacy problems such as private inference control (PIC) and attempts to hide some patterns in collected data while allowing other patterns to be mined. In particular, are these (and similar) problems relevant for privacy in medical databases?