Thomas Lasko, MD, PhD, from Google Inc., Mountain View CA, will be giving a seminar next week entitled:
"Spectral Anonymization Data"
March 3, 2010
12:00 - 1:00 p.m.
214 Light Hall
The great challenge of data anonymization is to condition a dataset for public release so that it remains scientifically useful but does not disclose sensitive information about the individuals it describes. The challenge arises every time we consider releasing a clinical dataset, and with genomic data the problem is orders of magnitude more difficult. It is not sufficient simply to remove directly identifying information, because the remaining clinical data may be enough to infer an identity or otherwise learn sensitive information about an individual, especially if the attacker has access to auxiliary information.
Data anonymization has been an area of active research for several decades, yet almost every aspect of it remains an open question: How do we measure the risk of disclosure, and what amount of risk is acceptable? What is the optimal method of perturbing the data to achieve this protection? How do we measure the impact of the perturbation on scientific analysis, and what is an acceptable impact? Dozens of anonymization methods have been proposed over the years, but none has been successful at simultaneously providing perfect privacy protection and allowing perfectly accurate scientific analysis. One respected researcher opined in print that for data involving more than a few dimensions, the anonymization problem appears impossible.
In this talk, I claim that the problem is not impossible, but that historically we have imposed unnecessary and conflicting constraints on it. By relaxing these constraints and working under strong and defensible definitions of privacy and utility, we can simultaneously achieve both perfect privacy and perfect utility, even in high dimensions. I demonstrate how the principle of Spectral Anonymization relaxes some of these unnecessary constraints, and I present a concrete algorithm that achieves practically perfect privacy and utility using spectral kernel methods of machine learning.
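To give a rough sense of the idea, the following is a minimal sketch of one spectral perturbation in this spirit: project the data onto its spectral basis via the SVD, perturb within that basis, and reconstruct. The specific perturbation shown (independently permuting each spectral coordinate across records) is one illustrative choice, not necessarily the algorithm presented in the talk; all names and parameters here are assumptions for illustration.

```python
import numpy as np

def spectral_anonymize(X, rng):
    """Illustrative spectral perturbation: permute records within each
    spectral coordinate, then reconstruct. Not the talk's exact algorithm."""
    # Project onto the spectral basis: X = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Perturb in the spectral basis: shuffle each spectral coordinate
    # independently across records. This exactly preserves the per-coordinate
    # value distributions and the Frobenius norm of X, and approximately
    # preserves second moments for large datasets.
    U_perm = np.empty_like(U)
    for j in range(U.shape[1]):
        U_perm[:, j] = rng.permutation(U[:, j])
    # Map back to the original feature space
    return U_perm @ np.diag(s) @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # stand-in for a sensitive dataset
Xa = spectral_anonymize(X, rng)        # anonymized release candidate
```

The design intuition is that the spectral basis concentrates the dataset's joint structure, so perturbing record identities there can break record linkage while leaving aggregate statistics largely intact.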