IBM Research - Ireland Internship Project: Data Privacy for Health Care - overview
Organizations, public bodies, institutes and companies gather enormous volumes of data that contain personal information. For reputational, compliance and legal reasons, this personal information needs to be de-identified before being shared with third parties, such as analytics teams or research scientists. The healthcare domain is particularly challenging since it deals with highly sensitive information. The de-identification process aims to achieve the following three goals: a) significantly and provably minimize the re-identification risk; b) maintain a high level of data utility to support the intended secondary purposes; and c) maintain the truthfulness of the data at the record level to the largest possible extent.
This project aims to explore innovative ways to provide a framework for calculating re-identification risk in meaningful and realistic settings and for generating reports for a mixed audience of technical, legal and compliance stakeholders. The project will build the foundational metrics for capturing the balance between information loss and re-identification risk. The end goal is a research prototype that will demonstrate the framework in various scenarios.
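As a rough illustration of the kind of metric involved, the sketch below estimates re-identification risk from equivalence classes: records that share the same quasi-identifier values form a class of size k, and each record in it is assumed to be re-identifiable with probability 1/k. This is only one common textbook model from statistical disclosure control, not the project's framework; the function name `reidentification_risk`, the sample records and the choice of `age` and `zip` as quasi-identifiers are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) under a simple equivalence-class model.

    Assumption: an attacker who knows a person's quasi-identifier values
    picks uniformly among the k records that match them.
    """
    # Group records by their combination of quasi-identifier values.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    # A record in an equivalence class of size k has risk 1/k.
    risks = [1.0 / class_sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)


# Hypothetical, already generalized sample data.
records = [
    {"age": "30-39", "zip": "D04", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "D04", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "D02", "diagnosis": "asthma"},
]

max_risk, avg_risk = reidentification_risk(records, ["age", "zip"])
print(f"max risk = {max_risk:.2f}, average risk = {avg_risk:.2f}")
```

Generalizing the quasi-identifiers further (for example, wider age bands) would enlarge the equivalence classes and lower both figures, at the cost of information loss; that trade-off is what the foundational metrics are meant to capture.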
Programming (Java 8 or Python)
Basic background in information theory and statistical disclosure control
Basic database management and querying
Basic knowledge of Spark-based computing (optional)
Familiarity with cloud services (optional)
Good presentation skills