Guest post by Professor Christian Probst and Gregor Steinhorn
In 2012 Unitec founded the Cybersecurity Research Centre, which has investigated a multitude of cybersecurity challenges since then.
The latest research hosted at the centre is developing techniques to keep people’s private information secure. In 2019 Professor Christian Probst, then Head of Unitec’s School of Computing, won an MBIE Endeavour Fund Smart Ideas grant for the research project “Calculating the reidentification risk of anonymised datasets using Bayesian probabilistic programming.” The project began in early 2020 as a collaboration with the University of Auckland and the IT University of Copenhagen, and is now generating its first results.
The research in this project is of wide importance, as sharing data lies at the core of modern data-driven society and industry; it enables efficient and reproducible processes, and a vibrant digital economy. Pharmaceutical and health research and industry, for example, benefit tremendously from the ability to share and analyse patient data, while social networks generate revenue through advertising targeted at their members.
Recent legislation, such as the revised New Zealand Privacy Act and the General Data Protection Regulation (GDPR) in Europe, controls how ‘personal information’ may be collected, used, disclosed, stored and accessed. These legislative measures are necessary, since the value of sharing data comes at the price of a risk to privacy protection.
To protect the privacy of individuals and to minimise the risk of re-identification, data should usually be anonymised before sharing. Current methods are mostly based on indirect mathematical approaches that are difficult to evaluate and hard to explain. This hinders data sharing, as neither scientists, lawmakers, nor data subjects can appreciate the guarantees provided.
Unitec’s Smart Ideas project is developing novel methods to measure the quality of anonymisation of data. The proposed measure is the probability of re-identification of a subject by an attacker in the presence of as-yet unknown data. Probability is a commonly understood measure of risk and will make it easier for the general public, as well as organisations that share data, to compare risks from different data-sharing and anonymisation approaches.
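To make the idea of a re-identification probability concrete, here is a minimal sketch in Python. It is not the project's Bayesian probabilistic programming method. It is a much simpler frequency-based illustration that assumes the attacker knows a target's quasi-identifiers (here the hypothetical fields `age_band` and `postcode`) and guesses uniformly among the records that match them. The function name, field names and toy records are all made up for illustration.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return one risk value per record.

    Assumption (illustrative only): the attacker knows the target's
    quasi-identifier values and picks uniformly at random among all
    records sharing those values, so the risk is 1 / group size.
    """
    # Build the quasi-identifier key for each record.
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    # Count how many records share each key.
    group_sizes = Counter(keys)
    return [1.0 / group_sizes[key] for key in keys]


# Toy, made-up records; not real data.
records = [
    {"age_band": "30-39", "postcode": "1025", "diagnosis": "A"},
    {"age_band": "30-39", "postcode": "1025", "diagnosis": "B"},
    {"age_band": "40-49", "postcode": "1025", "diagnosis": "A"},
    {"age_band": "40-49", "postcode": "0600", "diagnosis": "C"},
]

risks = reidentification_risk(records, ["age_band", "postcode"])
print(risks)       # [0.5, 0.5, 1.0, 1.0]
print(max(risks))  # 1.0
```

In this toy dataset the first two records share their quasi-identifiers, so each has a risk of 0.5, while the last two are unique and have a risk of 1.0. A single number such as the maximum or average risk could then be used to compare two anonymisation approaches, although this simple calculation ignores the as-yet unknown data that the project's measure is designed to account for.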
The project will provide a new theoretical framework for measuring the anonymity of data before sharing (improving on existing, established solutions), together with efficient inference algorithms and effective tools to demonstrate the applicability of the resulting approach.
Beneficiaries of the project’s results will include public agencies, companies, iwi groups and citizens. These groups will be empowered to understand the risk of sharing data, and to choose an anonymisation algorithm that fulfils certain criteria. In the long run, it is hoped that this will have an influence on data sharing policies and privacy acts in New Zealand and overseas.
Unitec, with an industry partner, is currently exploring the use of the method to assess privacy concerns when using genetic data for research and diagnostics. In health data, the strong need for privacy coincides with complex data correlations that make it difficult to assess and communicate the risk of unsuitable anonymisation approaches. This project aims to provide a practical tool for regulators, industry and users to understand the risks involved.
The project team is also exploring further areas to expand the application of this technology. To discuss potential research collaborations with the team, please contact the Principal Investigator Professor Christian Probst (cprobst@unitec.ac.nz) or Unitec’s Research Partner – Enterprise, Gregor Steinhorn (gsteinhorn@unitec.ac.nz).