Research, Collaboration, Impact! New privacy-preserving algorithms improve privacy online and offline

Dr Grigorios Loukides

Senior Lecturer in Computer Science

23 October 2023

Technology & Science

Collaboration and the culture that underpins it are necessary cornerstones to tackle society's biggest challenges and make an impact on the world around us. In 'Research, Collaboration, Impact!' we examine how teams in the Department of Informatics are working to overcome some of the world's biggest issues, and the partners they're working with to ensure their research is making it a better place. In this issue, we sat down with Dr Grigorios Loukides to talk about how his new algorithms are improving our privacy online and offline.

Strings, or sequential data, are a crucial and ubiquitous data type that underpins vast amounts of confidential information such as health records or financial transactions. Dr Grigorios Loukides and his collaborators have developed new models and algorithms that preserve the privacy of strings so that they can’t be exploited to reveal critical and highly sensitive data. Possible areas of application for their new methods include medical research, online safety and secure banking.

Background

Strings, sequences of elements, are everywhere. DNA and binary computer code are good examples of just how ubiquitous sequential data are. In our DNA, genetic information is represented by strings of the letters A, C, G and T, short for the four bases adenine, cytosine, guanine and thymine. In computing, code written by humans is translated into machine-readable strings of 0s and 1s that tell the computer what to do. And of course, strings of numbers on our latest bank statement tell us how much money we have left available.

In principle, strings can encode almost anything, which makes them one of the most important and fundamental types of data. As they often encode highly personal and sensitive information, the privacy of strings must be protected. Dr Loukides, a Senior Lecturer at the Department of Informatics, and his collaborators realised just how big a gap there was in privacy-preserving technology for strings, which got them started on a two-year research project at King's. There are plenty of use cases that highlight the urgency of this line of work. ‘A string can easily reveal our location and our actions, like GPS data for instance, and in some cases we don’t want people to know when we visit a hospital’, he says.

Privacy is only one side of the story. Our approach protects the privacy and utility of the data we send. We can give bank employees access to the credit card data they need to see to help us but not our entire account history. – Dr Grigorios Loukides

What did the project involve?

With a £220,000 grant from the prestigious Leverhulme Trust in the bag, Grigorios assembled a team of collaborators: Professor Costas Iliopoulos, Professor of Algorithm Design at the Department of Informatics, and Dr Kirk Plangger, Reader in Marketing at King’s Business School. One of the key drivers of their work is to secure strings against illegitimate use by third parties. However, this is not all. ‘Privacy is only one side of the story’, Grigorios says. ‘The other is to make sure that we can still use the data fully once we have protected it’. Effectively, this meant the team had to work towards a dual objective: to preserve both the privacy of strings and the overall utility of the data at hand. For instance, we would like cancer researchers to be able to work with DNA samples while making it difficult to identify individual donors, or to give bank employees access to some credit card data but not our entire account history.

The team quickly realised that an entirely new approach to designing models and algorithms was required to make sequential data truly resilient. To achieve this, a new type of data structure had to be designed that can carry inbuilt privacy protection. ‘We call this new principle “reverse-safe data structures”’, Grigorios continues. The new data structure tackles a hitherto unaddressed issue of data protection in different settings. Essentially, it makes it prohibitively difficult to reverse-engineer the dataset represented by the data structure based on the answers to queries posed, for example, by an online search engine, which is great for online privacy. The algorithms that Grigorios and his team have developed are fast enough to be immediately useful. They found that it takes only a few minutes to secure texts of millions of letters with this new method.
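
To give a feel for what 'hard to reverse-engineer' can mean, here is a toy sketch in Python. It is not the team's algorithm. It assumes, purely for illustration, that a data structure answers 'how often does this pattern occur?' for patterns up to some length d, and it counts by brute force how many different strings would give exactly the same answers. The function names and the parameter d are hypothetical.

```python
from collections import Counter
from itertools import product


def query_answers(text, d):
    """Occurrence counts of every substring of length 1..d.

    Assumption: this stands in for everything an index would reveal
    if it answered counting queries for patterns of length up to d.
    """
    counts = Counter()
    for k in range(1, d + 1):
        for i in range(len(text) - k + 1):
            counts[text[i:i + k]] += 1
    return counts


def consistent_strings(text, d):
    """All strings of the same length, over the same letters, that give
    identical answers. Brute force: only feasible for very short texts."""
    alphabet = sorted(set(text))
    target = query_answers(text, d)
    matches = []
    for letters in product(alphabet, repeat=len(text)):
        candidate = "".join(letters)
        if query_answers(candidate, d) == target:
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    for d in (1, 2):
        found = consistent_strings("abab", d)
        print(f"d={d}: {len(found)} consistent string(s)")
```

In this toy setting, answering only single-letter queries about the text "abab" leaves six possible strings, while answering queries up to length two pins the text down exactly. The more strings remain consistent with the answers, the harder it is to reconstruct the original, which is the trade-off between privacy and utility described above.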

Protecting privacy is very important when it comes to marketing. If you go to an expensive restaurant, you don't want a marketing tool to only show you expensive flights next time you go to book because you've been classed as affluent. – Dr Grigorios Loukides

Grigorios finds King's to be an excellent home for his interdisciplinary research that involves academics from across the College. ‘When I put together the application for the project, I needed the domain expertise of somebody who works with marketing data’, Grigorios recollects. ‘Because the Business School is in the same building it was very easy to reach out—King's absolutely encourages collaboration’, he continues.

With a view to marketing, one of the many applications of the new privacy-preserving data structure the team has assembled lies in securing customer data against aggregation. It makes it difficult for companies to extract signals of consumer behaviour for targeting purposes. ‘Perhaps you went to an expensive hotel, and then you had dinner at a very expensive restaurant’, Grigorios gives an example. ‘Then you probably don’t want the marketing tool to classify you as an affluent person so that next time you book an airline ticket you only get to see high prices’.

On their research journey, Grigorios and his team had to solve some fundamental problems in theoretical computer science that had never been addressed. ‘We needed to investigate solutions to sub-problems that were necessary to improve the effectiveness and efficiency of our main methods’, he says. This involved making headway on a new sampling approach to protect strings from inferences about sub-strings, and on novel ways to minimise utility loss in frequent pattern mining. Many of the team’s theoretical advances should translate into further applications in the near future.

Outcome: open-source software and a steady stream of publications

‘It is important that the algorithms and data structures of this novel approach are implemented as open-source software’, Grigorios says. ‘We really want the software prototypes to be publicly available so that people can compare our approach to their own methods or use it as part of these methods’, he adds. Besides software, the team has published extensively in the areas of data mining, databases and theoretical computer science, which attests to the wide significance of the work across many fields. The researchers have also presented their findings at a number of important conferences.

Looking ahead, what is next for Grigorios given the big success of his project? ‘The next step for me is to address some scalability issues’, he says. ‘I’ve also realised that some of our methods may have important applications in the detection of money laundering activity that I’m hoping to explore further’. If previous achievements and Grigorios’s dedication are an indication of future success, the research community working on future-proofing our sequential data can look forward to yet another string of good luck.

Written by Juljan Krause