How does language change and variation affect our ML models?

This talk by Dr Haim Dubossarsky is organised by the Computational Humanities Research group in the Department of Digital Humanities at King's College London.

Applied Machine Learning techniques, especially in the textual domain, have introduced us to the craft of transfer learning. Simply put, we can take an off-the-shelf model, fine-tune it on a curated training set specific to the task at hand, and expect performance improvements. This approach became even more promising with the emergence of multilingual models such as XLM-R or mBERT, which, at least in theory, allow a model fine-tuned on a task in language X to deliver performance gains on the same task in language Y. However, languages behave differently in different contexts, and fine-tuning a model for a specific task may degrade its performance in slightly different linguistic contexts that were not initially considered.
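
For readers unfamiliar with this workflow, the sketch below illustrates the kind of cross-lingual fine-tuning setup described above, using the Hugging Face transformers library with XLM-R as the base model. It is not code from the talk: the toy dataset, label scheme, and training settings are illustrative assumptions only.

```python
# Minimal sketch (illustrative only): fine-tune a multilingual model (XLM-R)
# on a binary text-classification task in language X, then evaluate on
# language Y to probe cross-lingual transfer. The tiny inline datasets are
# placeholders, not real data.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical labelled examples: train in English (language X),
# evaluate the same task in Hindi (language Y).
train_dataset = Dataset.from_dict(
    {"text": ["great talk", "awful noise"], "label": [1, 0]}
)
eval_dataset = Dataset.from_dict(
    {"text": ["बहुत अच्छा", "बहुत बुरा"], "label": [1, 0]}
)

def tokenize(batch):
    # Fixed-length padding keeps the default collator simple for this sketch.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_dataset.map(tokenize, batched=True),
    eval_dataset=eval_dataset.map(tokenize, batched=True),
)
trainer.train()
# Cross-lingual performance gains are assumed, not guaranteed, which is
# precisely the point the talk interrogates.
print(trainer.evaluate())
```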

Furthermore, cross-lingual transfer learning relies heavily on assumptions about underlying linguistic factors shared between languages, many of which have not been thoroughly tested. In this talk, Dr Haim Dubossarsky will focus on two recent works that highlight the limitations of this common approach in modern ML applications. The first demonstrates how linguistic input perturbations, stemming from language change driven by reclaimed language, significantly impede the performance of hate speech detection models. In the second work, he will show how multilingual language models fail to transfer from English to Hindi in a polysemy detection task, despite the promise of multilingual support. He will then propose potential solutions to these challenges.

Dr Haim Dubossarsky is a Lecturer in the School of Electronic Engineering and Computer Science at Queen Mary University of London, an Affiliated Lecturer in the Language Technology Lab at the University of Cambridge, and a recently appointed Turing Fellow. His research focuses on Natural Language Processing (NLP) and Artificial Intelligence (AI), with a particular emphasis on the intersection of linguistics, cognition, and neuroscience.

His work has made significant contributions to the emerging field of computational semantic change and has also investigated the societal impact and biases of modern NLP tools. Haim employs advanced mathematical and computational methods across disciplines, enriching research and pushing the boundaries of knowledge in NLP and related fields. His interdisciplinary approach often uncovers novel research questions that were previously inaccessible through more traditional methods.

Depending on whether you have registered to attend online or in person, you will receive either the link to join the talk remotely or the details of the venue approximately one week before the seminar.
