Expanding PREMOVE: An LLM-Assisted Semantic Annotation of Preverbs Across Historical and Modern Languages
Preverbs, prefixes that attach to verbs and modify their meaning, substantially shape the morpho-semantics of many Indo-European languages. While their diachronic development has traditionally been described through qualitative studies, recent computational approaches offer new avenues for quantitative analysis. This seminar presents an expansion of PREMOVE, a cross-linguistic diachronic dataset dedicated to Latin and Ancient Greek preverbs.
Moving beyond the dataset's original focus on motion verbs in classical languages, this project introduces two significant updates: the inclusion of perception verbs in Latin and Ancient Greek and the addition of motion verbs in Italian. This dual expansion aims to enable a novel comparison of preverb semantics across different verb classes and between historical languages and their modern descendants.
The presentation will focus both on the resource itself and on the methodological framework behind it: a semi-automatic annotation pipeline leveraging Large Language Models (LLMs). Techniques such as few-shot prompting, style control, and chain-of-thought reasoning were employed to generate candidate labels, which were subsequently validated by human experts. The talk will discuss how this hybrid workflow can accelerate the creation of high-quality linguistic resources and facilitate the study of semantic change, in an effort to offer a reproducible model for historical language projects. The resulting data, soon to be integrated into existing knowledge bases, will provide new insights into how preverb semantics are encoded across languages and time.
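As a rough illustration of the few-shot and chain-of-thought steps mentioned above, the following Python sketch assembles a prompt from worked examples and extracts a candidate label from a model's reply. It is not the project's actual pipeline: the label inventory, the `Example` fields, and the function names `build_prompt` and `parse_label` are all assumptions made for this example, and no real LLM is called.

```python
from dataclasses import dataclass

# Hypothetical label inventory; the actual PREMOVE tagset is not given here.
LABELS = ("spatial", "aspectual", "other")


@dataclass
class Example:
    """One hand-annotated demonstration shown to the model (assumed fields)."""
    verb: str
    sentence: str
    reasoning: str
    label: str


def build_prompt(examples, verb, sentence):
    """Assemble a few-shot prompt with a short reasoning step before each label."""
    options = ", ".join(LABELS)
    lines = [
        "Annotate the meaning that the preverb contributes to the verb.",
        f"Give brief reasoning, then a final line 'Label: X' with X in: {options}.",
        "",
    ]
    for ex in examples:
        lines += [
            f"Verb: {ex.verb}",
            f"Sentence: {ex.sentence}",
            f"Reasoning: {ex.reasoning}",
            f"Label: {ex.label}",
            "",
        ]
    # The prompt ends on an open reasoning slot for the model to complete.
    lines += [f"Verb: {verb}", f"Sentence: {sentence}", "Reasoning:"]
    return "\n".join(lines)


def parse_label(response):
    """Return the candidate label, or None if the reply has no valid label."""
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("label:"):
            label = line.split(":", 1)[1].strip().lower()
            return label if label in LABELS else None
    return None


demo = Example(
    verb="exire",
    sentence="Ex urbe exiit.",
    reasoning="The preverb ex- adds motion out of a location.",
    label="spatial",
)
prompt = build_prompt([demo], "inire", "Urbem iniit.")
candidate = parse_label("The preverb in- adds motion into a place.\nLabel: spatial")
```

In a workflow of this kind, a reply for which `parse_label` returns `None` would go straight to a human annotator, and every non-empty candidate would still be checked by an expert before entering the dataset.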
The project is supported by the Humanities and cultural Heritage Italian Open Science Cloud (H2IOSC) and by CLARIN.
This is a hybrid event; the details will be shared upon registration.
Speaker:
Michele Ciletti is a Master’s student in Classical Philology, Languages and History at the University of Foggia, where he also works as a Research Assistant in Digital Humanities and Computational Linguistics. His research interests include the application of NLP techniques to Classical Languages, Open Data, and Digital Humanities Pedagogy.
