
Human-in-the-Loop AI for knowledge extraction from archaeological legacy documentation

Online

05 May

Nevio Dubbini, Ph.D.
Nevio Dubbini. Image provided by the speaker
Part of the Computational Humanities Research Group Seminar Series

 

The ARCHIVE project developed a human-in-the-loop system that integrates text-based and vision-based OCR, Natural Language Processing, and Large Language Models to extract, contextualize, and structure information and metadata from archaeological legacy documentation. ARCHIVE combines fine-tuned transformer-based models with systematic prompt optimization, implemented through Python-based pipelines. Specialized archaeological language is addressed through iterative prompt design and domain constraints. Information extraction is anchored to specialized archaeological taxonomies and controlled vocabularies.
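As a rough illustration of what anchoring extraction to a controlled vocabulary can mean in practice, the sketch below accepts an extracted value only if it matches a term in a predefined list. It is not the ARCHIVE implementation: the vocabulary contents, the field names, and the function `anchor_to_vocabulary` are all hypothetical and chosen only for the example.

```python
from typing import Optional, Tuple

# Hypothetical controlled vocabulary; fields and terms are illustrative only.
CONTROLLED_VOCABULARY = {
    "find_type": {"pottery", "coin", "bone", "glass"},
    "period": {"roman", "medieval", "bronze age"},
}


def normalize(term: str) -> str:
    """Lowercase the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def anchor_to_vocabulary(field: str, raw_value: str) -> Tuple[Optional[str], bool]:
    """Return (canonical_term, accepted).

    A value is accepted only if its normalized form is listed in the
    vocabulary for that field; anything else is returned as (None, False)
    so that it can be routed to a human reviewer.
    """
    candidate = normalize(raw_value)
    if candidate in CONTROLLED_VOCABULARY.get(field, set()):
        return candidate, True
    return None, False
```

In this sketch, a value such as "  Bronze   Age " for the `period` field would be accepted as "bronze age", while an unlisted term would be rejected and left for expert review.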

Central to ARCHIVE is a human-in-the-loop paradigm, in which automated processes highlight candidate segments for information extraction, assign confidence scores to extracted elements, and expose structured reasoning chains. Threshold-based warnings flag low-confidence outputs, ambiguous classifications, and missing extractions, as well as significant divergences across repeated LLM applications.
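The threshold-based warnings described above could be sketched along the following lines. This is a minimal sketch under stated assumptions, not the project's code: the `Extraction` record, the two threshold values, and the flag names are hypothetical, and agreement across repeated runs is measured here simply as the share of runs that return the most common value.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; real values would be tuned with domain experts.
CONFIDENCE_THRESHOLD = 0.7
AGREEMENT_THRESHOLD = 0.8


@dataclass
class Extraction:
    field_name: str
    values: List[str]   # one value per repeated LLM run; "" means nothing was extracted
    confidence: float   # confidence score assigned to the extracted element


def review_flags(extraction: Extraction) -> List[str]:
    """Return the warnings that should send this extraction to a human reviewer."""
    non_empty = [value for value in extraction.values if value]
    if not non_empty:
        return ["missing_extraction"]

    flags = []
    if extraction.confidence < CONFIDENCE_THRESHOLD:
        flags.append("low_confidence")

    # Share of all runs that agree on the most frequent non-empty value.
    most_common_count = Counter(non_empty).most_common(1)[0][1]
    agreement = most_common_count / len(extraction.values)
    if agreement < AGREEMENT_THRESHOLD:
        flags.append("divergent_runs")
    return flags
```

An extraction with an empty flag list would pass through automatically in this sketch, while any flagged item would be queued for expert review together with its reasoning chain.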

Evaluation relied primarily on structured expert review, as conventional NLP metrics proved insufficient for capturing semantic adequacy and disciplinary relevance in historically layered texts. Tested on approximately 5,000 excavation records, the system demonstrated substantial reductions in manual processing time while improving the consistency, transparency, and reusability of extracted information. The project was funded by the Italian Ministry of University and Research (MUR) under the National Recovery and Resilience Plan (PNRR), through the Extended Partnership Future Artificial Intelligence Research (FAIR), funded by the European Union – NextGenerationEU.

The talk will take place online and in person on 5 May 2026 at 3 pm BST. You will receive the link to join the talk remotely approximately one week before the seminar.

Speaker:

Nevio Dubbini, who holds a Ph.D. in Applied Mathematics, is currently a Research Fellow (RTDa) at the University of Pisa and CEO of Miningful. Miningful integrates AI-based predictive systems into companies’ production, commercial, logistics, and distribution processes. Miningful also supports scientific publications and research projects with statistical, machine learning, and AI expertise.

Nevio is an expert in artificial intelligence, working at the intersection of AI and archaeology to advance the analysis, classification, interpretation and visualization of archaeological data. He has contributed to major European projects, developing innovative methods for predictive archaeology, digital documentation, and automated artifact recognition. His long-term goal is to understand the mechanisms that enable the prediction of future events, exploring how predictions emerge from information, pattern recognition, and the temporal structure of data, and how they can generate new knowledge.

