FARRELL, SEAN (2025) Natural Language Processing for Early Detection and Mitigation of Public Health Threats. Doctoral thesis, Durham University.
Full text not available from this repository. Author-imposed embargo until 01 October 2026.
Abstract
Veterinary electronic health records (vEHRs) represent a vast yet underutilised resource with the potential to advance animal welfare, strengthen public health, and drive innovations in healthcare informatics. This thesis presents a framework for utilising vEHRs through Natural Language Processing (NLP) techniques, contributing novel methodologies and insights across five key areas. I introduce PetBERT, a foundation model trained on 500 million tokens from first-opinion vEHRs, which forms the backbone of our syndromic disease surveillance system. This system establishes a new state-of-the-art approach for monitoring disease outbreaks within the UK and is actively deployed as an early warning mechanism for emerging veterinary diseases. I present a hierarchical language model applied to 1.4 million antimicrobial prescriptions, revealing significant species-specific discrepancies in antimicrobial use and adherence to antimicrobial stewardship guidelines, offering a scalable solution for stewardship monitoring. I present a novel text-tabular explainability approach focusing on premature mortality and identifying previously unrecognised risk factors, including the significant influence of socioeconomic status on health outcomes. Recognising the importance of responsible data sharing, I developed PetHarbor, the first data governance framework for vEHRs. Working collaboratively with the international community, this framework standardises protocols for data sharing while maintaining privacy and ethical standards. Finally, I contribute PetEVAL, the first open evaluation benchmark for vEHRs, releasing 17,000 annotated records for anonymisation, disease extraction, and syndromic classification tasks. This resource makes the research in this thesis reproducible, establishes vEHRs as a transformative resource for healthcare informatics, and charts a path towards a standardised evaluation framework for future developments in veterinary NLP.
By embedding open science at its core, this thesis demonstrates that vEHRs are not merely a neglected data source but a powerful engine for advancing animal health, tracking diseases in real time, and informing global health policy.
Item Type: | Thesis (Doctoral) |
---|---|
Award: | Doctor of Philosophy |
Keywords: | Veterinary, Natural Language Processing, Public Health, Electronic Health Records, PetBERT, PetEVAL, PetHarbor |
Faculty and Department: | Faculty of Science > Computer Science, Department of |
Thesis Date: | 2025 |
Copyright: | Copyright of this thesis is held by the author |
Deposited On: | 01 Oct 2025 11:45 |