Reformulation and Decomposition: Multitask learning approaches to Long Document Problems

HUDSON, GEORGE, THOMAS (2024) Reformulation and Decomposition: Multitask learning approaches to Long Document Problems. Doctoral thesis, Durham University.

PDF - Accepted Version
37Mb

Abstract

Recent advances in Natural Language Processing (NLP) have led to success across a wide range of tasks including machine translation, summarization, and classification. Yet the field still faces major challenges. This thesis addresses two key under-researched areas: the absence of general multitask learning capabilities, and the inability to scale to long, complex documents. Firstly, this thesis explores a form of multitasking where NLP tasks are reformulated as question answering problems. I examine existing models and measure their robustness to paraphrasing of their input. I contribute an annotated dataset which enables detailed analysis of model failures as well as the evaluation of methods for improving model robustness. Secondly, I introduce MuLD, a set of long document tasks that forms a benchmark for evaluating the performance of models on large inputs with long-range dependencies. I show that this benchmark is challenging for baseline models. I then design an approach using task decomposition to provide an interpretable solution which easily allows for multitask learning. Next, I explore how these themes of task reformulation for multitask learning and task decomposition for long inputs can be applied to other modalities. I show how visual modelling, a visual analogue of language modelling, can be used to predict missing frames from videos of simple physics simulations, and probe what knowledge about the physical world this induces in such models. Finally, I demonstrate how this task can be used to unite vision and NLP within the same framework, describing how task reformulation and task decomposition can be used for this purpose.

Item Type:	Thesis (Doctoral)
Award:	Doctor of Philosophy
Keywords:	machine learning; multitask learning; task decomposition; natural language processing; long documents; computer vision; task reformulation
Faculty and Department:	Faculty of Science > Computer Science, Department of
Thesis Date:	2024
Copyright:	Copyright of this thesis is held by the author
Deposited On:	16 May 2024 09:36
