

Durham e-Theses
Natural Language Processing with Deep Latent Variable Models: Methods and Applications

YU, JIALIN (2023) Natural Language Processing with Deep Latent Variable Models: Methods and Applications. Doctoral thesis, Durham University.


Abstract

Owing to its unparalleled performance and versatility, deep learning has become the de facto standard for building natural language processing (NLP) applications. Compared with conventional machine learning approaches, deep learning replaces extensive hand-engineered features in every task with end-to-end representation learning. However, concerns have been raised in the research community regarding the robustness, trustworthiness, explainability, and interpretability of these methods. Although these limitations of deep learning are widely acknowledged, work on methods and applications to alleviate these concerns in NLP remains comparatively limited. To address this research gap and explore a more robust approach to building NLP applications with deep learning, this thesis studies deep latent variable models (DLVMs) from both a methods perspective (under supervised and semi-supervised learning settings) and an applications perspective (natural language understanding and generation). We demonstrate the strengths and benefits of DLVMs for NLP applications and discuss their effectiveness in addressing some of these concerns later in this thesis.

For contributions from a methods perspective, we study the benefits of deep latent variable models in supervised and semi-supervised learning settings. These studies suggest that deep latent variable models are competitive in performance with standard deep learning methods, while offering additional robustness, trustworthiness, explainability, and interpretability in various applications. For semi-supervised learning in particular, we achieve state-of-the-art performance and demonstrate the great potential of deep latent variable models for semi-supervised learning problems.

For contributions from an applications perspective, we first present two applications for language understanding problems, followed by two applications for language generation problems. Our first application concerns a binary text classification task in the educational domain and is the first research on how Bayesian deep learning can be applied to this text-based educational application. Our second application focuses on multilabel text classification tasks, for which we present an efficient uncertainty quantification framework. We demonstrate the effectiveness and generalisation of this framework across diverse architectures, presenting the first research on using deep latent variable models for efficient uncertainty quantification in multilabel text classification tasks. Our third application deals with generating multiple explanations for an explainable artificial intelligence (XAI) task; we present the first study on how deep latent variable models can be used to generate multiple explanations in the Stanford natural language inference task. In our final application, we explore paraphrase generation and present the first study of DLVMs in a semi-supervised learning setting for paraphrase generation; the DLVMs enhance paraphrase generation performance when incorporating unlabelled data in a semi-supervised manner.


The findings in this thesis are of practical value to deep learning practitioners, researchers, and engineers working on a variety of problems in the field of natural language processing and deep learning.

Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Keywords: Deep Learning; Natural Language Processing; Deep Latent Variable Model
Faculty and Department: Faculty of Science > Department of Computer Science
Thesis Date: 2023
Copyright: Copyright of this thesis is held by the author
Deposited On: 13 Mar 2023 11:59
