Durham e-Theses
Joint Cohort Detection & Predictive Modelling alongside Safe Model Updating

EMERSON, SAMUEL (2024) Joint Cohort Detection & Predictive Modelling alongside Safe Model Updating. Doctoral thesis, Durham University.

PDF - Accepted Version
19Mb

Abstract

A common objective provided by stakeholders, given a supervised dataset, is to construct a predictive model of the response given the covariates. If a clustering structure is suspected (such that different clusters interact with the response in different ways) then an additional objective may be given to detect these clusters, or cohorts, such that interventions based on the predictive model can be adapted for each group.

The solution to this problem requires a balanced handling of both objectives through a joint cohort detection and predictive modelling method. Previous solutions often favour one objective over the other. Cohort detection takes precedence in unsupervised clustering methods such as K-means (which are followed by cluster-specific models for prediction), whereas accurate prediction takes precedence in supervised clustering methods such as mixture models (which use clustering solely as a tool for more accurate modelling).

This thesis aims to provide a method that focuses on cohort detection by providing a non-probabilistic partitioning of the data whilst simultaneously focusing on accurate predictive modelling by allowing the Bayesian evidence of the model to dictate the partition. A graphical representation of the data is constructed to ensure the partitioning both respects the structure in covariate space and reduces the number of possible partitions (and hence models) one would have to consider. The latter point is particularly important as the Bayesian evidence is determined through Sequential Monte Carlo, a computationally expensive but necessary process used to ensure the estimated measure that selects the partition is accurate. This method has an associated R package (UNCOVER) for implementation.
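To illustrate how a graph over the covariates can shrink the space of partitions, the following is a minimal sketch rather than the UNCOVER implementation. It assumes the graph is a Euclidean minimum spanning tree (suggested by the keywords) and, purely for illustration, removes the longest edges to form cohorts; in the thesis the partition is chosen by Bayesian evidence, which is not computed here. All function names and the toy data are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, i, j) edges, where i is already in the
    tree and j is the vertex being attached.
    """
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


def components(n, edges):
    """Connected components of a graph on n vertices, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def partition_by_edge_removal(points, k):
    """Split into k cohorts by deleting the k-1 longest MST edges.

    Assumption: edge length is a stand-in criterion for this sketch only.
    """
    mst = sorted(minimum_spanning_tree(points))
    kept = mst[: len(mst) - (k - 1)]
    return components(len(points), kept)


# Hypothetical toy covariates: two well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (4.9, 5.1)]
cohorts = partition_by_edge_removal(points, k=2)

# Removing k-1 of the n-1 tree edges gives C(n-1, k-1) candidate partitions.
n_candidates = math.comb(len(points) - 1, 2 - 1)
```

For these six points, an unrestricted search over two-cohort partitions would have 31 candidates, whereas removing one edge from the spanning tree leaves only 5, each of which keeps nearby points together.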

Finally, a separate contribution is discussed in this thesis on the topic of safe model updating. Specifically, this refers to the use of hold-out sets when updating a model to avoid interventions negatively impacting model quality. Contributions to this field are: a method of locating the minimum hold-out set size through Gaussian process emulation of a total cost function and a discussion on the impacts of clustering in this setting.
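The trade-off behind a hold-out set size can be sketched as follows. This is an illustration under stated assumptions, not the thesis's method: the form of the total cost function, its parameters, and the candidate grid are all hypothetical, and the cost is evaluated directly on a grid rather than emulated with a Gaussian process (emulation matters when each cost evaluation is expensive, which is not the case in this toy).

```python
def total_cost(n, N=10_000, holdout_cost=1.0, a=5.0, b=0.5, floor=0.2):
    """Hypothetical total cost for a hold-out set of size n out of N samples.

    Assumptions for this sketch:
    - each held-out sample incurs a fixed cost (no model-guided intervention);
    - each remaining sample incurs a cost that decays with n, because the
      updated model is trained on the n held-out samples.
    """
    per_sample = a * n ** (-b) + floor
    return holdout_cost * n + (N - n) * per_sample


# Direct grid evaluation, standing in for an emulator of the cost function.
candidates = range(50, 5001, 50)
best_n = min(candidates, key=total_cost)
```

A hold-out set that is too small gives a poorly trained model for everyone else, and one that is too large withholds the model's benefit from too many samples, so under these assumptions the minimiser lies strictly inside the grid.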

Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Keywords: Clustering; Predictive modelling; Bayesian; Minimum Spanning Tree; Sequential Monte Carlo; Safe model updating; Holdout set; Gaussian Process; Emulation
Faculty and Department: Faculty of Science > Mathematical Sciences, Department of
Thesis Date: 2024
Copyright: Copyright of this thesis is held by the author
Deposited On: 28 Feb 2024 11:20
