Durham e-Theses
Models for multivariate data with latent structures with application in regression and clustering

ZHANG, YINGJUAN (2024) Models for multivariate data with latent structures with application in regression and clustering. Doctoral thesis, Durham University.

PDF - Accepted Version (7Mb)

Abstract

A novel approach is proposed for analyzing clustered and highly correlated multivariate data, in which a one-dimensional latent structure, parametrized by a single random effect, is used to approximate the data. The estimation methodology uses a nonparametric maximum likelihood-type approach in which the random effect distribution is approximated by a discrete mixture, allowing the ECM algorithm to be used to estimate all model parameters. We derive the estimators required for the ECM algorithm under various error variance parameterizations that may depend on the random effect. We extend the proposed model by including covariates, enabling regression of multivariate responses on these covariates. This offers another perspective for analyzing multivariate data, where typically only one variable is taken as the response and the remaining variables constitute a multivariate space of predictors. Accounting for the multivariate character of the response has several inferential benefits, including potentially reduced standard errors and increased power, especially in situations where the main concern is the effect of a set of predictors on several correlated response variables. We further extend this methodology to a two-level version to accommodate repeated measurements. Simulation studies are conducted to assess the accuracy of the parameter estimators, the significance of choosing the correct mixture components, and the use of AIC and BIC as model selection criteria. The impact of the random effect distribution is also examined. Finally, several important inferential problems, including clustering using different techniques, projection, ranking, regression on covariates, and regression of an external response on the predicted latent variable, are considered and illustrated with real data examples.

Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Keywords: Random effect modelling; Multivariate data analysis; Clustering
Faculty and Department: Faculty of Science > Mathematical Sciences, Department of
Thesis Date: 2024
Copyright: Copyright of this thesis is held by the author
Deposited On: 26 Nov 2024 09:07
