CUI, TINGJIAO (2024) Causal Bayesian machine learning to assess the heterogeneous effect of smoking or drinking with mortality: A longitudinal analysis of Chinese Longitudinal Healthy Longevity Study. Masters thesis, Durham University.
Abstract
This dissertation introduces an innovative methodology employing Bayesian machine learning to identify heterogeneous effects. This method provides a quantitative perspective on potential effect-modifying factors influencing the heterogeneity in the associations between the independent variable and the outcome.
The study cohort consisted of 43,487 individuals from the Chinese Longitudinal Healthy Longevity Survey (CLHLS), a longitudinal cohort study of elderly Chinese individuals. Numerous studies have shown that the association between smoking or drinking and mortality varies significantly with mediators such as physical activity level and diet. However, few studies have systematically assessed the heterogeneous effects of this association. To address this gap, this research investigates the association between smoking or drinking and mortality across several subgroups identified by the Bayesian machine learning method. The results reveal significant variations in the association based on body weight and physical activity levels. Specifically, among individuals weighing 57 kilograms or more, a heightened risk of mortality is observed with low levels of physical activity. In contrast, among individuals weighing less than 57 kilograms, only a high level of physical activity is linked to an increased mortality risk.
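To make the idea of a heterogeneous association concrete, the following is a minimal sketch of a plain stratified comparison: the difference in mortality risk between exposed and unexposed individuals, computed separately within each weight and physical-activity subgroup. It is not BART and not the thesis's analysis. The function name, the field names, and every number in the toy records are invented for illustration and are not CLHLS data; only the 57 kg cutoff comes from the abstract.

```python
from collections import defaultdict


def subgroup_risk_differences(records, weight_cutoff_kg=57.0):
    """Return {(weight_group, activity): risk difference}, exposed minus unexposed.

    Hypothetical helper: each record is a dict with the keys
    weight_kg, activity, exposed, died.
    """
    # counts[subgroup][exposed] = [deaths, total]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for r in records:
        group = "heavier" if r["weight_kg"] >= weight_cutoff_kg else "lighter"
        cell = counts[(group, r["activity"])][r["exposed"]]
        cell[0] += 1 if r["died"] else 0
        cell[1] += 1

    result = {}
    for key, by_exposure in counts.items():
        (d1, n1), (d0, n0) = by_exposure[True], by_exposure[False]
        if n1 == 0 or n0 == 0:
            continue  # no exposed/unexposed comparison possible in this subgroup
        result[key] = d1 / n1 - d0 / n0
    return result


def make(weight_kg, activity, exposed, deaths, total):
    """Build `total` toy records, the first `deaths` of which died."""
    return [
        {"weight_kg": weight_kg, "activity": activity,
         "exposed": exposed, "died": i < deaths}
        for i in range(total)
    ]


# Made-up toy records, NOT study data.
toy = (
    make(62.0, "low", True, 3, 4) + make(62.0, "low", False, 1, 4)
    + make(62.0, "high", True, 1, 4) + make(62.0, "high", False, 1, 4)
    + make(50.0, "low", True, 1, 4) + make(50.0, "low", False, 1, 4)
    + make(50.0, "high", True, 2, 4) + make(50.0, "high", False, 1, 4)
)

diffs = subgroup_risk_differences(toy)
for key in sorted(diffs):
    print(key, diffs[key])
```

A risk difference that changes from one subgroup to another is what is meant by a heterogeneous association. A stratified table like this requires the subgroups to be fixed in advance, whereas the approach described in this abstract identifies them from the data.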
The methodology employed in this study involves a two-step approach. First, the missForest algorithm was used for data imputation to handle missing values, ensuring a robust and accurate dataset. MissForest’s non-parametric nature and effectiveness in managing complex interactions and non-linear relationships make it an ideal choice for this diverse dataset. Second, Bayesian Additive Regression Trees (BART) were applied to analyze the imputed data. BART is particularly adept at capturing non-linear relationships and interactions among predictors, enhancing the statistical power to detect true heterogeneous effects.
By handling the imputation separately, we ensured that the BART model could focus on identifying and modeling the intricate interactions between smoking, drinking, and mortality without the additional complexity of simultaneously imputing missing values. In summary, using missForest for imputation, followed by BART for modeling, provided a robust and effective methodology for our study. This combination leveraged the strengths of both techniques, ensuring accurate and reliable imputation of missing data and powerful, flexible modeling of the relationships between variables.
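The separation of imputation from modelling can be sketched as follows. This is an illustration of the iterative loop that missForest-style imputation follows (start from column means, then repeatedly re-predict each missing cell from the other columns until the values stop changing), under one loud assumption: a k-nearest-neighbour average stands in for the random forest so that the example stays short and dependency-free. The function name `impute_iteratively`, its parameters, and the toy table are hypothetical and do not come from the thesis or from the missForest package.

```python
import math


def impute_iteratively(rows, k=2, max_iter=5):
    """missForest-style loop; a k-NN average is a stand-in for the random forest.

    Assumes every column has at least one observed (non-None) value.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[v is None for v in row] for row in rows]

    # Initial guess: fill each gap with the mean of the observed column values.
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    data = [[means[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]

    for _ in range(max_iter):
        previous = [row[:] for row in data]
        for j in range(n_cols):
            # Rows where column j was actually observed serve as training rows.
            train = [t for t in range(n_rows) if not missing[t][j]]
            for i in range(n_rows):
                if not missing[i][j]:
                    continue

                def dist(t):
                    # Euclidean distance on all columns except the target one.
                    return math.sqrt(sum((data[i][c] - data[t][c]) ** 2
                                         for c in range(n_cols) if c != j))

                nearest = sorted(train, key=dist)[:k]
                data[i][j] = sum(data[t][j] for t in nearest) / len(nearest)

        # Stop once a full pass no longer changes the imputed values.
        change = sum((data[i][j] - previous[i][j]) ** 2
                     for i in range(n_rows) for j in range(n_cols))
        if change < 1e-12:
            break
    return data


# Toy table with one missing cell (None); values are invented.
raw = [[1.0, 2.0], [2.0, 4.0], [3.0, None], [4.0, 8.0]]
completed = impute_iteratively(raw)
print(completed)
```

Only `completed`, which contains no gaps, would then be handed to the modelling stage, so the outcome model never has to handle missing values itself. Observed cells are left untouched and only cells that were originally missing are rewritten.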
This research demonstrates that the Bayesian machine learning method can effectively identify heterogeneous effects between the independent variable and the outcome. The integration of advanced statistical methods highlights the potential for precision medicine approaches in epidemiological research. Furthermore, the findings highlight the multifaceted nature of the relationships among body weight, physical activity, smoking or drinking, and the risk of mortality. This underscores the importance of considering lifestyle factors, such as smoking or drinking, along with body weight and physical activity, when examining mortality risk. These insights are valuable for precision medical interventions.
The methodology, specifically Bayesian Additive Regression Trees (BART), demonstrates transparency, reproducibility, and robustness. This research contributes to the biomedical field by providing valuable methodological insights and advancing our understanding of potential effect-modifying factors in complex associations. Further research is warranted to explore the underlying mechanisms and potential confounding factors that may influence these associations.
Item Type: | Thesis (Masters) |
---|---|
Award: | Master of Science |
Keywords: | Causal Bayesian machine learning, missForest algorithm, Bayesian Additive Regression Trees (BART), Heterogeneous exposure associations (HEAs), Chinese Longitudinal Healthy Longevity Survey (CLHLS). |
Faculty and Department: | Faculty of Science > Mathematical Sciences, Department of |
Thesis Date: | 2024 |
Copyright: | Copyright of this thesis is held by the author |
Deposited On: | 20 Aug 2024 16:18 |