Durham e-Theses
Clinical Video Analysis with Geometric Feature Enhanced Deep Learning

ZHANG, XIATIAN (2025) Clinical Video Analysis with Geometric Feature Enhanced Deep Learning. Doctoral thesis, Durham University.

PDF - Accepted Version
Available under License Creative Commons Attribution 3.0 (CC BY).

51 MB

Abstract

Clinical videos are essential in medical intervention, diagnosis, and training, yet their analysis presents substantial challenges due to the complexity and variability inherent in clinical environments. Traditional methods, reliant on manual annotation and human expertise, are limited in scalability and efficiency, particularly in resource-constrained settings. While deep learning offers promising avenues for automation, conventional RGB-based approaches struggle with issues such as occlusions, poor visibility during surgeries, and complex clinical backgrounds. To address these issues, the use of geometric features, such as bounding boxes, depth maps, and human skeleton data, provides a promising solution. These features enable efficient and robust structured understanding in clinical video analysis. This thesis explores how geometric feature enhanced deep learning can address these challenges, focusing on three critical objectives: long-term video anticipation, video quality improvement, and fine-grained semantic understanding.

For long-term video anticipation, a novel adaptive graph learning framework leveraging geometric features as the primary input is proposed for surgical workflow anticipation. This framework introduces a novel geometric representation including bounding boxes of surgical instruments and anatomical targets. Its adaptive graph dynamically selects and updates graph structures to capture the evolving relationships in surgical videos. Validated on two benchmark datasets, this approach demonstrates robust performance across diverse surgical scenarios, offering meaningful predictive insights for surgical teams and semi-autonomous robotic systems.

For video quality improvement, a depth-aware endoscopic video inpainting framework that fuses geometric features and visual features is introduced to address challenges in extreme clinical environments. The framework integrates a Spatial-Temporal Guided Depth Estimation module for direct depth prediction, a Bi-Modal Paired Channel Fusion module for effective visual-depth feature integration, and a Depth-Enhanced Discriminator for assessing the fidelity of reconstructed RGB-D sequences. Unlike traditional 2D-only approaches, this method incorporates depth information, significantly enhancing the realism and spatial accuracy of inpainted content in endoscopic videos.

For fine-grained semantic understanding, multi-view geometric features are integrated into clinical skill assessment frameworks for procedures such as Traditional Chinese Medicine (TCM) physical therapy and Cardiopulmonary Resuscitation (CPR). Two novel publicly accessible multi-view video datasets are introduced for TCM physical therapy and CPR, alongside the Cross-view Multimodality Enhanced Action Quality Assessment framework. This framework combines geometric and visual features for clinical skill assessment, supporting single-view input during inference while retaining multi-view awareness from training. It significantly improves performance in complex tasks such as Needle Depth and Quick Needle Movements. Furthermore, in experiments with the CPR dataset, the proposed framework delivered performance comparable to that of human experts.

This thesis integrates geometric features into deep learning frameworks in three ways: as the primary input, through feature fusion, and through multi-view approaches. Each yields significant improvements on a distinct challenge in clinical video analysis. The results hold promise for further applications in automated clinical video analysis, including medical intervention, diagnostics, and training. Most of this work has been recognized in peer-reviewed conferences and journals, underscoring its impact and relevance within the field.

Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Keywords: Deep Learning; Computer Vision; Geometric Feature; Clinical Video Analysis; Surgical Workflow Anticipation; Video Inpainting; Action Quality Assessment
Faculty and Department: Faculty of Science > Computer Science, Department of
Thesis Date: 2025
Copyright: Copyright of this thesis is held by the author
Deposited On: 02 May 2025 15:19
