LIU, JIAXU (2026) Understanding 3D Point Cloud via Unsupervised Learning, Diffusion-based and Frequency-guided Generative Deep Learning Architectures. Doctoral thesis, Durham University.
Abstract
With the rapid development of 3D sensing technologies, point clouds have emerged as a fundamental representation for numerous 3D understanding tasks, including classification, segmentation, and generative modeling. Despite their potential, point clouds remain challenging to process due to their unstructured nature and the high computational cost involved in extracting meaningful features. This thesis explores novel methods for both understanding and generating 3D point cloud data, with a particular focus on unsupervised learning and diffusion-based generative modeling. To tackle the challenge of semantic segmentation without manual labels, we introduce a novel unsupervised segmentation framework that combines deep clustering with traditional k-means and superpoint-based methods. This approach enables the model to discover meaningful semantic structures directly from raw point cloud data, eliminating the need for costly human annotations. The framework effectively groups points into semantically coherent regions, demonstrating strong performance across diverse datasets. On the generative modeling front, we propose a new class of diffusion models tailored specifically for point clouds. The first method introduces a one-step, time-variant, frequency-aware diffusion approach. By leveraging the Laplacian operator, we extract frequency-domain features from point clouds and enhance high-frequency components throughout the diffusion process. This frequency-aware strategy, when combined with a powerful latent representation learned using Mamba, enables the synthesis of high-fidelity, semantically rich point cloud samples. Building upon this, we also develop a two-stage generative framework that integrates a variational autoencoder (VAE) with a latent diffusion model, inspired by Stable Diffusion.
This method features a frequency-aware module that enriches the VAE’s latent space with detailed spectral information, which is then further refined during the latent diffusion stage. A specialized architecture for the latent space ensures that fine-grained geometric details are preserved and that the overall generative process remains robust and expressive. Extensive experimental evaluations demonstrate the effectiveness and versatility of our proposed approaches. Both the segmentation and generation techniques achieve state-of-the-art results on widely used benchmark datasets. These contributions significantly advance the field of 3D point cloud understanding and synthesis, offering scalable and annotation-efficient solutions with practical applications in areas such as computer vision, robotics, and graphics.
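As a rough illustration of the frequency-aware idea mentioned in the abstract, the sketch below applies a graph Laplacian to a point cloud and uses the result to boost high-frequency detail. It is not the thesis's implementation: the k-nearest-neighbour graph, the random-walk normalisation, and the names `knn_graph_laplacian`, `high_frequency`, `enhance`, `k`, and `alpha` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_graph_laplacian(points, k=8):
    """Random-walk Laplacian L = I - D^-1 W of a symmetrised kNN graph.

    Illustrative assumption: the abstract does not say which Laplacian is used.
    """
    n = points.shape[0]
    # Pairwise squared Euclidean distances (dense; fine for small clouds).
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbours
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # make the graph undirected
    deg = W.sum(axis=1)  # every point has at least k neighbours
    return np.eye(n) - W / deg[:, None]


def high_frequency(points, k=8):
    """Per-point offset from the mean of its graph neighbours."""
    return knn_graph_laplacian(points, k) @ points


def enhance(points, k=8, alpha=0.5):
    """Amplify high-frequency detail by a hypothetical gain alpha."""
    return points + alpha * high_frequency(points, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(256, 3))  # synthetic stand-in for a real scan
    sharpened = enhance(cloud, k=8, alpha=0.5)
    print(sharpened.shape)
```

In this sketch, each row of `L @ points` is a point minus the average of its neighbours, so flat regions give values near zero while edges and corners give large ones. A time-variant scheme like the one the abstract describes would presumably vary the gain over diffusion steps; here `alpha` is a fixed constant for simplicity.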
| Item Type: | Thesis (Doctoral) |
|---|---|
| Award: | Doctor of Philosophy |
| Faculty and Department: | Faculty of Science > Computer Science, Department of |
| Thesis Date: | 2026 |
| Copyright: | Copyright of this thesis is held by the author |
| Deposited On: | 19 Jan 2026 09:25 |



