
Durham e-Theses

MLM Diffusion: Generating Globally-Consistent High-Resolution Images from Discrete Latent Spaces

HESSEY, PETER (2023) MLM Diffusion: Generating Globally-Consistent High-Resolution Images from Discrete Latent Spaces. Masters thesis, Durham University.



Context/Background: Creating deep generative models capable of generating high-resolution images
is a critical challenge for modern deep learning research, with far-reaching impacts in domains such
as medical imaging and computer graphics. One method that has recently achieved great success in
tackling this problem is probabilistic denoising diffusion. However, whilst diffusion models can generate
high-quality image content, their high computational requirements remain a key limitation.
Aims: This thesis investigates new techniques to overcome the computational cost requirements that
currently limit generative diffusion models. Specifically, this thesis focuses on training deep learning
models to model and sample from discrete latent spaces that can be used to generate high-resolution images.

Method: This thesis introduces a novel diffusion probabilistic model prior capable of generating discrete latent representations of high-resolution images by utilising bidirectional transformers. The
quality and diversity of images generated by these models are then evaluated and compared quantitatively and qualitatively with similar models, before further properties of the approach are explored.
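The sampling procedure described above can be illustrated with a minimal sketch of absorbing-state ("mask-and-predict") generation from a discrete latent space. Everything here is an assumption for illustration only — the codebook size, mask token id, the linear unmasking schedule, and the dummy_denoiser stand-in for the thesis's bidirectional transformer are all hypothetical, not the thesis's actual implementation:

```python
import numpy as np

VOCAB = 512            # illustrative codebook size of the discrete latent space
MASK = VOCAB           # absorbing "mask" token id, kept outside the codebook
SEQ_LEN = 256          # latent tokens per image, e.g. a 16x16 grid

def dummy_denoiser(tokens, rng):
    """Stand-in for a bidirectional transformer prior: returns
    per-position logits over the codebook. A real model would attend
    to all currently-unmasked tokens at once (no causal mask)."""
    return rng.standard_normal((len(tokens), VOCAB))

def sample_absorbing_diffusion(steps=8, seed=0):
    """Start fully masked, then iteratively commit the most confident
    token predictions until no mask tokens remain."""
    rng = np.random.default_rng(seed)
    tokens = np.full(SEQ_LEN, MASK, dtype=np.int64)
    for t in range(steps):
        logits = dummy_denoiser(tokens, rng)
        # softmax over the codebook at every position
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        # sample a candidate token for every position
        candidates = np.array([rng.choice(VOCAB, p=p) for p in probs])
        confidence = probs[np.arange(SEQ_LEN), candidates]
        still_masked = tokens == MASK
        # never re-select already-committed positions
        confidence = np.where(still_masked, confidence, -np.inf)
        # linear schedule: spread the remaining masks over remaining steps
        k = int(np.ceil(still_masked.sum() / (steps - t)))
        commit = np.argsort(confidence)[-k:]
        tokens[commit] = candidates[commit]
    return tokens
```

In a full pipeline the returned token grid would be decoded to pixels by a pre-trained VQ-style decoder; sampling cost then scales with the number of refinement steps rather than with image resolution, which is one plausible source of the speed advantage the abstract reports.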

Results: The proposed approach achieves state-of-the-art results in terms of Density (LSUN Bedroom:
1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73;
FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ:
6.11) whilst also offering significant advantages in terms of computation time.

Conclusions: Through the use of powerful bidirectional transformers and discretised latent spaces, it
is possible to train a discrete diffusion model to generate high-quality, high-resolution images in only
a fraction of the time required by continuous diffusion probabilistic models trained on the data space.
Not only are these models faster to train and sample from, they also require only a single NVIDIA
2080 Ti GPU with 11 GB of RAM for successful training, and achieve state-of-the-art results in terms of
generated image quality and diversity.

Item Type: Thesis (Masters)
Award: Master of Science
Keywords: Deep Learning, Generative Models, Denoising Diffusion, Discrete Latent Spaces
Faculty and Department: Faculty of Science > Department of Computer Science
Thesis Date: 2023
Copyright: Copyright of this thesis is held by the author
Deposited On: 25 May 2023 09:57
