SLACK, DEAN LEWIS (2025) On Hierarchical Encoding and Reasoning in Deep Transformer-based Generative Models. Doctoral thesis, Durham University.
PDF (Accepted Version, 7 MB)
Abstract
Recent advances in generative Transformer-based foundation models have driven remarkable progress in artificial intelligence, yet their internal mechanisms for representing complex hierarchical structures remain largely unknown, posing significant challenges for interpretability, safety, and robust generalisation. This thesis aims to make progress on these issues by systematically investigating how such models internalise hierarchical structures, how this learning relates to behaviours such as generalisation versus memorisation, and how hierarchical principles can inform the development of safer and more accurate generative models. To this end, we first introduce novel probing techniques to map the layer-wise emergence of linguistic hierarchies in language models. We then extend this analysis to the visual domain by developing PSViT, a pixel-space Transformer with hierarchical decompositions of video image patches, which we show learns and generalises hierarchical physical dynamics from raw video data. We investigate memorisation during fine-tuning, establishing an n-gram-based early warning signal for verbatim leakage and proposing scalable defences that promote structural generalisation over verbatim memorisation. Building on these insights, we further demonstrate that a unified next-frame prediction framework enables a single model to process text, images, audio, and video without modality-specific encoders, thereby learning shared hierarchical patterns across these diverse inputs. Collectively, our findings underscore that the capacity to learn and represent hierarchical structure is a fundamental characteristic of Transformer models, and that focused analysis of these underpinnings is crucial for advancing more capable, interpretable, and safer artificial intelligence.
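The abstract does not specify how the n-gram early warning signal is computed. As a purely illustrative sketch, assuming the signal is something like the fraction of a model's generated n-grams that also occur verbatim in the fine-tuning data, one could compute it as below. The function names `ngrams`, `ngram_overlap` and `leakage_warning`, the default `n = 4`, and the `0.5` threshold are all hypothetical choices made for this example, not the thesis's method.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct contiguous n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(generated: List[str],
                  training_docs: Iterable[List[str]],
                  n: int = 4) -> float:
    """Fraction of the generated text's distinct n-grams that also appear
    verbatim in any training document (hypothetical leakage score)."""
    gen = ngrams(generated, n)
    if not gen:
        return 0.0  # text shorter than n tokens has no n-grams to compare
    train: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        train |= ngrams(doc, n)
    return len(gen & train) / len(gen)


def leakage_warning(score: float, threshold: float = 0.5) -> bool:
    """Flag possible verbatim leakage when the overlap reaches an
    assumed, hand-picked threshold."""
    return score >= threshold


if __name__ == "__main__":
    train = ["the patient was admitted on monday with chest pain".split()]
    gen = "the patient was admitted on monday with a cough".split()
    score = ngram_overlap(gen, train, n=4)
    # 4 of the 6 distinct 4-grams in `gen` recur in the training text: 4/6.
    print(round(score, 3), leakage_warning(score))
```

In an "early warning" setting, a score like this would presumably be tracked across fine-tuning checkpoints, so that a rising overlap can be noticed before outputs become fully verbatim copies; that monitoring loop is likewise an assumption, not something the abstract describes.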
| Field | Value |
|---|---|
| Item Type: | Thesis (Doctoral) |
| Award: | Doctor of Philosophy |
| Keywords: | Deep Learning, Machine Learning, Spatiotemporal Modelling, Hierarchical Reasoning, Natural Language Processing |
| Faculty and Department: | Faculty of Science > Department of Computer Science |
| Thesis Date: | 2025 |
| Copyright: | Copyright of this thesis is held by the author |
| Deposited On: | 04 Nov 2025 11:43 |



