LI, RUOCHEN (2026) Spatial-Temporal Graph Representation Learning for Multi-Agent Trajectory Prediction. Doctoral thesis, Durham University.
Abstract
Trajectory prediction is the task of forecasting the future movement trajectories of traffic agents from their historically observed behaviours. It is essential for real-world applications such as path planning and collision avoidance in autonomous driving systems, and anomaly detection in video surveillance.
However, trajectory prediction in multi-agent scenarios is challenging because of the complex interaction dynamics found across diverse traffic environments. These environments range from homogeneous settings dominated by similar agents (e.g., pedestrians in crowds) to heterogeneous scenes with mixed agent types (e.g., pedestrians, vehicles, and cyclists). Tackling these challenges requires an integrated understanding of agent behaviours across contexts: agents continuously adjust their movements in response to surrounding entities, creating interaction patterns that differ between homogeneous pedestrian crowds and heterogeneous traffic. Capturing these nuanced spatial–temporal inter-dependencies therefore requires models that represent both individual and collective dynamics while accommodating distinct agent behaviours. The primary aim of this research is to develop robust and accurate trajectory prediction frameworks that operate effectively in both homogeneous and heterogeneous contexts. To achieve this aim, this dissertation pursues three core objectives: (1) analysing the dynamics and spatial–temporal interactions in homogeneous pedestrian crowds; (2) understanding interaction patterns in heterogeneous traffic environments with diverse agent types; and (3) developing a unified framework that integrates insights from both settings for more accurate and robust trajectory prediction.
The motivation for this research stems from the limitations of existing trajectory prediction methods across different settings. In homogeneous pedestrian scenarios, highly interactive and collective behaviours make high-order spatial–temporal dependencies difficult to model. In heterogeneous environments, diverse agent types such as pedestrians, cyclists, and vehicles exhibit asymmetric dynamics that current approaches struggle to capture. Moreover, most methods treat these contexts in isolation and consequently lack robustness and generalization in real-world environments. Addressing these gaps calls for unified graph-based frameworks that integrate insights from both domains while advancing spatial–temporal modelling to represent complex interactions and long-range dependencies more effectively.

This research introduces a series of novel frameworks designed to enhance the robustness and accuracy of trajectory prediction under different settings. We begin with homogeneous pedestrian trajectory prediction, where the highly interactive nature of pedestrians and their collective behaviours demand precise modelling of spatial–temporal relationships. To this end, we propose UniEdge, a dual-graph-inspired unified spatial–temporal edge-enhanced graph network that captures both high-order cross-time interactions and complex influence patterns between pedestrians, yielding more accurate and socially aware predictions in homogeneous settings. We then extend our investigation to heterogeneous environments featuring multiple interacting agent types. For this purpose, we propose Multiclass-SGCN, a sparse graph-based trajectory prediction network with agent class embedding that models the distinct dynamics of heterogeneous agents such as pedestrians, vehicles, and cyclists. By integrating semantic agent-class information with motion features, Multiclass-SGCN explicitly represents cross-type interaction dynamics while remaining computationally efficient. Building on the insights gained from both homogeneous and heterogeneous contexts, and recognizing the need for a more broadly applicable solution, we propose a behavioural pseudo-label informed sparse graph convolution network (BP-SGCN) for trajectory prediction in both settings. BP-SGCN introduces behavioural pseudo-labels, which represent the different movement patterns of traffic agents without requiring additional annotations. Through a cascaded training scheme that optimizes clustering and trajectory prediction in tandem, BP-SGCN captures both inter-class and intra-class behavioural variations, offering a robust, unified framework for trajectory prediction across diverse environments.
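The abstract names two ingredients at a high level: agent-class information fused with motion features (Multiclass-SGCN) and behavioural pseudo-labels obtained by clustering motion without extra annotation (BP-SGCN). The following is a minimal, hypothetical sketch of that idea only. It assumes hand-crafted motion descriptors (mean speed and mean absolute heading change), a three-class vocabulary, and plain k-means; the learned embeddings, sparse graph convolution, and cascaded training of the thesis are not reproduced, and every function name here is illustrative.

```python
import math
import random

# Hypothetical class vocabulary for heterogeneous scenes (an assumption).
AGENT_CLASSES = ["pedestrian", "cyclist", "vehicle"]


def motion_descriptor(track):
    """Summarise an observed track [(x, y), ...] (at least 2 points) as
    (mean speed per step, mean absolute heading change in radians)."""
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]
    speeds = [math.hypot(dx, dy) for dx, dy in steps]
    headings = [math.atan2(dy, dx) for dx, dy in steps]
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        d = abs(h1 - h0) % (2 * math.pi)
        turns.append(min(d, 2 * math.pi - d))  # wrap the turn angle to [0, pi]
    mean_turn = sum(turns) / len(turns) if turns else 0.0
    return (sum(speeds) / len(speeds), mean_turn)


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster id per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def node_features(tracks, classes, k=2):
    """Per-agent feature = motion descriptor + class one-hot + pseudo-label one-hot."""
    descriptors = [motion_descriptor(t) for t in tracks]
    pseudo = kmeans(descriptors, k)  # behavioural pseudo-labels, no annotation used
    feats = []
    for d, cls, p in zip(descriptors, classes, pseudo):
        class_vec = one_hot(AGENT_CLASSES.index(cls), len(AGENT_CLASSES))
        feats.append(list(d) + class_vec + one_hot(p, k))
    return feats, pseudo


if __name__ == "__main__":
    tracks = [
        [(0, 0), (0.5, 0), (1.0, 0), (1.5, 0)],  # slow walker
        [(0, 1), (0, 1.4), (0, 1.8), (0, 2.2)],  # slow walker
        [(0, 0), (3, 0), (6, 0), (9, 0)],        # fast mover
        [(0, 5), (2.8, 5), (5.6, 5), (8.4, 5)],  # fast mover
    ]
    feats, pseudo = node_features(tracks, ["pedestrian", "pedestrian", "vehicle", "cyclist"])
    print(pseudo)
```

Placing the pseudo-label one-hot beside the class one-hot lets two agents of the same class carry different behavioural codes, which is the intra-class variation the abstract refers to. The abstract describes a cascaded scheme that optimizes clustering and prediction in tandem, whereas this sketch clusters once, up front.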
Extensive experimental evaluations and qualitative analyses on multiple benchmark datasets show that the proposed frameworks consistently outperform state-of-the-art trajectory prediction methods, validating the progressive research approach from homogeneous to heterogeneous to unified prediction systems.
| Item Type: | Thesis (Doctoral) |
|---|---|
| Award: | Doctor of Philosophy |
| Faculty and Department: | Faculty of Science > Computer Science, Department of |
| Thesis Date: | 2026 |
| Copyright: | Copyright of this thesis is held by the author |
| Deposited On: | 19 Jan 2026 09:45 |