
Durham e-Theses

Video Person Re-identification for Future Automated Visual Surveillance Systems

ALSEHAIM, AISHAH, ABDULRAHMAN (2023) Video Person Re-identification for Future Automated Visual Surveillance Systems. Doctoral thesis, Durham University.



Person Re-identification (Re-ID) across a collection of surveillance cameras is becoming an increasingly vital component of smart intelligent surveillance systems. Due to the numerous variations in human pose, occlusion, viewpoint, illumination and background clutter, most contemporary video Re-ID studies use complex CNN-based network architectures with 3D convolution or multi-branch networks in order to extract spatio-temporal video features. In this thesis, we address the significant challenge posed by person Re-ID by encoding person videos into robust, discriminative feature vectors that improve performance under these challenging settings. The extraction of strong, discriminative features is a fundamental aspect of person Re-ID, and CNN-based approaches have dominated this area. We show that a simple single-stream 2D convolutional network using the ResNet50-IBN architecture to extract frame-level features can achieve superior performance when combined with temporal attention for clip-level features. By averaging, these features can be generalised to extract features from entire videos without added expense. While other recent work uses complicated and memory-intensive 3D convolutions or multi-stream network architectures, our method uses both video Re-ID best practice and transfer learning between datasets to achieve superior outcomes for person Re-ID.

Moreover, we consider the task of joint person Re-ID and action recognition within the context of automated surveillance, learning discriminative feature representations that both improve Re-ID performance and are capable of providing viable per-view (clip-wise) action recognition. Weakly labelled actions from the two leading benchmark video Re-ID datasets (MARS, LPW) are used to perform a hybrid Re-ID and action recognition task utilising a mixture of two task-specific and multi-loss terms.
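The frame-to-clip-to-video aggregation described above can be illustrated with a minimal sketch. The function and variable names here are hypothetical, and the learned attention-score head that the thesis would place on top of the ResNet50-IBN frame features is not shown; only the pooling arithmetic is.

```python
import math

def temporal_attention_pool(frame_features, frame_scores):
    """Collapse per-frame feature vectors into a single clip-level feature
    using softmax attention weights over per-frame scores (illustrative
    sketch; the actual score head is a learned layer, not shown here)."""
    # Numerically stable softmax over the per-frame attention scores.
    m = max(frame_scores)
    exps = [math.exp(s - m) for s in frame_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted average of the frame features -> clip feature.
    dim = len(frame_features[0])
    return [sum(w * f[d] for w, f in zip(weights, frame_features))
            for d in range(dim)]

def video_feature(clip_features):
    """Average clip-level features into one video-level descriptor,
    mirroring the inexpensive whole-video generalisation step."""
    dim = len(clip_features[0])
    n = len(clip_features)
    return [sum(c[d] for c in clip_features) / n for d in range(dim)]
```

With equal attention scores the pooling reduces to a plain mean over frames, which is the degenerate case the averaging step exploits at video level.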
Our multi-branch 2D CNN architecture achieves results superior to previous work in the field by treating Re-ID and action recognition as a multi-task problem. Recently, vision transformer (ViT) architectures have been shown to boost fine-grained feature discrimination across a variety of vision tasks. To adapt ViT to video person Re-ID, two novel module constructions, Temporal Clip Shift and Shuffled (TCSS) and Video Patch Part Feature (VPPF), are proposed so that ViT architectures can effectively meet the challenges of the task. Overall, we present three novel deep learning architectures that address the video person Re-ID task, spanning the use of CNN, multi-task learning and ViT approaches.
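The multi-task formulation combines a Re-ID loss with an action-recognition loss over shared features. A minimal sketch of such a weighted combination is below; the function names, the choice of cross-entropy for both heads, and the weighting factor `lam` are all assumptions for illustration, not the thesis's actual loss terms.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample (plain-Python sketch),
    computed via the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def joint_loss(id_logits, id_label, action_logits, action_label, lam=0.5):
    """Hypothetical multi-task objective: a weighted sum of the
    identity-classification loss and the action-recognition loss.
    lam balances the two task-specific terms."""
    return (cross_entropy(id_logits, id_label)
            + lam * cross_entropy(action_logits, action_label))
```

Training against such a summed objective pushes the shared backbone to encode features useful for both tasks, which is the mechanism by which the joint task can improve Re-ID performance.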

Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Faculty and Department: Faculty of Science > Department of Computer Science
Thesis Date: 2023
Copyright: Copyright of this thesis is held by the author
Deposited On: 11 Aug 2023 12:50
