Durham e-Theses
On Deep Machine Learning for Multi-view Object Detection and Neural Scene Rendering

ISAAC-MEDINA, BRIAN,KOSTADINOV,SHALON (2024) On Deep Machine Learning for Multi-view Object Detection and Neural Scene Rendering. Doctoral thesis, Durham University.

PDF - Accepted Version
Available under License Creative Commons Attribution Non-commercial 3.0 (CC BY-NC).

17Mb

Abstract

This thesis addresses two contemporary computer vision tasks using a set of multiple-view imagery, namely the joint use of multi-view images to improve object detection and neural scene rendering via a novel volumetric input encoding for Neural Radiance Fields (NeRF). While the former focuses on improving the accuracy of object detection, the latter contribution allows for better scene reconstruction, which ultimately can be exploited to generate novel views and perform multi-view object detection.
Notwithstanding the significant advances in automatic object detection in the last decade, multi-view object detection has received little attention. For this reason, two contributions regarding multi-view object detection in the absence of explicit camera pose information are presented in this thesis. First, a multi-view epipolar filtering technique is introduced, using the distance of the detected object centre to a corresponding epipolar line as an additional probabilistic confidence. This technique removes false positives without a corresponding detection in other views, giving greater confidence to consistent detections across the views. The second contribution adds an attention-based layer, called Multi-view Vision Transformer, to the backbone of a deep machine learning object detector, effectively aggregating features from different views and creating a multi-view aware representation.
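As an illustration of the epipolar filtering idea described above, the following is a minimal sketch and not the thesis implementation. It assumes a fundamental matrix `F` between two views is already available (for example, estimated from point correspondences, since no explicit camera pose is given), and it assumes a Gaussian kernel with a hypothetical bandwidth `sigma` (in pixels) to turn the point-to-line distance into a confidence. The function names are illustrative only.

```python
import math


def epipolar_line(F, point):
    """Return the epipolar line l = F @ [x, y, 1] in the other view as (a, b, c)."""
    x, y = point
    p = (x, y, 1.0)
    return tuple(sum(F[i][j] * p[j] for j in range(3)) for i in range(3))


def point_line_distance(line, point):
    """Perpendicular distance from a 2D point to the line a*x + b*y + c = 0."""
    a, b, c = line
    x, y = point
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(a * x + b * y + c) / norm


def epipolar_confidence(F, centre_a, centre_b, sigma=10.0):
    """Confidence in (0, 1] that centre_b (view B) is consistent with centre_a (view A).

    Assumption: a Gaussian kernel over the distance; sigma is a hypothetical
    tuning parameter, not a value taken from the thesis.
    """
    d = point_line_distance(epipolar_line(F, centre_a), centre_b)
    return math.exp(-0.5 * (d / sigma) ** 2)


# Fundamental matrix of an idealised rectified stereo pair (pure horizontal
# translation): epipolar lines are the image rows, so the distance is |y - y'|.
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

consistent = epipolar_confidence(F_RECTIFIED, (100.0, 50.0), (80.0, 50.0))
inconsistent = epipolar_confidence(F_RECTIFIED, (100.0, 50.0), (80.0, 70.0))
```

In this sketch, a detection whose centre lies on the epipolar line keeps full confidence, while one 20 pixels away is strongly down-weighted; how such a score is combined with the detector's own confidence is left open here.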
The final contribution explores another application of multi-view imagery, namely a novel volumetric input encoding for NeRF. The proposed method derives an analytical solution for the average value of a sinusoid (inducing a high-frequency component) within a pyramidal frustum region, whereas previous state-of-the-art NeRF methods approximate this with a Gaussian distribution. This parameterisation obtains a better representation of regions where the Gaussian approximation is poor, allowing more accurate synthesis of distant areas and depth map estimation.
Experimental evaluation across multiple established benchmark datasets compares the proposed methods against contemporary state-of-the-art architectures, illustrating their efficacy both quantitatively and qualitatively.
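The difference between an exact average of a sinusoid and a Gaussian approximation can be seen in a one-dimensional analogue. The sketch below is not the pyramidal-frustum derivation from the thesis: it assumes a simple interval [a, b] in place of the frustum, and compares the closed-form mean of sin(ωx) over that interval with the expected value under a Gaussian matched to the interval's mean and variance. All function names are illustrative.

```python
import math


def mean_sin_uniform(omega, a, b):
    """Exact mean of sin(omega * x) for x uniformly distributed on [a, b]."""
    if b <= a or omega == 0.0:
        raise ValueError("require a < b and omega != 0")
    return (math.cos(omega * a) - math.cos(omega * b)) / (omega * (b - a))


def mean_sin_gaussian(omega, a, b):
    """Mean of sin(omega * x) under a Gaussian with the same mean and variance
    as the uniform distribution on [a, b] (the approximation being compared)."""
    mu = 0.5 * (a + b)
    var = (b - a) ** 2 / 12.0
    return math.sin(omega * mu) * math.exp(-0.5 * omega ** 2 * var)


def mean_sin_numeric(omega, a, b, n=100_000):
    """Midpoint-rule estimate, used only to sanity-check the closed form."""
    h = (b - a) / n
    return sum(math.sin(omega * (a + (i + 0.5) * h)) for i in range(n)) / n


# Narrow interval: the two expressions nearly coincide.
narrow_gap = abs(mean_sin_uniform(1.0, 0.0, 0.1) - mean_sin_gaussian(1.0, 0.0, 0.1))

# Wide interval relative to the wavelength: the Gaussian estimate drifts
# away from the exact average.
wide_gap = abs(mean_sin_uniform(3.0, 0.0, 2.0) - mean_sin_gaussian(3.0, 0.0, 2.0))
```

In this toy setting the gap grows as the interval widens relative to the wavelength, which is loosely analogous to the regions the abstract describes as poorly served by the Gaussian approximation.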

Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Keywords: Deep Learning, Machine Learning, Computer Vision, Object Detection, Multi-view, Neural Scene Rendering
Faculty and Department: Faculty of Science > Computer Science, Department of
Thesis Date: 2024
Copyright: Copyright of this thesis is held by the author
Deposited On: 08 Apr 2024 17:27
