ATAPOUR-ABARGHOUEI, AMIR (2019) Immaculate Depth Perception: Recovering 3D Scene Information via Depth Completion and Prediction. Doctoral thesis, Durham University.
|PDF - Accepted Version |
Available under License Creative Commons Attribution 3.0 (CC BY).
Even though obtaining three-dimensional (3D) information has received significant attention in scene capture systems in recent years, there are currently numerous challenges in scene depth estimation, which is one of the fundamental components of any 3D vision system focusing on RGB-D images. This has led to the creation of specific areas of research where the goal is to estimate complete scene depth or fill the missing 3D information post capture. In many downstream applications, incomplete scene depth is of limited value, and thus techniques are required to fill the holes that exist in terms of missing depth information. An analogous problem exists within the scope of scene filling post object removal in the same context. Although considerable research has resulted in notable progress in the synthetic expansion or reconstruction of missing colour scene information in both statistical and structural forms, work on the plausible completion of missing scene depth is contrastingly limited. In this thesis, we present various methods capable of performing the depth completion process required to achieve high quality scene depth post capture. Two novel methods capable of performing object removal in an RGB-D image and, at the same time, filling the naturally-occurring holes within the depth image are proposed, inspired by seminal approaches towards exemplar-based RGB image inpainting and texture synthesis. Another proposed approach takes advantage of the recent advances in semantic segmentation and a set of carefully designed hole cases to carry out object-wise depth completion in real time. Using the significant progress made in generative models, we then move on to a learning-based approach that utilizes a convolutional neural network trained on synthetic RGB-D images in a supervised framework using the Discrete Cosine Transform, adversarial training and domain adaptation to complete large missing portions of depth images.
The representation learning capabilities of the network are evaluated by adapting the network to perform the somewhat similar task of monocular depth estimation, with outstanding results. Based on the success of the adapted monocular depth estimation model, we then propose two monocular depth estimation techniques, also trained on synthetic data, that can generate hole-free depth information from a single RGB image, circumventing the need for depth completion and refinement altogether. One of the approaches makes use of style transfer as a form of domain adaptation, and the other uses a recurrent model, a series of complex skip connections and adversarial training in a multi-task framework to generate temporally homogeneous depth outputs based on an input of a sequence of RGB images.
|Item Type:||Thesis (Doctoral)|
|Award:||Doctor of Philosophy|
|Keywords:||Computer Vision; 3D images; Monocular Depth Estimation; Depth Completion; Machine Learning; Deep Learning|
|Faculty and Department:||Faculty of Science > Computer Science, Department of|
|Copyright:||Copyright of this thesis is held by the author|
|Deposited On:||17 Oct 2019 12:45|