JACKSON, PHILIP,THOMAS,GABRIEL (2020) Machine Learning Advances for Practical Problems in Computer Vision. Doctoral thesis, Durham University.
Convolutional neural networks (CNNs) have become the de facto standard for computer vision tasks, due to their unparalleled performance and versatility. Although deep learning removes the need for extensive hand-engineered features for every task, real-world applications of CNNs still often require considerable engineering effort to produce usable results. In this thesis, we explore solutions to problems that arise in practical applications of CNNs.
We address a rarely acknowledged weakness of CNN object detectors: the tendency to emit many excess detection boxes per object, which must be pruned by non-maximum suppression (NMS). This practice relies on the assumption that highly overlapping boxes are duplicates. That assumption fails when objects occlude one another, because overlapping detections are then genuinely required. We therefore propose a novel loss function that incentivises a CNN to emit exactly one detection per object, making NMS unnecessary.
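For readers unfamiliar with the pruning step this loss is meant to replace, here is a minimal sketch of standard greedy NMS. It is a common textbook variant, not code from the thesis. The box format `(x1, y1, x2, y2)`, the 0.5 IoU threshold, and the example boxes are illustrative assumptions. The example reproduces the failure mode described above: a second, genuinely distinct object whose box overlaps the first by more than the threshold is discarded along with the true duplicate.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box is at most
    iou_threshold; otherwise it is assumed to be a duplicate and dropped.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical example: box 1 duplicates box 0, box 2 is a *different*
# object partially occluded by object 0, box 3 is far away.
boxes = [(0, 0, 10, 20), (1, 0, 11, 20), (3, 0, 13, 20), (50, 0, 60, 20)]
scores = [0.9, 0.8, 0.7, 0.6]

print(greedy_nms(boxes, scores))  # → [0, 3]  (occluded object 2 is lost)
```

Raising the threshold (e.g. to 0.6) would keep box 2 in this example. In general, though, no single threshold separates duplicates from heavily occluded objects, which is the motivation for removing NMS altogether.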
Another common problem when deploying a CNN in the real world is domain shift: CNNs can be surprisingly vulnerable to sometimes subtle differences between the images they encounter at deployment and those they were trained on. We investigate the role that texture plays in domain shift, and propose a novel data augmentation technique that uses style transfer to train CNNs that are more robust to shifts in texture. We demonstrate that this technique results in better domain transfer on several datasets, without requiring any domain-specific knowledge.
In collaboration with AstraZeneca, we develop an embedding space for cellular images collected in a high-throughput imaging screen as part of a drug discovery project. This uses a combination of techniques to embed the images in 2D space such that similar images lie near one another, for the purpose of visualization and data exploration. The images are also clustered automatically, splitting the large dataset into a smaller number of clusters, each displaying a common phenotype. This allows biologists to quickly triage the high-throughput screen, selecting a small subset of promising phenotypes for further investigation.
Finally, we investigate an unusual form of domain bias that manifested in a real-world visual binary classification project for counterfeit detection. We confirm that CNNs are able to "cheat" on the task by exploiting a strong correlation between the class label and the specific camera that acquired the image, and show that this reliably occurs whenever the correlation is present. We also investigate how exactly the CNN is able to infer the camera type from image pixels, given that the cameras cannot be told apart by the human eye.
The contributions in this thesis are of practical value to deep learning practitioners working on a variety of problems in the field of computer vision.
Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Keywords: Deep Learning, Convolutional Neural Networks, Domain Shift, Non Maximum Suppression, Data Augmentation, Data Exploration
Faculty and Department: Faculty of Science > Computer Science, Department of
Copyright: Copyright of this thesis is held by the author
Deposited On: 09 Oct 2020 15:11