Durham e-Theses
Distance correlation for blind source separation: A study of machine learning techniques applied to synthetic and geodetic data

CALLANDER, ELIZABETH (2025) Distance correlation for blind source separation: A study of machine learning techniques applied to synthetic and geodetic data. Doctoral thesis, Durham University.

Abstract

According to the WHO, 125 million people were affected by earthquakes between 1998 and 2017. Increasing our knowledge of the earthquake cycle is an important task, and using machine learning techniques for the prediction of earthquakes is a promising research direction.
In recent years, the number of Global Navigation Satellite System (GNSS) receiver stations has significantly increased, providing daily data on their locations. Geodetic processes and errors associated with measuring the distance between satellites and receiver stations influence the apparent location of these receiver stations. The work in this thesis uses data that include key geodetic signals and underlying components representing non-geological activity, such as atmospheric components. Separating these components is the core motivation for my work: I use blind source separation (BSS) to isolate seismic events from atmospheric and instrumental noise in geodetic time series (GNSS) and spatial series (SAR) for earthquake monitoring and post-seismic analysis.
In source separation techniques, it is common to assume that the underlying sources are independent. One challenge identified in this context is the difficulty of selecting an appropriate metric that quantifies the dependence between sources and can also be optimised effectively toward its extrema to produce the most independent sources. To tackle this issue, I compare various independence metrics using Binary Phase Shift Keying (BPSK) over an additive white Gaussian noise (AWGN) channel as a non-parametric test; this is a well-established benchmark in communication theory. Furthermore, I present an example that compares a binary signal to the average of other binary signals while gradually increasing the number of signals included in this average.
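As an illustration of the BPSK-over-AWGN benchmark, the following minimal sketch estimates the sample distance correlation between transmitted BPSK symbols and the noisy received values at several signal-to-noise ratios. It is not the thesis's code: the function names, the sample size, the SNR grid and the noise convention (unit symbol energy, noise variance 1/(2·SNR)) are all assumptions chosen for the example. Distance correlation is computed with the standard biased estimator based on double-centred pairwise distance matrices.

```python
import numpy as np


def distance_correlation(x, y):
    """Biased sample distance correlation of two equal-length 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-centre: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # clip tiny negative rounding errors
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / dvar2)) if dvar2 > 0 else 0.0


def bpsk_over_awgn(n_symbols, snr_db, rng):
    """Hypothetical helper: random BPSK symbols (+1/-1) plus Gaussian noise."""
    bits = rng.integers(0, 2, size=n_symbols)
    symbols = 2.0 * bits - 1.0
    # Assumed convention: unit symbol energy, noise variance = 1 / (2 * SNR).
    noise_std = np.sqrt(1.0 / (2.0 * 10.0 ** (snr_db / 10.0)))
    received = symbols + rng.normal(0.0, noise_std, size=n_symbols)
    return symbols, received


rng = np.random.default_rng(0)
dcor_by_snr = {}
for snr_db in (-10, 0, 10, 20):
    sent, received = bpsk_over_awgn(500, snr_db, rng)
    dcor_by_snr[snr_db] = distance_correlation(sent, received)
    print(f"SNR {snr_db:>4} dB: distance correlation = {dcor_by_snr[snr_db]:.3f}")
```

Under these assumptions the estimate should rise toward 1 as the channel becomes cleaner, which is the qualitative behaviour one would want from a dependence metric on this benchmark. The comparison against a mutual information statistic described in the abstract is not reproduced here.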
Then, I examine the suitability of these metrics as loss functions, particularly with respect to their optimisation and the tailored algorithms required to reach extrema that are challenging to compute. I apply architectures and metrics to various benchmark datasets for widely adopted source-separation tasks, extend them to GNSS and SAR data to provide geological context, and explore representation learning. This multifaceted approach validates my methods on both labelled (for supervised learning) and unlabelled (for unsupervised learning) data, providing a robust foundation for my findings.
In this work, I introduce distance correlation as a metric for assessing signal independence and evaluate it on several distinct scenarios:
1. Communication-Theory Benchmark: Binary Phase Shift Keying (BPSK) signals transmitted over additive white Gaussian noise (AWGN) channels were used to compare distance correlation to a closed-form mutual information statistic.
2. Synthetic and Hybrid Synthetic/Geodetic Mixtures: The datasets for this task included combinations of three source signals formed by the linear mixing of sine, square and sawtooth waves, as well as mixtures of GNSS station pairs with synthetic seismic deformation signals, or SAR data with additive signals. These datasets were used to evaluate various BSS methods, including comparisons with the popular FastICA algorithm.
3. Geodetic Data: Real GNSS time series collected around a known seismic event
were used to investigate the separation of underlying geophysical sources.
4. Representation Learning: Modelling techniques aimed at extracting semantically meaningful features from datasets, including image-based classification across ten categories for CIFAR-10, and disentangled latent features from binary pedestrian mask sequences in the KITTI-Masks dataset.
For the first experiment, using the synthetic dataset, I extracted three waves from the input mixtures, such that the neural network was optimised to extract the most independent underlying sources. On average, the distance correlation method outperformed the established gold standard FastICA, a blind source separation technique based on non-Gaussianity. I also applied this method to a dataset created by combining two signals from similar GNSS stations, considered to be one source, with a known synthetic signal representing an earthquake with post-seismic deformation at different epicentres. In this case, FastICA slightly outperformed distance correlation in separating the synthetic seismic signal. When extracting a real seismic event from two actual GNSS stations, FastICA again outperformed distance correlation. It is important to note that the seismic signal in this scenario was compared against the decomposed trend of the GNSS stations and an element of afterslip, not a known ground truth. As such, this comparison should be regarded with caution.
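To make the synthetic experiment more concrete, the sketch below builds sine, square and sawtooth sources, mixes them linearly, and compares the mean pairwise distance correlation of the sources with that of the mixtures. It only illustrates why pairwise distance correlation could serve as a loss to minimise; it does not perform any separation and does not reproduce the thesis's neural network or its FastICA comparison. The frequencies, the mixing matrix and all function names are hypothetical choices made for this example.

```python
import numpy as np
from itertools import combinations


def distance_correlation(x, y):
    """Biased sample distance correlation of two equal-length 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-centre each pairwise distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # clip tiny negative rounding errors
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / dvar2)) if dvar2 > 0 else 0.0


def mean_pairwise_dcor(signals):
    """Average distance correlation over all pairs of rows in `signals`."""
    pairs = list(combinations(range(len(signals)), 2))
    return sum(distance_correlation(signals[i], signals[j]) for i, j in pairs) / len(pairs)


# Three synthetic sources; the frequencies are arbitrary illustrative values.
t = np.linspace(0.0, 1.0, 400, endpoint=False)
sources = np.stack([
    np.sin(2.0 * np.pi * 5.0 * t),                            # sine wave
    np.where(np.sin(2.0 * np.pi * 3.0 * t) >= 0, 1.0, -1.0),  # square wave
    2.0 * ((7.0 * t) % 1.0) - 1.0,                            # sawtooth wave
])

# Hypothetical mixing matrix: each observed mixture is a weighted sum of sources.
mixing = np.array([
    [1.0, 0.5, 0.3],
    [0.4, 1.0, 0.6],
    [0.2, 0.7, 1.0],
])
mixtures = mixing @ sources

source_dependence = mean_pairwise_dcor(sources)
mixture_dependence = mean_pairwise_dcor(mixtures)
print(f"mean pairwise distance correlation, sources:  {source_dependence:.3f}")
print(f"mean pairwise distance correlation, mixtures: {mixture_dependence:.3f}")
```

With this mixing matrix the mixtures should show noticeably higher pairwise dependence than the sources, so an unmixing model trained to reduce this quantity is pushed toward outputs that resemble the original waves.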
In my final analysis, I applied distance correlation to more advanced representation learning tasks. For the CIFAR-10 dataset, I used a whitening technique for scattering and then brought positive pairs closer together using distance correlation. This approach achieved a top-1 accuracy of 88.8%. However, it underperformed compared to the original W-MSE method, which achieved a top-1 accuracy of 91.2%.
The previously mentioned whitening representation methods did not yield good results for the disentanglement task involving the KITTI-Masks dataset. However, when I updated the InfoNCE loss (Laplace, Unbounded) for double-centred inputs, as a proxy for distance correlation, I improved the state-of-the-art mean correlation coefficient (MCC) score by 0.6%.

Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Keywords: Blind Source Separation, GNSS
Faculty and Department: Faculty of Science > Computer Science, Department of
Thesis Date: 2025
Copyright: Copyright of this thesis is held by the author
Deposited On: 29 Oct 2025 08:51
