HODGSON, RYAN, THOMAS (2024) Finding Meaning Through Downstream Analysis of Embeddings:
A Case-Study of Knowledge Discovery for the Media and Publishing Industry. Doctoral thesis, Durham University.
Abstract
In recent years, the publishing sector has undergone a significant transformation in the way news and journalistic material is consumed, shifting from traditional print formats to digital platforms. This shift in the consumption of journalistic content poses both opportunities and challenges for the industry. Although traditional print publications relied on a combination of sales and advertising revenue, the expectation of free access to online media has impacted the profitability of these organisations. Consequently, publishing houses and newspapers have significantly reduced budgets, including those allocated to journalists. This reduction in journalistic staff has limited the time and resources available to produce high-quality content.
To address this issue, the research presented in this thesis explores methods to reduce the time cost of many of the tasks that journalists and publishers perform, and contributes innovative tools to streamline manual processes. Conducted in conjunction with an industry sponsor, Distinctive Publishing, this research contributes to the publishing domain by leveraging unsupervised learning techniques to expedite common processes that would otherwise be performed manually.
The structure of this research project can be summarised in four main aspects:
1. The proposal of meta-embedding-based semantic similarity searches, to enhance the quality of semantic searching of databases of social media influencers based on full-text queries.
2. An exploration of the feasibility of topic modelling algorithms for the identification of academic topics from large volumes of literature.
3. Based on the outcomes of the exploration of topic modelling, an end-to-end framework for assisting literature analysis is proposed and evaluated in an experimental setting. The evaluation indicates the breadth and generalisability of our findings, and the value that automating the analysis of literature can have for researchers beyond journalists and media stakeholders.
4. Parametric dimensionality reduction using attention mechanisms in neural network encoders is proposed as a method to improve cluster-based topic models, by enhancing the dimensionality reduction processes required in such algorithms.
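
As a rough illustration of the first aspect, the sketch below shows one common way to build a meta-embedding: L2-normalise the vectors from several source embedding models, concatenate them, and rank items by cosine similarity to a query. This is a minimal sketch under stated assumptions, not the method used in the thesis. The toy vectors stand in for the outputs of two hypothetical embedding models, and the function names are illustrative only.

```python
import numpy as np


def l2_normalise(x):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def concat_meta_embedding(sources):
    """Concatenate L2-normalised source embeddings into one unit-length vector.

    Assumption: concatenation is only one of several meta-embedding
    strategies; the thesis may use a different one.
    """
    stacked = np.concatenate([l2_normalise(s) for s in sources], axis=-1)
    return l2_normalise(stacked)


def rank_by_similarity(query_vec, item_vecs):
    """Return item indices sorted by descending cosine similarity, plus scores."""
    scores = item_vecs @ query_vec  # unit vectors, so dot product = cosine
    return np.argsort(-scores), scores


# Toy embeddings for three database entries from two hypothetical models.
items_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
items_b = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

# Toy embeddings of one full-text query from the same two models.
query_a = np.array([1.0, 0.1])
query_b = np.array([0.9, 0.1, 0.0])

meta_items = concat_meta_embedding([items_a, items_b])
meta_query = concat_meta_embedding([query_a, query_b])

order, scores = rank_by_similarity(meta_query, meta_items)
print(order.tolist())
```

Normalising each source before concatenation keeps a model with larger vector magnitudes from dominating the similarity score, which is one reason this design is often chosen as a baseline.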
The findings presented in this thesis are initially based on applied research, by adapting existing algorithms to the specific domain of the media and publishing industry. However, during the research, key aspects of the algorithmic processes were identified, specifically in relation to the dimensionality reduction process necessary for cluster-based topic modelling algorithms. Based on this, a novel paradigm of research is presented and explored, through the consideration of architectural design in parametric dimensionality reduction as an effective method to improve the quality of topic modelling. The consequences of this discovery introduce new areas for future investigation, which are proposed for further exploration in ongoing research.
Item Type: | Thesis (Doctoral) |
---|---|
Award: | Doctor of Philosophy |
Keywords: | Topic Modelling; Clustering; Text Embedding; Semantic Search; Dimensionality Reduction; Transformer; Publishing; Journalism; Media |
Faculty and Department: | Faculty of Science > Computer Science, Department of |
Thesis Date: | 2024 |
Copyright: | Copyright of this thesis is held by the author |
Deposited On: | 18 Oct 2024 10:56 |