NATTRASS, CARL (2016) Powering the Academic Web. Doctoral thesis, Durham University.
PDF - Accepted Version (6Mb). Available under License Creative Commons Public Domain Dedication CC0 1.0 Universal.
Abstract
Context: Locating resources on the Web has become increasingly difficult for users and poses a number of issues. The sheer size of the Web means that, despite what appears to be an increase in the amount of quality material available, the effort involved in locating that material is also increasing; in effect, the higher quality material is being diluted by the lesser quality. One group affected by this problem is post-graduate students. Because they have only a finite amount of time to devote to research, the extra effort spent locating material reduces their overall quality study time.
Aim: This research investigates how post-graduate students use the Web as a learning resource and identifies a number of areas of concern with its use. It considers the potential for improvement through a number of concepts, such as collaboration, peer reviewing, and document classification and comparison techniques.
This research also investigates whether student research on the Web can be improved by combining several of the identified technologies and concepts.
Method: Using some of the identified concepts as components, this research proposes a model to address the highlighted areas of concern. The proposed model, named the Durham Browsing Assistant (DurBA), is defined, and a number of key concepts which show potential within it are uncovered.
One of the key concepts, that of document comparison, is chosen for further study. Given a source document, can a computer system reliably identify the other documents on the Web which most closely match it?
A software tool was created to allow the testing of document comparison techniques. It was called the Durham Textual Comparison system (DurTeC) and it had two key concepts. The first was that it would allow various algorithms to be applied to the comparison process. The second was that it could simulate collaboration by allowing data to be altered, added and removed as if by multiple users.
A set of experiments was created to test these algorithms and identify those which gave the best results.
Results: The results from the experiments identified a number of the most promising relationships between comparison and collaboration processes. They also highlighted those which had a negative effect on the process, and those which produced variable results.
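The idea of applying interchangeable algorithms to the comparison process can be illustrated with a minimal sketch. The abstract does not name the algorithms DurTeC used, so the two measures below (cosine similarity over term counts and Jaccard similarity over term sets) are assumptions chosen only for illustration, as are the function names `tokenize`, `cosine_similarity`, `jaccard_similarity` and `rank_candidates`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_candidates(source, candidates, similarity=cosine_similarity):
    """Score every candidate against the source and return (name, score) pairs, best first.

    `candidates` maps a hypothetical document name to its text; `similarity`
    is the pluggable comparison algorithm.
    """
    src = tokenize(source)
    scored = [(name, similarity(src, tokenize(text))) for name, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Passing a different function as `similarity` swaps the algorithm without changing the ranking code, which is one plausible way to compare techniques under identical conditions.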
Amongst the results, it was found that:
1. Providing DurTeC with source documents additional to the original, as if through a recommendation process, increased its accuracy substantially.
2. Allowing DurTeC to use synonym lists to expand its vocabulary was found, in many cases, to reduce its accuracy.
3. Restricting the words which DurTeC considered in its comparison process, based upon their value in the source document, could increase accuracy. This could be considered as a form of collaborative keyword selection.
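The first and third findings can be illustrated together with a small sketch: term counts are pooled over several source documents, and only the highest-valued terms are kept for comparison. The abstract does not define how a word's "value" was measured, so raw frequency after stopword removal is used here purely as an assumption; the stopword list and the names `top_keywords` and `keyword_overlap` are likewise hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list only; not taken from the thesis.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is"}


def tokenize(text):
    """Lowercase the text, split it into words and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(source_texts, k):
    """Pool term counts over all source documents and keep the k most frequent terms."""
    counts = Counter()
    for text in source_texts:
        counts.update(tokenize(text))
    return {term for term, _ in counts.most_common(k)}


def keyword_overlap(keywords, candidate_text):
    """Fraction of the selected keywords that occur in the candidate document."""
    if not keywords:
        return 0.0
    return len(keywords & set(tokenize(candidate_text))) / len(keywords)
```

Adding a recommended document to `source_texts` changes which terms rise to the top, and lowering `k` narrows the comparison to fewer, more heavily used words.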
Conclusion: This research shows that improvements can be made in the accuracy of identifying similar resources by using a combination of comparison and collaboration processes. The proposed model, DurBA, would be an ideal host for such a system.
Item Type: | Thesis (Doctoral) |
---|---|
Award: | Doctor of Philosophy |
Keywords: | Web, Research, Collaboration, Document comparison |
Faculty and Department: | Faculty of Science > Engineering and Computing Science, School of (2008-2017) |
Thesis Date: | 2016 |
Copyright: | Copyright of this thesis is held by the author |
Deposited On: | 21 Mar 2016 11:34 |