ALRASHEEDI, MASAD, AWDH, MOHAMMAD (2023) Optimal Thresholds for Classification Trees using Nonparametric Predictive Inference. Doctoral thesis, Durham University.
In data mining, classification is used to assign a new observation to one of a set of predefined classes based on the attributes of the observation. Classification trees are one of the most commonly used methods in the area of classification because their rules are easy to understand and interpret. Classification trees are constructed recursively by a top-down scheme using repeated splits of the training data set, which is a subset of the data. When the data set involves a continuous-valued attribute, there is a need to select an appropriate threshold value to determine the classes and split the data. In recent years, Nonparametric Predictive Inference (NPI) has been introduced for selecting optimal thresholds for two- and three-class classification problems, where the inferences are explicitly in terms of a given number of future observations and target proportions. These target proportions enable one to choose weights that reflect the relative importance of one class over another. The NPI-based threshold selection method has previously been implemented in the context of Receiver Operating Characteristic (ROC) analysis, but not for building classification trees. Due to the predictive nature of the NPI-based threshold selection method, it is well suited for the classification tree method, as the end goal of building classification trees is to use them for prediction as well. In this thesis, we present new classification algorithms for building classification trees using the NPI approach for selecting the optimal thresholds. We first present a new classification algorithm, which we call the NPI2-Tree algorithm, for building binary classification trees; we then extend it to build classification trees with three ordered classes, which we call the NPI3-Tree algorithm. In order to build classification trees using our algorithms, we introduce a new procedure for selecting the optimal values of target proportions by optimising classification performance on test data. 
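To make the threshold-selection step for a continuous-valued attribute concrete, the following is a minimal Python sketch for the two-class case. It is not the NPI-based method described in the thesis: it scores each candidate threshold by empirical proportions on the training data rather than by NPI lower and upper probabilities for future observations. The function name `best_threshold` and the `weight` parameter are hypothetical; `weight` only loosely mimics the way target proportions can express the relative importance of one class over the other.

```python
from typing import Sequence, Tuple


def best_threshold(values: Sequence[float],
                   labels: Sequence[int],
                   weight: float = 0.5) -> Tuple[float, float]:
    """Pick a split point for one continuous attribute (classes 0 and 1).

    Assumption (not from the thesis): class 0 tends to have smaller values,
    so an observation is assigned to class 0 when its value is <= threshold.
    The score is a weighted sum of the per-class proportions classified
    correctly on the training data.
    """
    n0 = sum(1 for y in labels if y == 0)
    n1 = len(labels) - n0
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be present")

    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct attribute values")

    best_c, best_score = distinct[0], -1.0
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        c = (lo + hi) / 2
        correct0 = sum(1 for x, y in zip(values, labels) if y == 0 and x <= c)
        correct1 = sum(1 for x, y in zip(values, labels) if y == 1 and x > c)
        score = weight * correct0 / n0 + (1 - weight) * correct1 / n1
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    labels = [0, 1, 0, 1]
    # A larger weight favours classifying class 0 correctly.
    print(best_threshold(values, labels, weight=0.9))
    print(best_threshold(values, labels, weight=0.1))
```

In a tree-building setting, a routine of this kind would be applied recursively to each subset of the training data produced by earlier splits; the weighting changes which side of an overlapping region the threshold falls on.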
We use different measures to evaluate and compare the performance of the NPI2-Tree and the NPI3-Tree classification algorithms with other classification algorithms from the literature. The experimental results show that our classification algorithms perform well compared to other algorithms. Finally, we present applications of the NPI2-Tree and NPI3-Tree classification algorithms on noisy data sets. Noise refers to situations in which the data sets used for classification tasks have incorrect values in the attribute variables or the class variable. The performance of the NPI2-Tree and NPI3-Tree classification algorithms on noisy data is evaluated using different levels of noise added to the class variable. The results show that our classification algorithms perform well in the case of noisy data and tend to be quite robust for most noise levels, compared to other classification algorithms.
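As an illustration of what adding noise to the class variable can look like in practice, here is a minimal Python sketch. The abstract does not specify the noise protocol used in the thesis, so this is an assumption: a fixed fraction of observations is chosen at random and each chosen class label is replaced by a different class picked uniformly at random. The function name `add_class_noise` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def add_class_noise(labels: Sequence[int],
                    noise_level: float,
                    classes: Sequence[int],
                    seed: int = 0) -> List[int]:
    """Return a copy of `labels` with a fraction `noise_level` of them changed.

    Assumed protocol (not taken from the thesis): each selected label is
    replaced by a different class drawn uniformly from `classes`.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must lie in [0, 1]")
    if len(set(classes)) < 2:
        raise ValueError("need at least two classes to change a label")

    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    noisy = list(labels)
    n_change = round(noise_level * len(noisy))
    for i in rng.sample(range(len(noisy)), n_change):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


if __name__ == "__main__":
    clean = [0] * 10 + [1] * 10
    noisy = add_class_noise(clean, noise_level=0.2, classes=[0, 1])
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"{changed} of {len(clean)} labels changed")
```

A robustness comparison would typically corrupt only the training labels at several noise levels and measure performance on clean test data, though whether the thesis follows that design is not stated here.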
|Item Type:|Thesis (Doctoral)|
|Award:|Doctor of Philosophy|
|Keywords:|Classification, Thresholds, Nonparametric Predictive Inference|
|Faculty and Department:|Faculty of Science > Mathematical Sciences, Department of|
|Copyright:|Copyright of this thesis is held by the author|
|Deposited On:|17 Jan 2023 13:06|