On Converting the Furthest-Pair-Based Binary Search Tree to a Decision Tree: Experimental Results on Big Data Classification

Information ◽ 
2018 ◽ 
Vol 9 (11) ◽ 
pp. 284 ◽ 
Author(s):  
Ahmad B. A. Hassanat

Big Data classification has recently received a great deal of attention due to the main properties of Big Data: volume, variety, and velocity. The furthest-pair-based binary search tree (FPBST) shows great potential for Big Data classification. This work attempts to improve the performance of the FPBST in terms of computation time, space consumed, and accuracy. The major enhancement is converting the resultant BST to a decision tree, in order to remove the need for the slow K-nearest neighbors (KNN) classifier and to obtain a smaller tree, which reduces memory usage, speeds up both the training and testing phases, and increases classification accuracy. The proposed decision trees are based on calculating the probabilities of each class at each node using various methods; these probabilities are then used in the testing phase to classify an unseen example. Experimental results on several (small, intermediate, and big) machine learning datasets show the efficiency of the proposed methods in terms of space, speed, and accuracy compared to the FPBST, and indicate great potential for further enhancement of the proposed methods for use in practice.
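The conversion amounts to storing, at each node, the class probabilities estimated from the training examples that reached it, so that a test example can be routed down the tree and labeled without any KNN search at the leaves. The abstract does not specify the exact routing or probability rules, so the following Python sketch is illustrative only: the Node layout, the furthest-pair routing test, and the maximum-probability decision are assumptions.

import numpy as np

class Node:
    """One node of the converted tree: stores the class probabilities
    estimated from the training examples that reached this node."""
    def __init__(self, class_probs, pivot=None, left=None, right=None):
        self.class_probs = class_probs  # assumed dict: {label: count / n}
        self.pivot = pivot              # assumed (far_a, far_b): the node's furthest pair
        self.left = left
        self.right = right

def classify(root, x):
    """Route x toward whichever member of the furthest pair it is closer to,
    then return the most probable class stored at the node reached."""
    node = root
    while node.left is not None and node.right is not None:
        a, b = node.pivot
        node = node.left if np.linalg.norm(x - a) <= np.linalg.norm(x - b) else node.right
    return max(node.class_probs, key=node.class_probs.get)

Because every internal node also carries class probabilities, a traversal can in principle stop early, which is one way such a conversion can shrink both the tree and the testing time.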


Computers ◽  
2018 ◽  
Vol 7 (4) ◽  
pp. 54 ◽  
Author(s):  
Ahmad Hassanat

Due to their large sizes and/or dimensions, classifying Big Data is a challenging task for traditional machine learning, particularly when carried out using the well-known K-nearest neighbors (KNN) classifier, which is by nature a slow, lazy classifier. In this paper, we propose a new approach to Big Data classification with the KNN classifier, based on inserting the training examples into a binary search tree that is later used to speed up the search for test examples. For this purpose, we use two methods to sort the training examples. The first calculates the min-max scaled norm of each example and rounds it to 0 or 1. Examples whose rounded norm is 0 are sorted into the left child of a node, and those whose rounded norm is 1 into the right child; this process continues recursively until a leaf node holds one example, or a small number of examples with the same norm (see the sketch below). The second method inserts each example into the binary search tree based on its similarity to the examples with the minimum and maximum Euclidean norms. Experimental results on several big machine learning datasets show that both methods are much faster than most of the state-of-the-art methods compared against, with competitive accuracy obtained by the second method; this shows great potential for further enhancement of both methods for practical use.
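A minimal Python sketch of the first sorting method, assuming Euclidean norms, per-node min-max scaling, and a small leaf-size threshold; the function name, the dictionary-based tree layout, and the min_leaf parameter are hypothetical, and the second (similarity-based) method is not shown.

import numpy as np

def build_norm_tree(examples, labels, min_leaf=3):
    """First method (sketch): min-max scale each example's Euclidean norm,
    round to 0 or 1, and send 0-norm examples to the left child and 1-norm
    examples to the right, recursing until a node holds only a few examples."""
    norms = np.linalg.norm(examples, axis=1)
    lo, hi = norms.min(), norms.max()
    if len(examples) <= min_leaf or hi == lo:
        return {"leaf": True, "examples": examples, "labels": labels}
    go_right = np.round((norms - lo) / (hi - lo)).astype(bool)  # scaled norm -> 0 or 1
    if go_right.all() or not go_right.any():  # split made no progress; stop here
        return {"leaf": True, "examples": examples, "labels": labels}
    return {"leaf": False,
            "left":  build_norm_tree(examples[~go_right], labels[~go_right], min_leaf),
            "right": build_norm_tree(examples[go_right],  labels[go_right],  min_leaf)}

At test time, a query would follow the same scaled-norm test at each node and then run KNN only against the handful of examples in the leaf it reaches, which is where the claimed speedup comes from.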


2017 ◽  
Vol 7 (1.5) ◽  
pp. 97
Author(s):  
T. Surekha ◽  
R. Siva Rama Prasad

Data growth is enormous in the current scenario of developing information technology, and performing data classification is complex both in time and in information extraction. Moreover, big data classification involves uncertainties associated with unbalanced datasets. To overcome these issues, this paper introduces a novel method of big data classification. The method, Log Decision Tree with MapReduce Framework (LDT-MRF), uses the Log Decision Tree (LDT) and the MapReduce Framework (MRF) to perform parallel data classification. A novel parameter, termed log-entropy, is used to select the best feature attribute for classification (a stand-in sketch of this kind of attribute selection is given below). Classification is then performed using the LDT, which enables efficient data classification. Experimentation is carried out on three datasets: the Cleveland, Switzerland, and Breast Cancer datasets. A comparative analysis using sensitivity, specificity, and accuracy demonstrates the effectiveness of the proposed method, which attains a sensitivity of 84.7596%, a specificity of 74.633%, and an accuracy of 80.9088%, higher than the existing methods of big data classification.
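The abstract does not define how log-entropy is computed, so the sketch below deliberately substitutes the standard Shannon-entropy information gain to illustrate attribute selection of this general kind; best_attribute and its scoring are assumptions, not the paper's method.

import numpy as np

def entropy(y):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_attribute(X, y):
    """Pick the column of X (categorical attributes assumed) with the
    highest information gain over the labels y. Stand-in for the paper's
    log-entropy criterion, which the abstract does not define."""
    base = entropy(y)
    gains = []
    for j in range(X.shape[1]):
        vals, inv = np.unique(X[:, j], return_inverse=True)
        cond = sum((inv == k).mean() * entropy(y[inv == k]) for k in range(len(vals)))
        gains.append(base - cond)
    return int(np.argmax(gains))

In a MapReduce setting, the per-attribute counts behind such a score are what the map tasks would compute over data partitions before a reduce step aggregates them and picks the splitting attribute.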


2000 ◽  
Vol 11 (03) ◽  
pp. 485-513 ◽  
Author(s):  
SEONGHUN CHO ◽  
SARTAJ SAHNI

We develop a new class of weight-balanced binary search trees called β-balanced binary search trees (β-BBSTs). β-BBSTs are designed to have reduced internal path length; as a result, they are expected to exhibit good search-time characteristics. Individual search, insert, and delete operations in an n-node β-BBST take O(log n) time for [Formula: see text]. Experimental results comparing the performance of β-BBSTs, WB(α) trees, AVL trees, red/black trees, treaps, deterministic skip lists, and skip lists are presented (the weight-balance invariant this family of trees builds on is sketched below). Two simplified versions of β-BBSTs are also developed.
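The paper's exact β-balance condition is not reproduced in the abstract. As background, here is a Python sketch of the classical WB(α) weight-balance invariant that weight-balanced trees maintain, where the weight of a subtree is its number of external nodes; the Node layout and the default alpha are illustrative assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def weight(node: Optional[Node]) -> int:
    """Weight of a subtree = its number of external (null) positions."""
    return 1 if node is None else weight(node.left) + weight(node.right)

def is_weight_balanced(node: Optional[Node], alpha: float = 0.29) -> bool:
    """Classical WB(alpha) invariant, checked at every node:
    alpha <= weight(left) / weight(node) <= 1 - alpha."""
    if node is None:
        return True
    ratio = weight(node.left) / weight(node)
    return (alpha <= ratio <= 1 - alpha
            and is_weight_balanced(node.left, alpha)
            and is_weight_balanced(node.right, alpha))

Keeping every node's left-to-total weight ratio inside such a band bounds the internal path length, which is the quantity the β-BBST design aims to reduce further.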

