On Converting the Furthest-Pair-Based Binary Search Tree to a Decision Tree: Experimental Results on Big Data Classification

Information ◽ 
2018 ◽ 
Vol 9 (11) ◽ 
pp. 284 ◽ 
Author(s):  
Ahmad B. A. Hassanat

Big Data classification has recently received a great deal of attention due to the main properties of Big Data: volume, variety, and velocity. The furthest-pair-based binary search tree (FPBST) shows great potential for Big Data classification. This work attempts to improve the performance of the FPBST in terms of computation time, space consumed, and accuracy. The major enhancement is converting the resultant BST to a decision tree, in order to remove the need for the slow K-nearest neighbors (KNN) classifier and to obtain a smaller tree, which reduces memory usage, speeds up both the training and testing phases, and increases classification accuracy. The proposed decision trees are based on calculating the probabilities of each class at each node using various methods; these probabilities are then used in the testing phase to classify an unseen example. Experimental results on several (small, intermediate, and big) machine learning datasets show the efficiency of the proposed methods in terms of space, speed, and accuracy compared to the FPBST, and indicate great potential for further enhancement of the proposed methods for use in practice.
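The conversion amounts to storing, at each node, the class probabilities estimated from the training examples that reached it, so that a test example can be routed down the tree and labeled without any KNN search at the leaves. The abstract does not specify the exact routing or probability rules, so the following Python sketch is illustrative only: the Node layout, the furthest-pair routing test, and the maximum-probability decision are assumptions.

import numpy as np

class Node:
    """One node of the converted tree: stores the class probabilities
    estimated from the training examples that reached this node."""
    def __init__(self, class_probs, pivot=None, left=None, right=None):
        self.class_probs = class_probs  # assumed dict: {label: count / n}
        self.pivot = pivot              # assumed (far_a, far_b): the node's furthest pair
        self.left = left
        self.right = right

def classify(root, x):
    """Route x toward whichever member of the furthest pair it is closer to,
    then return the most probable class stored at the node reached."""
    node = root
    while node.left is not None and node.right is not None:
        a, b = node.pivot
        node = node.left if np.linalg.norm(x - a) <= np.linalg.norm(x - b) else node.right
    return max(node.class_probs, key=node.class_probs.get)

Because every internal node also carries class probabilities, a traversal can in principle stop early, which is one way such a conversion can shrink both the tree and the testing time.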


Computers ◽  
2018 ◽  
Vol 7 (4) ◽  
pp. 54 ◽  
Author(s):  
Ahmad Hassanat

Due to their large sizes and/or dimensions, classifying Big Data is a challenging task for traditional machine learning, particularly when carried out using the well-known K-nearest neighbors (KNN) classifier, which is by nature a slow, lazy classifier. In this paper, we propose a new approach to Big Data classification with the KNN classifier, based on inserting the training examples into a binary search tree that is later used to speed up the search for test examples. For this purpose, we use two methods to sort the training examples. The first calculates the min-max scaled norm of each example and rounds it to 0 or 1. Examples whose rounded norm is 0 are sorted into the left child of a node, and those whose rounded norm is 1 into the right child; this process continues recursively until a leaf node holds one example, or a small number of examples with the same norm (see the sketch below). The second method inserts each example into the binary search tree based on its similarity to the examples with the minimum and maximum Euclidean norms. Experimental results on several big machine learning datasets show that both methods are much faster than most of the state-of-the-art methods compared against, with competitive accuracy obtained by the second method; this shows great potential for further enhancement of both methods for practical use.
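A minimal Python sketch of the first sorting method, assuming Euclidean norms, per-node min-max scaling, and a small leaf-size threshold; the function name, the dictionary-based tree layout, and the min_leaf parameter are hypothetical, and the second (similarity-based) method is not shown.

import numpy as np

def build_norm_tree(examples, labels, min_leaf=3):
    """First method (sketch): min-max scale each example's Euclidean norm,
    round to 0 or 1, and send 0-norm examples to the left child and 1-norm
    examples to the right, recursing until a node holds only a few examples."""
    norms = np.linalg.norm(examples, axis=1)
    lo, hi = norms.min(), norms.max()
    if len(examples) <= min_leaf or hi == lo:
        return {"leaf": True, "examples": examples, "labels": labels}
    go_right = np.round((norms - lo) / (hi - lo)).astype(bool)  # scaled norm -> 0 or 1
    if go_right.all() or not go_right.any():  # split made no progress; stop here
        return {"leaf": True, "examples": examples, "labels": labels}
    return {"leaf": False,
            "left":  build_norm_tree(examples[~go_right], labels[~go_right], min_leaf),
            "right": build_norm_tree(examples[go_right],  labels[go_right],  min_leaf)}

At test time, a query would follow the same scaled-norm test at each node and then run KNN only against the handful of examples in the leaf it reaches, which is where the claimed speedup comes from.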


2017 ◽  
Vol 7 (1.5) ◽  
pp. 97
Author(s):  
T. Surekha ◽  
R. Siva Rama Prasad

Data growth is enormous in the current scenario of developing information technology, and performing data classification is complex both in time and in information extraction. Moreover, big data classification involves uncertainties associated with unbalanced datasets. To overcome these issues, this paper introduces a novel method of big data classification. The method, Log Decision Tree with MapReduce Framework (LDT-MRF), uses the Log Decision Tree (LDT) and the MapReduce Framework (MRF) to perform parallel data classification. A novel parameter, termed log-entropy, is used to select the best feature attribute for classification (a stand-in sketch of this kind of attribute selection is given below). Classification is then performed using the LDT, which enables efficient data classification. Experimentation is carried out on three datasets: the Cleveland, Switzerland, and Breast Cancer datasets. A comparative analysis using sensitivity, specificity, and accuracy demonstrates the effectiveness of the proposed method, which attains a sensitivity of 84.7596%, a specificity of 74.633%, and an accuracy of 80.9088%, higher than the existing methods of big data classification.
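The abstract does not define how log-entropy is computed, so the sketch below deliberately substitutes the standard Shannon-entropy information gain to illustrate attribute selection of this general kind; best_attribute and its scoring are assumptions, not the paper's method.

import numpy as np

def entropy(y):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_attribute(X, y):
    """Pick the column of X (categorical attributes assumed) with the
    highest information gain over the labels y. Stand-in for the paper's
    log-entropy criterion, which the abstract does not define."""
    base = entropy(y)
    gains = []
    for j in range(X.shape[1]):
        vals, inv = np.unique(X[:, j], return_inverse=True)
        cond = sum((inv == k).mean() * entropy(y[inv == k]) for k in range(len(vals)))
        gains.append(base - cond)
    return int(np.argmax(gains))

In a MapReduce setting, the per-attribute counts behind such a score are what the map tasks would compute over data partitions before a reduce step aggregates them and picks the splitting attribute.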


2000 ◽  
Vol 11 (03) ◽  
pp. 485-513 ◽  
Author(s):  
SEONGHUN CHO ◽  
SARTAJ SAHNI

We develop a new class of weight-balanced binary search trees called β-balanced binary search trees (β-BBSTs). β-BBSTs are designed to have reduced internal path length; as a result, they are expected to exhibit good search-time characteristics. Individual search, insert, and delete operations in an n-node β-BBST take O(log n) time for [Formula: see text]. Experimental results comparing the performance of β-BBSTs, WB(α) trees, AVL trees, red/black trees, treaps, deterministic skip lists, and skip lists are presented (the weight-balance invariant this family of trees builds on is sketched below). Two simplified versions of β-BBSTs are also developed.
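The paper's exact β-balance condition is not reproduced in the abstract. As background, here is a Python sketch of the classical WB(α) weight-balance invariant that weight-balanced trees maintain, where the weight of a subtree is its number of external nodes; the Node layout and the default alpha are illustrative assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def weight(node: Optional[Node]) -> int:
    """Weight of a subtree = its number of external (null) positions."""
    return 1 if node is None else weight(node.left) + weight(node.right)

def is_weight_balanced(node: Optional[Node], alpha: float = 0.29) -> bool:
    """Classical WB(alpha) invariant, checked at every node:
    alpha <= weight(left) / weight(node) <= 1 - alpha."""
    if node is None:
        return True
    ratio = weight(node.left) / weight(node)
    return (alpha <= ratio <= 1 - alpha
            and is_weight_balanced(node.left, alpha)
            and is_weight_balanced(node.right, alpha))

Keeping every node's left-to-total weight ratio inside such a band bounds the internal path length, which is the quantity the β-BBST design aims to reduce further.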

