Hoeffding tree
Recently Published Documents


Total documents: 38 (five years: 23)
H-index: 5 (five years: 1)

2022 ◽  
Vol 18 (1) ◽  
pp. 1-17
Author(s):  
Sarah Nait Bahloul ◽  
Oussama Abderrahim ◽  
Aya Ichrak Benhadj Amar ◽  
Mohammed Yacine Bouhedadja

The classification of data streams has become a significant and active research area. The principal characteristics of data streams are the large volume of arriving data, the high speed at which it arrives, and the change of its nature and distribution over time. Hoeffding Tree is a method for building decision trees incrementally. Since it was first proposed, it has become one of the most popular tools for data stream classification, and several improvements have since emerged. Hoeffding Anytime Tree was recently introduced and is considered one of the most promising algorithms. It offers higher accuracy than the Hoeffding Tree in most scenarios, at a small additional computational cost. In this work, the authors propose three improvements to the Hoeffding Anytime Tree. The improvements are tested on known benchmark datasets. The experimental results show that two of the proposed variants make better use of Hoeffding Anytime Tree’s properties: they learn faster while providing the same desired accuracy.
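For readers unfamiliar with the incremental splitting rule that gives the Hoeffding Tree its name, the following is a minimal sketch and not the authors' implementation. It assumes the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and a simple tie-breaking threshold; the function names `hoeffding_bound` and `should_split` and the default `tie_threshold` value are hypothetical choices made for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    value_range (R) is the range of the split metric, delta is the allowed
    probability of error, and n is the number of examples seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf has seen enough examples to split.

    Illustrative rule: split when the best attribute beats the runner-up by
    more than epsilon, or when epsilon has shrunk below tie_threshold
    (an assumed tie-breaking parameter).
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Hypothetical gains observed at a leaf after 200 examples.
    print(should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=200))
    print(should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=200))
```

Because epsilon shrinks as n grows, a leaf that cannot yet separate two close attributes simply waits for more stream examples before deciding.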


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8289
Author(s):  
Shilan S. Hameed ◽  
Ali Selamat ◽  
Liza Abdul Latiff ◽  
Shukor A. Razak ◽  
Ondrej Krejcar ◽  
...  

Cyber-attack detection via on-gadget embedded models and cloud systems is widely used for the Internet of Medical Things (IoMT). The former has limited computational ability, whereas the latter has a long detection time. Fog-based attack detection is used as an alternative to overcome these problems. However, current fog-based systems cannot handle the ever-increasing big data of the IoMT. Moreover, they are not lightweight and are designed for network attack detection only. In this work, a hybrid (host and network) lightweight system is proposed for early attack detection in the IoMT fog. In an adaptive online setting, six different incremental classifiers were implemented, namely a novel Weighted Hoeffding Tree Ensemble (WHTE), Incremental K-Nearest Neighbors (IKNN), Incremental Naïve Bayes (INB), Hoeffding Tree Majority Class (HTMC), Hoeffding Tree Naïve Bayes (HTNB), and Hoeffding Tree Naïve Bayes Adaptive (HTNBA). The system was benchmarked with seven heterogeneous sensors and NetFlow data infected with nine types of recent attacks. The results showed that the proposed system worked well on lightweight fog devices with ~100% accuracy, a low detection time, and low memory usage of less than 6 MiB. The single-criteria comparative analysis showed that the WHTE ensemble was more accurate and less sensitive to concept drift.


2021 ◽  
pp. 2748-2758
Author(s):  
Rasha Hani Salman ◽  
Nadia Adnan Shiltagh ◽  
Mahmood Zaki Abdullah

Governmental establishments maintain historical data on job applicants for future predictive analysis, improvement of benefits and profits, and development of organizations and institutions. In e-government, a decision can be made about job seekers after mining their information, which leads to beneficial insight. This paper proposes the development and implementation of a system that predicts the job best suited to an applicant's skills using web content classification algorithms (LogitBoost, J48, PART, Hoeffding Tree, Naive Bayes). Furthermore, the results of the classification algorithms are compared on the "job classification data" sets. Experimental results indicated that the J48 algorithm had the highest precision (94.80%) among the algorithms compared on this dataset.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Renuka Devi D. ◽  
Sasikala S.

Purpose: The purpose of this paper is to enhance the accuracy of classification of streaming big data sets with less processing time. This kind of social analytics would contribute to society with inferred decisions at the correct time. The work is intended for the streaming nature of Twitter data sets. Design/methodology/approach: Analysing the growing volume of Twitter data with conventional methods is a demanding task. MapReduce (MR) is used for faster analytics. An online feature selection (OFS) accelerated bat algorithm (ABA) and an ensemble incremental deep multiple layer perceptron (EIDMLP) classifier are proposed for feature selection and classification. Three Twitter data sets under varied categories (product, service and emotions) are investigated. The proposed model is compared with the feature selection algorithms Particle Swarm Optimization (PSO), Accelerated Particle Swarm Optimization (APSO), and accelerated simulated annealing and mutation operator (ASAMO), and with classifiers such as Naïve Bayes (NB), support vector machine (SVM), Hoeffding Tree (HT), and fuzzy minimal consistent class subset coverage with the k-nearest neighbour (FMCCSC-KNN). Findings: The proposed model achieved accuracies of 99%, 99.48%, and 98.9% on the given data sets, with processing times of 0.0034, 0.0024, and 0.0053 seconds, respectively. Originality/value: A novel framework is proposed for feature selection and classification. The work is compared with the authors’ previously developed classifiers and with other state-of-the-art feature selection and classification algorithms.


2021 ◽  
Vol 25 (1) ◽  
pp. 81-104
Author(s):  
Eva García-Martín ◽  
Albert Bifet ◽  
Niklas Lavesson

Reducing energy consumption has been a growing trend in machine learning over the past few years due to its socio-ecological importance. In new challenging areas such as edge computing, energy consumption and predictive accuracy are key variables during algorithm design and implementation. State-of-the-art ensemble stream mining algorithms are able to create highly accurate predictions at a substantial energy cost. This paper introduces the nmin adaptation method to ensembles of Hoeffding tree algorithms, to further reduce their energy consumption without sacrificing accuracy. We also present extensive theoretical energy models of such algorithms, detailing their energy patterns and how nmin adaptation affects their energy consumption. We have evaluated the energy efficiency and accuracy of the nmin adaptation method on five different ensembles of Hoeffding trees using 11 publicly available datasets. The results show that we are able to reduce energy consumption significantly, by 21% on average, while affecting accuracy by less than one percent on average.


2021 ◽  
Vol 18 (6) ◽  
pp. 8024-8044
Author(s):  
B. Ida Seraphim ◽  
E. Poovammal ◽  
Kadiyala Ramana ◽  
Natalia Kryvinska ◽  
...  

Cybersecurity experts estimate that the cost of cyber-attack damage will rise tremendously. The massive utilization of the web raises concern over how to pass on electronic information safely. Usually, intruders try different attacks to obtain sensitive information. An Intrusion Detection System (IDS) plays a crucial role in identifying data and user deviations in an organization. In this paper, stream data mining is incorporated with an IDS to perform a specific task: distinguishing the important, hidden information successfully in less time. The experiment focuses on improving the effectiveness of an IDS using the proposed Stacked Autoencoder Hoeffding Tree approach (SAE-HT), with Darwinian Particle Swarm Optimization (DPSO) for feature selection. The experiment is performed on the NSL_KDD dataset: the important features are obtained using DPSO, and the classification is performed using the proposed SAE-HT technique. The proposed technique achieves a higher accuracy of 97.7% when compared with all the other state-of-the-art techniques. It is observed that the proposed technique increases the accuracy and detection rate, thus reducing the false alarm rate.


2020 ◽  
Vol 9 (6) ◽  
pp. 2518-2525
Author(s):  
Eddie Bouy B. Palad ◽  
Mary Jane F. Burden ◽  
Christian Ray Dela Torre ◽  
Rachelle Bea C. Uy

Text mining is one way of extracting knowledge and uncovering hidden relationships in data using artificial intelligence methods. Previous research has highlighted the advantages of different techniques; however, the lack of literature focusing on cybercrimes implies that data mining is underused in facilitating cybercrime investigations in the Philippines. As a continuation of a prior study, this study therefore classifies computer fraud or online scam data drawn from police incident reports as well as narratives of scam victims. The dataset consists mainly of unstructured data comprising 49,822 words, mostly Filipino. Five decision tree algorithms, namely J48, Hoeffding Tree, Decision Stump, REPTree, and Random Forest, were employed and compared in terms of their performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate among the classifiers. Results were validated by police investigators, who likewise preferred J48 as a potential tool for cybercrime investigations. This indicates the importance of text mining for cybercrime investigation in the country. Future work can use different and more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool.

