EVALUATING EFFECTIVENESS OF ENSEMBLE CLASSIFIERS WHEN DETECTING FUZZERS ATTACKS ON THE UNSW-NB15 DATASET

2020 ◽  
Vol 36 (2) ◽  
pp. 173-185
Author(s):  
Hoang Ngoc Thanh ◽  
Tran Van Lang

The UNSW-NB15 dataset was created by the Australian Cyber Security Centre in 2015 using the IXIA tool to capture normal behaviors and modern attacks; it includes normal data and nine types of attacks described by 49 features. Previous research results show that detecting Fuzzers attacks in this dataset yields the lowest classification quality. This paper analyzes and evaluates the performance of well-known ensemble techniques such as Bagging, AdaBoost, Stacking, Decorate, Random Forest, and Voting in building models to detect Fuzzers attacks on the UNSW-NB15 dataset. The experimental results show that the AdaBoost technique with decision trees as component classifiers gives the best classification quality, with an F-Measure of 96.76%, compared with 94.16%, the best result obtained using single classifiers, and 96.36% obtained using the Random Forest technique.
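
As a rough illustration of the winning configuration, the sketch below trains AdaBoost over decision-tree base learners for a binary Fuzzers-versus-normal task with scikit-learn. The file name, column handling, and hyperparameters are assumptions; the abstract does not specify them.

```python
# Minimal sketch: AdaBoost with decision-tree component classifiers for
# Fuzzers-vs-normal classification, scored by F-measure. File path, columns,
# and hyperparameters are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

df = pd.read_csv("unsw_nb15.csv")                      # hypothetical local copy
df = df[df["attack_cat"].isin(["Normal", "Fuzzers"])]  # binary subset
X = pd.get_dummies(df.drop(columns=["attack_cat", "label"]))
y = (df["attack_cat"] == "Fuzzers").astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=8),
                         n_estimators=100)
clf.fit(X_tr, y_tr)
print(f"F-measure: {f1_score(y_te, clf.predict(X_te)):.4f}")
```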

Author(s):  
Krishna Kumar Mohbey

In any industry, attrition is a big problem, whether it is employee attrition in an organization or customer attrition on an e-commerce site. If we can accurately predict which customer or employee will leave their current company or organization, the employer saves considerable time, effort, and cost, can hire or acquire substitutes in advance, and avoids disruption to the organization's ongoing progress. In this chapter, a comparative analysis of various machine learning approaches such as Naïve Bayes, SVM, decision tree, random forest, and logistic regression is presented. The presented results help identify the behavior of employees who are likely to leave in the near future. Experimental results reveal that the logistic regression approach reaches up to 86% accuracy, outperforming the other machine learning approaches.
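
A minimal sketch of the comparative setup, assuming a tabular HR dataset with an "Attrition" target column; the file, column names, and preprocessing choices are hypothetical.

```python
# Sketch: five classifiers compared on the same attrition data with 5-fold
# cross-validated accuracy. Dataset and column names are assumptions.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("attrition.csv")                 # hypothetical HR dataset
X = pd.get_dummies(df.drop(columns=["Attrition"]))
y = (df["Attrition"] == "Yes").astype(int)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2%}")
```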


Author(s):  
Fadare Oluwaseun Gbenga ◽  
Adetunmbi Adebayo Olusola ◽  
Oyinloye Oghenerukevwe Elohor

The proliferation of malware on computer communication systems poses great security challenges to confidential data and other valuable assets across the globe. There have been several attempts to curb the menace using signature-based approaches, and in recent times machine learning techniques have been extensively explored. This paper proposes a framework combining feature selection based on extra trees and random forest with eight ensemble techniques over five base learners: KNN, Naive Bayes, SVM, decision trees, and logistic regression. K-Nearest Neighbors returns the highest base-learner accuracy of 96.48%, 96.40%, and 87.89% with extra-tree feature selection, random-forest feature selection, and without feature selection (WFS), respectively. The random forest ensemble attains the highest accuracy under both feature selection methods, with 98.50% on random forest and 98.16% on extra-tree. The Extreme Gradient Boosting classifier is next on random-forest FS with an accuracy of 98.37%, while Voting returns the lowest detection accuracy of 95.80%. On extra-tree FS, Bagging is next with a detection accuracy of 98.09%, while Voting returns the lowest accuracy of 95.54%. Random forest scores highest on all seven evaluation measures under both extra-tree and random-forest feature selection. The study results show that tree-based ensemble models are proficient and successful for malware classification.
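
The two-stage design described above can be sketched as a scikit-learn pipeline: a tree-based selector feeds the surviving features to an ensemble. The dataset file, column names, and estimator sizes are assumptions, and only the random forest ensemble is shown.

```python
# Sketch: extra-tree and random-forest feature selection, each feeding a
# random forest ensemble. File name and hyperparameters are assumptions.
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("malware_features.csv")          # hypothetical feature table
X, y = df.drop(columns=["label"]), df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

selectors = [("extra-tree FS", ExtraTreesClassifier(n_estimators=100)),
             ("random-forest FS", RandomForestClassifier(n_estimators=100))]
for name, selector in selectors:
    pipe = make_pipeline(
        SelectFromModel(selector),                 # keep above-mean importances
        RandomForestClassifier(n_estimators=300))  # ensemble on kept features
    pipe.fit(X_tr, y_tr)
    print(f"{name}: {accuracy_score(y_te, pipe.predict(X_te)):.2%}")
```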


2021 ◽  
Vol 16 ◽  
pp. 502-507
Author(s):  
Suvaporn Homjandee ◽  
Krung Sinapiromsaran

Building an effective classifier that can predict the target class of instances in a dataset from historical data has played an important role in machine learning for decades. Standard classification algorithms have difficulty producing an appropriate classifier when faced with an imbalanced dataset. In 2019, an efficient splitting measure, minority condensation entropy (MCE) [1], was proposed that can build a decision tree to classify minority instances. The aim of this research is to extend the concept of a random forest to use both decision trees and minority condensation trees. The algorithm builds a minority condensation tree from a bootstrapped dataset that retains all minority instances, while it builds a decision tree from a balanced bootstrapped dataset. Experimental results on synthetic datasets confirm that, compared with the standard random forest, the proposed algorithm is well suited to the binary-class imbalance problem. Furthermore, experiments on real-world datasets from the UCI repository show that the proposed algorithm constructs a random forest that outperforms other existing random forest algorithms based on recall, precision, F-measure, and geometric mean.
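
The sketch below mirrors only the two bootstrap schemes, assuming numeric arrays with 0/1 labels where 1 is the minority class; ordinary entropy-based trees stand in for MCE trees, which are not implemented here.

```python
# Rough sketch of the hybrid forest: even-numbered trees see bootstraps that
# retain every minority instance; odd-numbered trees see balanced bootstraps.
# Standard entropy trees stand in for MCE trees (not implemented here).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def hybrid_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    min_idx = np.where(y == 1)[0]            # assumes minority class is 1
    maj_idx = np.where(y == 0)[0]
    trees = []
    for t in range(n_trees):
        if t % 2 == 0:   # keep all minorities, bootstrap the majority
            idx = np.concatenate([min_idx,
                                  rng.choice(maj_idx, size=len(maj_idx))])
        else:            # balanced bootstrap: equal draws from each class
            idx = np.concatenate([rng.choice(min_idx, size=len(min_idx)),
                                  rng.choice(maj_idx, size=len(min_idx))])
        trees.append(DecisionTreeClassifier(criterion="entropy")
                     .fit(X[idx], y[idx]))
    return trees

def forest_predict(trees, X):
    votes = np.stack([t.predict(X) for t in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)   # majority vote
```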


Electronics ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 1195
Author(s):  
Priya Varshini A G ◽  
Anitha Kumari K ◽  
Vijayakumar Varadarajan

Software Project Estimation is a challenging and important activity in developing software projects. It includes Software Time Estimation, Software Resource Estimation, Software Cost Estimation, and Software Effort Estimation. Software Effort Estimation focuses on predicting the number of hours of work (effort in terms of person-hours or person-months) required to develop or maintain a software application, and effort is difficult to forecast during the initial stages of software development. Various machine learning and deep learning models have been developed to predict effort. In this paper, single-model approaches and ensemble approaches were considered for estimation. Ensemble techniques combine several single models; those considered here were averaging, weighted averaging, bagging, boosting, and stacking. The stacking models evaluated were stacking using a generalized linear model, stacking using a decision tree, stacking using a support vector machine, and stacking using random forest. The datasets considered were Albrecht, China, Desharnais, Kemerer, Kitchenham, Maxwell, and Cocomo81, and the evaluation measures were mean absolute error, root mean squared error, and R-squared. The results showed that the proposed stacking using random forest provides the best results compared with single-model approaches using machine or deep learning algorithms and with the other ensemble techniques.
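
A minimal sketch of the best-performing configuration, stacking with a random forest meta-learner, scored with the three reported measures. The base learners, file name, and the "Effort" target column are assumptions for illustration.

```python
# Sketch: stacking with a random-forest final estimator, evaluated by MAE,
# RMSE, and R-squared. Base learners and column names are assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

df = pd.read_csv("desharnais.csv")                # hypothetical dataset file
X, y = df.drop(columns=["Effort"]), df["Effort"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

stack = StackingRegressor(
    estimators=[("lr", LinearRegression()),
                ("svr", SVR()),
                ("dt", DecisionTreeRegressor(max_depth=5))],
    final_estimator=RandomForestRegressor(n_estimators=200))
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)
print("MAE :", mean_absolute_error(y_te, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
print("R2  :", r2_score(y_te, pred))
```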


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind the models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they offer a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated with data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating characteristic curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and their output carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
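
The modeling step of such a tutorial can be sketched as below; the LIFE data are not public here, so the loading line and column names are placeholders, and the hyperparameters are assumptions.

```python
# Sketch: fit a decision tree and a random forest on tabular clinical
# features and compare AUCs. Data loading and columns are placeholders.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("life_study.csv")                 # placeholder for LIFE data
X = df.drop(columns=["serious_fall_injury"])
y = df["serious_fall_injury"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

for name, model in [("decision tree", DecisionTreeClassifier(max_depth=4)),
                    ("random forest", RandomForestClassifier(n_estimators=500))]:
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.2f}")
```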


2021 ◽  
Vol 11 (4) ◽  
pp. 1378
Author(s):  
Seung Hyun Lee ◽  
Jaeho Son

It has been pointed out that carrying an object exceeding a certain weight is a major factor in the physical burden on a construction worker's musculoskeletal system. However, due to the nature of the construction site, where a large number of workers simultaneously work in an irregular space, it is difficult to determine the weight of the object carried by a worker in real time or to keep track of the workers who carry excess weight. This paper proposes a prototype system that tracks the weight of heavy objects carried by construction workers through smart safety shoes equipped with FSR (force sensitive resistor) sensors. The system consists of the smart safety shoes with sensors attached, a mobile device for collecting the initial sensing data, and a web-based server computer for storing, preprocessing, and analyzing those data. The effectiveness and accuracy of the weight tracking system were verified through experiments in which each experimenter lifted a weight from +0 kg to +20 kg in 5 kg increments. The results were analyzed by a newly developed machine learning based model that adopts effective classification algorithms such as decision tree, random forest, the gradient boosting machine (GBM), and LightGBM. The algorithms showed similar, high average accuracies in classifying the weight, in the following order: random forest (90.9%), LightGBM (90.5%), decision tree (90.3%), and GBM (89%). Overall, the proposed weight tracking system achieves a significant 90.2% average accuracy in classifying how much weight each experimenter carries.
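
The classification step can be sketched as below, assuming the FSR readings have already been aggregated into per-lift feature rows labeled with the carried load; the file and columns are hypothetical, and LightGBM appears via its scikit-learn wrapper.

```python
# Sketch: four classifiers compared on FSR-derived features labeled with the
# carried load (0-20 kg in 5 kg steps). File and columns are assumptions.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("fsr_readings.csv")          # hypothetical preprocessed data
X = df.drop(columns=["load_kg"])              # e.g., per-sensor pressure stats
y = df["load_kg"]                             # one of {0, 5, 10, 15, 20}

for name, model in [("decision tree", DecisionTreeClassifier()),
                    ("random forest", RandomForestClassifier(n_estimators=300)),
                    ("GBM", GradientBoostingClassifier()),
                    ("LightGBM", LGBMClassifier())]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.1%}")
```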


2015 ◽  
Vol 2015 ◽  
pp. 1-11
Author(s):  
Tao Xiang ◽  
Tao Li ◽  
Mao Ye ◽  
Zijian Liu

Pedestrian detection with large intraclass variations is still a challenging task in computer vision. In this paper, we propose a novel pedestrian detection method based on Random Forest. Firstly, we generate a few local templates with different sizes and locations from positive exemplars. Then the Random Forest is built, with splitting functions optimized by maximizing the class purity of matching the local templates to the training samples. To improve the classification accuracy, we adopt a boosting-like algorithm to update the weights of the training samples in a layer-wise fashion. During detection, the trained Random Forest votes on the category of each input sliding window. Our contributions are the splitting functions based on local template matching with adaptive size and location, and the iterative weight-updating method. We evaluate the proposed method on two well-known challenging datasets: TUD pedestrians and INRIA pedestrians. The experimental results demonstrate that our method achieves state-of-the-art or competitive performance.
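
Only the detection stage is sketched here: a standard random forest, assumed to have been trained on flattened windows of the same size, votes on each sliding window. The paper's template-based splitting functions and boosting-like reweighting are not reproduced.

```python
# Sketch of the sliding-window voting stage. A plain random forest stands in
# for the paper's template-matching forest; it is assumed to have been trained
# on flattened pixel windows of shape (win_h, win_w).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def detect(forest: RandomForestClassifier, image: np.ndarray,
           win_h: int = 128, win_w: int = 64, stride: int = 8):
    """Return (row, col, score) for windows voted as pedestrian."""
    hits = []
    for r in range(0, image.shape[0] - win_h + 1, stride):
        for c in range(0, image.shape[1] - win_w + 1, stride):
            patch = image[r:r + win_h, c:c + win_w].reshape(1, -1)
            score = forest.predict_proba(patch)[0, 1]  # fraction of tree votes
            if score > 0.5:
                hits.append((r, c, score))
    return hits
```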


Author(s):  
Hsun-Ping Hsieh ◽  
JiaWei Jiang ◽  
Tzu-Hsin Yang ◽  
Renfen Hu

The success of mediation is affected by many factors, such as the context of the quarrel, the personalities of both parties, and the negotiation skill of the mediator, which introduce uncertainty into the prediction task. This paper takes a different approach from previous legal prediction research: it analyzes and predicts whether two parties in a dispute can reach an agreement peacefully through the conciliation of mediation. With the inference result, we can know whether mediation is a practical and time-saving method for solving the dispute. Existing work on legal case prediction mostly focuses on prosecution or criminal cases. In this work, we propose an LSTM-based framework, called LSTMEnsembler, that predicts mediation results by assembling multiple classifiers. Among these classifiers, some are powerful for modeling the numerical and categorical features of case information, e.g., XGBoost and LightGBM, while some are effective for dealing with textual data, e.g., TextCNN and BERT. The proposed LSTMEnsembler aims not only to combine the effectiveness of different classifiers intelligently, but also to capture temporal dependencies from previous cases to boost the performance of mediation prediction. Our experimental results show that the proposed LSTMEnsembler achieves an F-measure of 85.6% on real-world mediation data.
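
A minimal sketch of the assembling idea, in PyTorch: each case contributes one score per base classifier (e.g., from XGBoost, LightGBM, TextCNN, BERT), scores over a window of preceding cases form a sequence, and an LSTM maps that sequence to the probability that the current mediation succeeds. The shapes and sizes are illustrative assumptions, not the paper's architecture.

```python
# Sketch: an LSTM consumes per-case base-classifier scores over a window of
# previous cases and predicts the current outcome. Sizes are assumptions.
import torch
import torch.nn as nn

class LSTMEnsembler(nn.Module):
    def __init__(self, n_classifiers: int = 4, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_classifiers, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (batch, seq_len, n_classifiers), one score per base model
        # for each preceding case plus the current one.
        out, _ = self.lstm(scores)
        return torch.sigmoid(self.head(out[:, -1]))  # P(agreement reached)

model = LSTMEnsembler()
batch = torch.rand(8, 10, 4)       # 8 cases, 10-case history, 4 classifiers
print(model(batch).shape)          # torch.Size([8, 1])
```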

