Vigorous IDS on Nefarious Operations and Threat Analysis Using Ensemble Machine Learning

The geometric increase in computer networking activity poses problems for managing normal network operations. These issues have led network security researchers to introduce different kinds of intrusion detection systems (IDS), which monitor data flow in a network for unwanted and illicit operations. An intrusion is a violation of security policies with nefarious motive. An IDS therefore examines traffic passing through networked systems for nefarious operations and threats, and sends warnings if any such malicious activity is detected. There are two types of detection of malicious activity. In misuse detection, information about passing network traffic is gathered, analyzed, and compared with stored predefined signatures. In anomaly detection, the system flags all network activity that deviates from regular user operations. Several researchers have worked on IDS using different machine learning (ML) algorithms and evaluated their work on various datasets. In this paper, an efficient IDS is built using ensemble machine learning algorithms and evaluated on CIC-IDS2017, an updated dataset that contains the most recent attacks. The results show a large increase in the detection rate, an increase in accuracy, and a reduction in the false positive rate (FPR).
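
The abstract above reports detection rate, accuracy, and false positive rate for an ensemble, without giving the exact models. As a minimal sketch only, the following shows how per-flow predictions from several classifiers might be combined by majority vote and scored with those three metrics. The function names, the voting rule, and the six-flow labels are illustrative assumptions, not the paper's method or data.

```python
def majority_vote(model_predictions):
    """Combine several models' 0/1 label lists by per-sample majority vote."""
    combined = []
    for votes in zip(*model_predictions):
        # Label a flow as an attack (1) when more than half the models say so.
        combined.append(1 if sum(votes) * 2 > len(votes) else 0)
    return combined


def ids_metrics(y_true, y_pred):
    """Detection rate, accuracy and false positive rate (1 = attack, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical labels for six flows (not taken from CIC-IDS2017).
truth = [1, 0, 1, 0, 0, 1]
model_a = [1, 0, 1, 0, 0, 0]
model_b = [1, 1, 0, 0, 0, 1]
model_c = [0, 0, 1, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print("single model:", ids_metrics(truth, model_b))
print("ensemble:    ", ids_metrics(truth, ensemble))
```

In this toy data each individual model makes one mistake on a different flow, so the vote corrects all of them. Real ensembles only help to the extent that the members' errors are not strongly correlated.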

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that, as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.
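
The three scores quoted in the abstract above (RMSE, NSE, and the index of agreement d) can be computed from paired observations and predictions. The sketch below assumes the standard textbook definitions, with d taken as Willmott's original index of agreement; the abstract does not state which variants were used, and the sample values are hypothetical, not erosion pin data.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error, in the same units as the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches predicting the mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def index_of_agreement(obs, pred):
    """Willmott's index of agreement d, ranging from 0 to 1."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - residual / potential


# Hypothetical erosion depths in mm/year, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(rmse(observed, predicted))
print(nse(observed, predicted))
print(index_of_agreement(observed, predicted))
```

Lower RMSE is better, while higher NSE and d are better, which is why the abstract reports the lowest RMSE together with the highest NSE and d for the same model.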

THE USE OF DATA MINING IN DETECTING CREDIT CARD FRAUD

10.31234/osf.io/uhqcs ◽

2022 ◽

Author(s):

Kingsley Austin

Keyword(s):

Machine Learning ◽

Credit Card ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

High Detection Rate ◽

Credit Card Fraud ◽

Real Time Processing ◽

Detection Systems ◽

Hybrid Approaches ◽

Use Of Data

Credit card fraud is a serious problem for e-commerce retailers, with UK merchants reporting losses of $574.2M in 2020. As a result, effective fraud detection systems must be in place to ensure that payments are processed securely in an online environment. According to the literature, detecting credit card fraud is challenging because of dataset imbalance (genuine versus fraudulent transactions), real-time processing requirements, and the dynamic behavior of fraudsters and customers. This paper proposes that machine learning could be an effective solution for combating credit card fraud. According to research, machine learning techniques can help overcome the identified challenges, both directly and indirectly, while ensuring a high detection rate of fraudulent transactions. Even though both supervised and unsupervised machine learning algorithms have been suggested, the flaws in both methods point to the need for hybrid approaches.

Intrusion Detection Systems Based on Machine Learning Algorithms

2021 IEEE International Conference on Automatic Control & Intelligent Systems (I2CACIS) ◽

10.1109/i2cacis52118.2021.9495897 ◽

2021 ◽

Author(s):

Sandy Victor Amanoul ◽

Adnan Mohsin Abdulazeez ◽

Diyar Qader Zeebare ◽

Falah Y. H. Ahmed

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Intrusion Detection Systems ◽

Detection Systems

Prediction and Analysis of Gold Prices using Ensemble Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.36028 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 4367-4374

Author(s):

Gudipally Chandrashakar

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Data ◽

Gold Price ◽

Machine Learning Algorithms ◽

Series Data ◽

Gradient Boosting ◽

Support Vector ◽

Average Value ◽

Ensemble Machine Learning

In this article, we use historical time series data of gold prices up to the current day. To predict the gold price, we consider a few correlating factors: silver price, copper price, Standard & Poor's 500 value, dollar-rupee exchange rate, and Dow Jones Industrial Average value. We take the price of every correlating factor together with the gold price for dates ranging from January 2008 to February 2021. The machine learning algorithms used to analyze the time series data are Random Forest Regression, Support Vector Regressor, Linear Regressor, ExtraTrees Regressor, and Gradient Boosting Regression. In the results, the ExtraTrees Regressor predicts gold prices most accurately.

A Survey on Machine Learning Algorithms for Vision State Classification and Prediction Through Electroencephalogram (EEG) Signal

10.46532/978-81-950008-1-4_093 ◽

2020 ◽

pp. 426-429

Author(s):

Devipriya A ◽

Brindha D ◽

Kousalya A

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Problem Area ◽

Machine Learning Algorithms ◽

Eeg Signal ◽

Ensemble Machine Learning ◽

State Classification ◽

Machine Learning Model ◽

Knn Classification ◽

Electroencephalogram Eeg

Vision (eye) state identification is a common time-series classification problem and a hot spot in recent research. Electroencephalography (EEG) is widely used in vision state classification to recognize a person's cognitive state. Previous research has validated the feasibility of machine learning and statistical approaches for EEG vision state classification. This research proposes a novel approach for EEG vision state identification using Gradual Characteristic Learning (GCL) based on neural networks. GCL is a novel machine learning approach that gradually imports and trains features one by one. Previous studies have verified that such an approach is suitable for solving a number of pattern recognition problems. However, little of this previous work on GCL focused on its application to time-series problems, so it is still unclear whether GCL can be used to cope with time-series problems such as EEG vision state classification. Experimental results in this study show that, with proper feature extraction and feature ordering, GCL can not only cope efficiently with time-series classification problems but also achieve better classification performance, in terms of classification error rates, than conventional and some other approaches. Vision state classification is performed and discussed using KNN classification, with improved accuracy, and finally vision state classification with an ensemble machine learning model is discussed.

Spam Mail Classification Using Ensemble and Non-Ensemble Machine Learning Algorithms

Machine Learning for Predictive Analysis - Lecture Notes in Networks and Systems ◽

10.1007/978-981-15-7106-0_18 ◽

2020 ◽

pp. 179-189

Author(s):

Khyati Agarwal ◽

Prakhar Uniyal ◽

Suryavanshi Virendrasingh ◽

Sai Krishna ◽

Varun Dutt

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Ensemble Machine Learning

Applying Machine Learning Algorithms in Network-Based Intrusion Detection Systems

Lecture Notes in Electrical Engineering - Trends in Wireless Communication and Information Security ◽

10.1007/978-981-33-6393-9_24 ◽

2021 ◽

pp. 229-236

Author(s):

Nilesh Kumar Sahu ◽

Itu Snigdh

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Intrusion Detection Systems ◽

Detection Systems

Ensemble-Based Online Machine Learning Algorithms for Network Intrusion Detection Systems Using Streaming Data

Information ◽

10.3390/info11060315 ◽

2020 ◽

Vol 11 (6) ◽

pp. 315

Author(s):

Nathan Martindale ◽

Muhammad Ismail ◽

Douglas A. Talbert

Keyword(s):

Machine Learning ◽

Random Forest ◽

Intrusion Detection ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Intrusion Detection Systems ◽

Network Intrusion Detection ◽

Detection Systems ◽

Network Intrusion ◽

Network Intrusion Detection Systems

As new cyberattacks are launched against systems and networks on a daily basis, the ability for network intrusion detection systems to operate efficiently in the big data era has become critically important, particularly as more low-power Internet-of-Things (IoT) devices enter the market. This has motivated research in applying machine learning algorithms that can operate on streams of data, trained online or “live” on only a small amount of data kept in memory at a time, as opposed to the more classical approaches that are trained solely offline on all of the data at once. In this context, one important concept from machine learning for improving detection performance is the idea of “ensembles”, where a collection of machine learning algorithms are combined to compensate for their individual limitations and produce an overall superior algorithm. Unfortunately, existing research lacks proper performance comparison between homogeneous and heterogeneous online ensembles. Hence, this paper investigates several homogeneous and heterogeneous ensembles, proposes three novel online heterogeneous ensembles for intrusion detection, and compares their performance accuracy, run-time complexity, and response to concept drifts. Out of the proposed novel online ensembles, the heterogeneous ensemble consisting of an adaptive random forest of Hoeffding Trees combined with a Hoeffding Adaptive Tree performed the best, by dealing with concept drift in the most effective way. While this scheme is less accurate than a larger size adaptive random forest, it offered a marginally better run-time, which is beneficial for online training.
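
To make the idea of an online heterogeneous ensemble concrete, here is a minimal sketch of "test-then-train" (prequential) evaluation over a stream, where each sample is first predicted and then used for learning. The three toy learners are hypothetical stand-ins chosen for brevity; they are not Hoeffding Trees or an adaptive random forest, and the stream values are invented.

```python
from collections import Counter


class MajorityClass:
    """Predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


class LastLabel:
    """Predicts whatever label the previous sample had."""

    def __init__(self):
        self.last = 0

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y


class NearestMean:
    """Keeps a running mean of one feature per class; predicts the closer class."""

    def __init__(self):
        self.sums = {0: 0.0, 1: 0.0}
        self.n = {0: 0, 1: 0}

    def predict(self, x):
        seen = [c for c in (0, 1) if self.n[c]]
        if not seen:
            return 0
        return min(seen, key=lambda c: abs(x - self.sums[c] / self.n[c]))

    def learn(self, x, y):
        self.sums[y] += x
        self.n[y] += 1


class VotingEnsemble:
    """Heterogeneous ensemble: majority vote over different learner types."""

    def __init__(self, members):
        self.members = members

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        for m in self.members:
            m.learn(x, y)


def prequential_accuracy(model, stream):
    """Predict each sample before learning from it; return running accuracy."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn(x, y)  # only the current sample is needed in memory
        total += 1
    return correct / total if total else 0.0


# Hypothetical stream of (feature, label) pairs; 1 = attack.
stream = [(0.1, 0), (0.9, 1), (0.2, 0), (0.8, 1), (0.15, 0), (0.85, 1)]
ensemble = VotingEnsemble([MajorityClass(), LastLabel(), NearestMean()])
print(prequential_accuracy(ensemble, stream))
```

Using an odd number of members avoids tied votes on binary labels. Handling concept drift, which the paper compares across ensembles, would need learners that can forget or reset, and this sketch does not attempt that.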

Classification of Driver Distraction: A Comprehensive Analysis of Feature Generation, Machine Learning, and Input Measures

Human Factors The Journal of the Human Factors and Ergonomics Society ◽

10.1177/0018720819856454 ◽

2019 ◽

Vol 62 (6) ◽

pp. 1019-1035 ◽

Cited By ~ 7

Author(s):

Anthony D. McDonald ◽

Thomas K. Ferris ◽

Tyler A. Wiener

Keyword(s):

Machine Learning ◽

Driving Behavior ◽

Driver Distraction ◽

Machine Learning Algorithms ◽

Physiological Data ◽

Learning Approaches ◽

Feature Generation ◽

Driver Performance ◽

Ensemble Machine Learning ◽

Vehicle Information

Objective: The objective of this study was to analyze a set of driver performance and physiological data using advanced machine learning approaches, including feature generation, to determine the best-performing algorithms for detecting driver distraction and predicting the source of distraction. Background: Distracted driving is a causal factor in many vehicle crashes, often resulting in injuries and deaths. As mobile devices and in-vehicle information systems become more prevalent, the ability to detect and mitigate driver distraction becomes more important. Method: This study trained 21 algorithms to identify when drivers were distracted by secondary cognitive and texting tasks. The algorithms included physiological and driving behavioral input processed with a comprehensive feature generation package, Time Series Feature Extraction based on Scalable Hypothesis tests. Results: Results showed that a Random Forest algorithm, trained using only driving behavior measures and excluding driver physiological data, was the highest-performing algorithm for accurately classifying driver distraction. The most important input measures identified were lane offset, speed, and steering, whereas the most important feature types were standard deviation, quantiles, and nonlinear transforms. Conclusion: This work suggests that distraction detection algorithms may be improved by considering ensemble machine learning algorithms that are trained with driving behavior measures and nonstandard features. In addition, the study presents several new indicators of distraction derived from speed and steering measures. Application: Future development of distraction mitigation systems should focus on driver behavior–based algorithms that use complex feature generation techniques.
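
The abstract above names standard deviation and quantiles among the most important feature types. As a rough illustration of what generating such features from a driving signal could look like, the sketch below slides a window over a single measure and computes those statistics with Python's standard library. It does not use the feature generation package from the study; the window size, step, and lane-offset values are hypothetical.

```python
import statistics


def window_features(signal, size, step):
    """Compute std and quartiles over sliding windows of a 1-D signal."""
    features = []
    for start in range(0, len(signal) - size + 1, step):
        window = signal[start:start + size]
        # Quartile cut points, interpolated between the window's own samples.
        q1, median, q3 = statistics.quantiles(window, n=4, method="inclusive")
        features.append(
            {
                "start": start,
                "std": statistics.pstdev(window),  # population std of the window
                "q1": q1,
                "median": median,
                "q3": q3,
            }
        )
    return features


# Hypothetical lane-offset samples in meters, for illustration only.
lane_offset = [0.10, 0.12, 0.08, 0.30, 0.45, 0.20, 0.11, 0.09]

for row in window_features(lane_offset, size=4, step=2):
    print(row)
```

Each row could then serve as one input vector for a classifier such as a random forest, with one row per time window rather than per raw sample.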

Predicting Loan Approval of Bank Direct Marketing Data Using Ensemble Machine Learning Algorithms

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2020.14.117 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Decision Makers ◽

Machine Learning Techniques ◽

Data Set ◽

Ensemble Machine Learning ◽

Marketing Data ◽

Loan Approval

The Bank Marketing data set on Kaggle is mostly used to predict whether bank clients will subscribe to a long-term deposit. We believe that this data set could provide more useful information, such as predicting whether a bank client could be approved for a loan. This is a critical choice that has to be made by decision makers at the bank. Building a prediction model for such a high-stakes decision requires not only high prediction accuracy but also a reasonable interpretation of the predictions. In this research, different ensemble machine learning techniques, such as Bagging and Boosting, have been deployed. Our results showed that the loan approval prediction model has an accuracy of 83.97%, which is approximately 25% better than most other state-of-the-art loan prediction models found in the literature. In addition, the model interpretation efforts in this research were able to explain a few critical cases that the bank decision makers may encounter; therefore, the high accuracy of the designed models was accompanied by trust in the predictions. We believe that the achieved model accuracy, together with the provided interpretation information, is vitally needed for decision makers to understand how to maintain a balance between the security and reliability of their financial lending system while providing fair credit opportunities to their clients.
