Detecting web attacks using random undersampling and ensemble learners

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Richard Zuech ◽  
John Hancock ◽  
Taghi M. Khoshgoftaar

Abstract: Class imbalance is an important consideration for cybersecurity and machine learning. We explore classification performance in detecting web attacks in the recent CSE-CIC-IDS2018 dataset. This study considers a total of eight random undersampling (RUS) ratios: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. Additionally, seven different classifiers are employed: Decision Tree (DT), Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Naive Bayes (NB), and Logistic Regression (LR). For classification performance metrics, Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC) are both utilized to answer the following three research questions. The first question asks: “Are various random undersampling ratios statistically different from each other in detecting web attacks?” The second question asks: “Are different classifiers statistically different from each other in detecting web attacks?” And our third question asks: “Is the interaction between different classifiers and random undersampling ratios significant for detecting web attacks?” Based on our experiments, the answers to all three research questions are “Yes”. To the best of our knowledge, we are the first to apply random undersampling techniques to web attacks from the CSE-CIC-IDS2018 dataset while exploring various sampling ratios.
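The RUS treatment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the `random_undersample` helper and the toy class counts are our own:

```python
import random
from collections import Counter

def random_undersample(instances, labels, ratio, seed=0):
    """Randomly undersample the majority (negative) class so that the
    negative:positive ratio is at most `ratio` (e.g. 3 for 3:1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = min(len(neg), int(len(pos) * ratio))
    sampled = pos + rng.sample(neg, keep_neg)
    rng.shuffle(sampled)
    return [instances[i] for i in sampled], [labels[i] for i in sampled]

# Toy data: 1,000 negatives and 10 positives (severe imbalance).
X = list(range(1010))
y = [1] * 10 + [0] * 1000
X_rus, y_rus = random_undersample(X, y, ratio=3)
print(Counter(y_rus))  # 10 positives, 30 negatives (3:1)
```

Ratios such as 999:1 or 1:1 map directly to the `ratio` argument (999 or 1); a 65:35 ratio corresponds to `ratio=65/35`.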

2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Tawfiq Hasanin ◽  
Taghi M. Khoshgoftaar ◽  
Joffrey L. Leevy ◽  
Richard A. Bauder

Abstract: Severe class imbalance between majority and minority classes in Big Data can bias the predictive performance of Machine Learning algorithms toward the majority (negative) class. Where the minority (positive) class holds greater value than the majority (negative) class and the occurrence of false negatives incurs a greater penalty than false positives, the bias may lead to adverse consequences. Our paper incorporates two case studies, each utilizing three learners, six sampling approaches, two performance metrics, and five sampled distribution ratios, to uniquely investigate the effect of severe class imbalance on Big Data analytics. The learners (Gradient-Boosted Trees, Logistic Regression, Random Forest) were implemented within the Apache Spark framework. The first case study is based on a Medicare fraud detection dataset. The second case study, unlike the first, includes training data from one source (SlowlorisBig Dataset) and test data from a separate source (POST dataset). Results from the Medicare case study are not conclusive regarding the best sampling approach using Area Under the Receiver Operating Characteristic Curve and Geometric Mean performance metrics. However, it should be noted that the Random Undersampling approach performs adequately in the first case study. For the SlowlorisBig case study, Random Undersampling convincingly outperforms the other five sampling approaches (Random Oversampling, Synthetic Minority Over-sampling TEchnique, SMOTE-borderline1, SMOTE-borderline2, ADAptive SYNthetic) when measuring performance with Area Under the Receiver Operating Characteristic Curve and Geometric Mean metrics. Based on its classification performance in both case studies, Random Undersampling is the best choice as it results in models with a significantly smaller number of samples, thus reducing computational burden and training time.
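The Geometric Mean metric used in both case studies is easy to state in code. Below is a hedged sketch: the confusion-matrix counts are invented for illustration, and the study itself ran its learners in Apache Spark rather than plain Python:

```python
import math

def geometric_mean(tp, fn, tn, fp):
    """G-mean: sqrt(sensitivity * specificity); balances performance on
    the positive and negative classes under class imbalance."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# A classifier that finds 8 of 10 fraud cases and 900 of 1,000 normal claims:
print(round(geometric_mean(tp=8, fn=2, tn=900, fp=100), 4))  # 0.8485
```

Unlike raw accuracy, G-mean collapses to zero whenever either class is missed entirely, which is why it is favored under severe imbalance.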


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Richard Zuech ◽  
John Hancock ◽  
Taghi M. Khoshgoftaar

Abstract: Class rarity is a frequent challenge in cybersecurity. Rarity occurs when the positive (attack) class only has a small number of instances for machine learning classifiers to train upon, thus making it difficult for the classifiers to discriminate and learn from the positive class. To investigate rarity, we examine three individual web attacks in big data from the CSE-CIC-IDS2018 dataset: “Brute Force-Web”, “Brute Force-XSS”, and “SQL Injection”. These three individual web attacks are also severely imbalanced, and so we evaluate whether random undersampling (RUS) treatments can improve the classification performance for these three individual web attacks. The following eight different levels of RUS ratios are evaluated: no sampling, 999:1, 99:1, 95:5, 9:1, 3:1, 65:35, and 1:1. For measuring classification performance, Area Under the Receiver Operating Characteristic Curve (AUC) metrics are obtained for the following seven different classifiers: Random Forest (RF), CatBoost (CB), LightGBM (LGB), XGBoost (XGB), Decision Tree (DT), Naive Bayes (NB), and Logistic Regression (LR) (with the first four learners being ensemble learners and, for comparison, the last three being single learners). We find that applying random undersampling does improve overall classification performance with the AUC metric in a statistically significant manner. Ensemble learners achieve the top AUC scores after massive undersampling is applied, but the ensemble learners break down and have poor performance (worse than NB and DT) when no sampling is applied to our unique and harsh experimental conditions of severe class imbalance and rarity.


2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Taghi M. Khoshgoftaar ◽  
Jared M. Peterson

Abstract: Recent years have seen a proliferation of Internet of Things (IoT) devices and an associated security risk from an increasing volume of malicious traffic worldwide. For this reason, datasets such as Bot-IoT were created to train machine learning classifiers to identify attack traffic in IoT networks. In this study, we build predictive models with Bot-IoT to detect attacks represented by dataset instances from the Information Theft category, as well as dataset instances from the data exfiltration and keylogging subcategories. Our contribution is centered on the evaluation of ensemble feature selection techniques (FSTs) on classification performance for these specific attack instances. A group or ensemble of FSTs will often perform better than the best individual technique. The classifiers that we use are a diverse set of four ensemble learners (LightGBM, CatBoost, XGBoost, and random forest (RF)) and four non-ensemble learners (logistic regression (LR), decision tree (DT), Naive Bayes (NB), and a multi-layer perceptron (MLP)). The metrics used for evaluating classification performance are Area Under the Receiver Operating Characteristic Curve (AUC) and Area Under the Precision-Recall Curve (AUPRC). For the most part, we determined that our ensemble FSTs do not affect classification performance but are beneficial because feature reduction eases computational burden and provides insight through improved data visualization.
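One common way to build an ensemble of FSTs is to aggregate the per-technique feature rankings, e.g. by mean rank. The sketch below is an assumption about how such an ensemble could work, not the paper's exact method, and the flow-feature names are hypothetical:

```python
def ensemble_rank(feature_rankings):
    """Aggregate several feature-importance rankings (each a best-first
    list over the same features) by mean rank; a lower mean rank means
    the feature is more consistently important across techniques."""
    features = set().union(*map(set, feature_rankings))
    mean_rank = {
        f: sum(r.index(f) for r in feature_rankings) / len(feature_rankings)
        for f in features
    }
    return sorted(features, key=lambda f: mean_rank[f])

# Three hypothetical FST rankings over four IoT flow features:
rankings = [
    ["bytes", "pkts", "dur", "rate"],
    ["pkts", "bytes", "rate", "dur"],
    ["bytes", "pkts", "rate", "dur"],
]
print(ensemble_rank(rankings))  # ['bytes', 'pkts', 'rate', 'dur']
```

Keeping only the top of the aggregated ranking is what yields the feature reduction (and easier visualization) the abstract mentions.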


Circulation ◽  
2021 ◽  
Vol 143 (13) ◽  
pp. 1287-1298 ◽  
Author(s):  
Sushravya Raghunath ◽  
John M. Pfeifer ◽  
Alvaro E. Ulloa-Cerna ◽  
Arun Nemani ◽  
Tanner Carbonati ◽  
...  

Background: Atrial fibrillation (AF) is associated with substantial morbidity, especially when it goes undetected. If new-onset AF could be predicted, targeted screening could be used to find it early. We hypothesized that a deep neural network could predict new-onset AF from the resting 12-lead ECG and that this prediction may help identify those at risk of AF-related stroke. Methods: We used 1.6 M resting 12-lead digital ECG traces from 430 000 patients collected from 1984 to 2019. Deep neural networks were trained to predict new-onset AF (within 1 year) in patients without a history of AF. Performance was evaluated using areas under the receiver operating characteristic curve and precision-recall curve. We performed an incidence-free survival analysis for a period of 30 years following the ECG stratified by model predictions. To simulate real-world deployment, we trained a separate model using all ECGs before 2010 and evaluated model performance on a test set of ECGs from 2010 through 2014 that were linked to our stroke registry. We identified the patients at risk for AF-related stroke among those predicted to be high risk for AF by the model at different prediction thresholds. Results: The area under the receiver operating characteristic curve and area under the precision-recall curve were 0.85 and 0.22, respectively, for predicting new-onset AF within 1 year of an ECG. The hazard ratio for the predicted high- versus low-risk groups over a 30-year span was 7.2 (95% CI, 6.9–7.6). In a simulated deployment scenario, the model predicted new-onset AF at 1 year with a sensitivity of 69% and specificity of 81%. The number needed to screen to find 1 new case of AF was 9. This model predicted patients at high risk for new-onset AF in 62% of all patients who experienced an AF-related stroke within 3 years of the index ECG. Conclusions: Deep learning can predict new-onset AF from the 12-lead ECG in patients with no previous history of AF. This prediction may help identify patients at risk for AF-related strokes.
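The deployment-scenario numbers (sensitivity, specificity, number needed to screen) all fall out of thresholding a risk score. A minimal sketch with invented scores, under the simplifying assumption that number needed to screen is the count of flagged patients per true new case found:

```python
def screening_stats(scores, labels, threshold):
    """Sensitivity, specificity, and number needed to screen (flagged
    patients per true new case found) at a given risk threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp), (tp + fp) / tp

# Invented risk scores for six patients, three of whom develop AF:
scores = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]
labels = [1, 1, 1, 0, 0, 0]
sens, spec, nns = screening_stats(scores, labels, threshold=0.5)
print(round(sens, 2), round(spec, 2), nns)  # 0.67 0.67 1.5
```

Sweeping `threshold` trades sensitivity against specificity, which is how the study could evaluate the model "at different prediction thresholds."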


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Richard Zuech ◽  
Taghi M. Khoshgoftaar

Abstract: Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered around answers to three research questions. The first question is, “Does feature selection impact performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact performance of LightGBM and CatBoost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier: Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), CatBoost, LightGBM, or XGBoost, significantly impact performance in terms of AUC and F1-score?” These research questions are all answered in the affirmative and provide valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.


2021 ◽  
Vol 10 (1) ◽  
pp. 70
Author(s):  
Oladosu Oyebisi Oladimeji ◽  
Abimbola Oladimeji ◽  
Oladimeji Olayanju

Introduction: Hepatitis C is a chronic infection caused by the hepatitis C virus, a blood-borne virus; infection therefore occurs through exposure to small quantities of blood. The World Health Organization (WHO) estimates that it has affected 71 million people worldwide. The infection imposes substantial costs on individuals, groups, and governments because no vaccine is yet available. The disease is likely to continue to affect more people because its long asymptomatic phase makes early detection infeasible. Material and Methods: In this study, we present machine learning models that automatically classify hepatitis diagnosis test results and rank the test features to show how each contributes to the classification, which supports decision-making in the healthcare industry. The Synthetic Minority Oversampling TEchnique (SMOTE) was used to address the imbalanced dataset. Results: The models were evaluated with metrics such as Matthews correlation coefficient, F-measure, Precision-Recall curve, and Receiver Operating Characteristic Area Under Curve. We found that using SMOTE raised the performance of the predictive models. Random forest (RF) had the best performance based on Matthews correlation coefficient (0.99), F-measure (0.99), Precision-Recall curve (1.00), and Receiver Operating Characteristic Area Under Curve (0.99). Conclusion: This finding has the potential to impact clinical practice when health workers aim to classify disease diagnosis results at an early stage.
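SMOTE balances the classes by interpolating new minority samples between nearest neighbours. A minimal pure-Python sketch of that idea follows; it is not the implementation the authors used, and the toy points are ours:

```python
import random

def smote(minority, k=2, n_new=4, seed=0):
    """Minimal SMOTE sketch: create synthetic minority samples by
    interpolating between a random sample and one of its k nearest
    neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random point along the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote(minority)
print(len(new_points))  # 4 synthetic samples
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the minority region rather than duplicating existing instances the way random oversampling does.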


2019 ◽  
Vol 91 ◽  
pp. 216-231 ◽  
Author(s):  
Amalia Luque ◽  
Alejandro Carrasco ◽  
Alejandro Martín ◽  
Ana de las Heras

2020 ◽  
Vol 77 (4) ◽  
pp. 1545-1558
Author(s):  
Michael F. Bergeron ◽  
Sara Landset ◽  
Xianbo Zhou ◽  
Tao Ding ◽  
Taghi M. Khoshgoftaar ◽  
...  

Background: The widespread incidence and prevalence of Alzheimer’s disease and mild cognitive impairment (MCI) has prompted an urgent call for research to validate early detection cognitive screening and assessment. Objective: Our primary research aim was to determine if selected MemTrax performance metrics and relevant demographics and health profile characteristics can be effectively utilized in predictive models developed with machine learning to classify cognitive health (normal versus MCI), as would be indicated by the Montreal Cognitive Assessment (MoCA). Methods: We conducted a cross-sectional study on 259 neurology, memory clinic, and internal medicine adult patients recruited from two hospitals in China. Each patient was given the Chinese-language MoCA and self-administered the continuous recognition MemTrax online episodic memory test on the same day. Predictive classification models were built using machine learning with 10-fold cross validation, and model performance was measured using Area Under the Receiver Operating Characteristic Curve (AUC). Models were built using two MemTrax performance metrics (percent correct, response time), along with the eight common demographic and personal history features. Results: Comparing the learners across selected combinations of MoCA scores and thresholds, Naïve Bayes was generally the top-performing learner with an overall classification performance of 0.9093. Further, among the top three learners, MemTrax-based classification performance overall was superior using just the top-ranked four features (0.9119) compared to using all 10 common features (0.8999). Conclusion: MemTrax performance can be effectively utilized in a machine learning classification predictive model screening application for detecting early stage cognitive impairment.
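The 10-fold cross-validation setup can be illustrated by splitting the 259 patient indices into disjoint folds; the `k_fold_indices` helper below is our own sketch, not the study's code:

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle n sample indices and split them into k roughly equal,
    disjoint folds; each fold serves once as the held-out validation set
    while the remaining k-1 folds are used for training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# 259 patients, as in the study, split into 10 folds:
folds = k_fold_indices(259, k=10)
print(sorted(len(f) for f in folds))  # [25, 26, 26, 26, 26, 26, 26, 26, 26, 26]
```

Averaging a metric such as AUC over the 10 held-out folds gives the kind of overall classification performance figure (e.g. 0.9093) the abstract reports.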


2020 ◽  
Author(s):  
Fábio Ricardo Oliveira Bento ◽  
Raquel Frizera Vassallo ◽  
Jorge Leonid Aching Samatelo

Anomaly detection consists of identifying events that do not conform to an expected pattern of behavior. In the context of public-road security, automatic detection of anomalous events from video has applications in identifying suspicious behavior. This article proposes an approach to the problem of automatically detecting anomalous events in videos of public roads, based on an end-to-end deep neural network model composed of two parts: a spatial feature extractor based on a pre-trained convolutional neural network, and a temporal sequence classifier based on stacked recurrent layers. We performed experiments on the UCSD Anomaly Detection Dataset. The results were evaluated with the Area Under the Receiver Operating Characteristic Curve (AUC), Area Under the Precision-Recall Curve (AUPRC), and Equal Error Rate (EER) metrics. During the experiments, the model achieved an AUC above 95% and an EER below 9%, results consistent with the current literature.
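The AUC metric reported above can be computed directly as the Mann-Whitney rank statistic: the probability that a randomly chosen positive (anomalous) frame scores above a randomly chosen negative one. A small self-contained sketch, with toy scores rather than the paper's model outputs:

```python
def auc(scores, labels):
    """Area Under the ROC Curve as the Mann-Whitney statistic: the chance
    that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy anomaly scores: anomalous frames (label 1) score above normal ones.
print(auc([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0]))  # 1.0 (perfect separation)
```

This rank formulation is threshold-free, which is what makes AUC a natural companion to the threshold-dependent EER metric.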

