imbalanced classes Latest Research Papers

Intrusion Detection System on KDD’99 Dataset with Imbalanced Classes

2021 ◽ Vol 11 (2) ◽ pp. 35-38

Author(s): Anupam Agrawal

Keyword(s): Intrusion Detection ◽ Detection System ◽ Geometric Mean ◽ Principal Component ◽ Great Accuracy ◽ Machine Learning Algorithms ◽ Loop Model ◽ Imbalanced Classes ◽ Machine Readable

The paper describes an intrusion detection method based on machine learning algorithms. The experiments were conducted on the KDD’99 Cup dataset, which is imbalanced; as a result, recall for some classes was drastically low because the dataset contained too few instances of them. For preprocessing, One Hot Encoding and Label Encoding were used to make the dataset machine readable. The dimensionality of the dataset was reduced using Principal Component Analysis, and classification into two classes, attack and normal, was done by a Naïve Bayes classifier. Because of the imbalance, the focus shifted to recall and overall recall, and the results were compared with other models that have achieved great accuracy. Based on the results, using a self-optimizing loop, the model achieved better geometric mean accuracy.
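
To illustrate why the abstract turns from accuracy to recall and the geometric mean, here is a minimal Python sketch. The labels are made up for illustration and are not taken from the paper, and the helper names `per_class_recall` and `geometric_mean` are hypothetical; the paper's own evaluation code is not shown in the source.

```python
from collections import Counter
from math import prod

def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted instances / actual instances."""
    actual = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in actual.items()}

def geometric_mean(values):
    """Geometric mean of a collection of non-negative numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))

# Toy labels (made up): 8 "normal" and 2 "attack" connections.
y_true = ["normal"] * 8 + ["attack"] * 2
# The model finds every normal connection but misses one of the two attacks.
y_pred = ["normal"] * 8 + ["normal", "attack"]

recalls = per_class_recall(y_true, y_pred)          # normal: 1.0, attack: 0.5
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.9
g_mean = geometric_mean(recalls.values())           # about 0.707
```

Accuracy looks high because the majority class dominates it, while the geometric mean of the per-class recalls drops as soon as one class is poorly detected.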

The SAMME.C2 Algorithm for Severely Imbalanced Multi-class Classification

10.20944/preprints202112.0427.v1 ◽ 2021

Author(s): Banghee So ◽ Emiliano A. Valdez

Keyword(s): Learning Algorithm ◽ Superior Performance ◽ Binary Classifier ◽ Classification Problems ◽ Minority Class ◽ Proposed Model ◽ Interesting Class ◽ Imbalanced Class ◽ Imbalanced Classes ◽ Multi Class Classification

Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. A growing number of real-world classification problems have severely imbalanced class distributions. In this case, minority classes have far fewer observations to learn from than majority classes. Despite this sparsity, a minority class is often considered the more interesting class, yet developing a scientific learning algorithm suitable for these observations presents countless challenges. In this article, we suggest a novel multi-class classification algorithm, which we refer to as SAMME.C2, specialized to handle severely imbalanced classes. It blends the flexible mechanics of the boosting techniques from the SAMME algorithm, a multi-class classifier, and the Ada.C2 algorithm, a cost-sensitive binary classifier designed to address high class imbalances. Not only do we provide the resulting algorithm, but we also establish a scientific and statistical formulation of our proposed SAMME.C2 algorithm. Through numerical experiments examining various degrees of classifier difficulty, we demonstrate the consistently superior performance of our proposed model.
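
The two ingredients named in the abstract can be sketched roughly as follows. `samme_alpha` is the standard SAMME classifier weight. `cost_weighted_update` is a hypothetical, simplified reweighting step that only illustrates the idea of letting per-sample costs influence boosting weights; it is an assumption made for this example and is not the SAMME.C2 update from the paper, whose exact formulas are not given in the abstract.

```python
import math

def samme_alpha(weighted_error, n_classes):
    """Classifier weight in SAMME: log((1 - err) / err) + log(K - 1)."""
    return (math.log((1.0 - weighted_error) / weighted_error)
            + math.log(n_classes - 1))

def cost_weighted_update(weights, costs, correct, alpha):
    """Hypothetical cost-sensitive reweighting (illustration only).

    Each weight is multiplied by its sample cost; misclassified samples
    are additionally boosted by exp(alpha). Weights are then normalised.
    """
    raw = [w * c * (1.0 if ok else math.exp(alpha))
           for w, c, ok in zip(weights, costs, correct)]
    total = sum(raw)
    return [r / total for r in raw]

# Made-up example: 4 samples, 3 classes, the last sample is a costly
# minority observation. Samples 3 and 4 are misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
costs = [1.0, 1.0, 1.0, 3.0]
correct = [True, True, False, False]

alpha = samme_alpha(weighted_error=0.5, n_classes=3)  # log(2) > 0
new_weights = cost_weighted_update(weights, costs, correct, alpha)
```

With three classes, a weak learner with weighted error 0.5 still receives a positive weight, which is the multi-class relaxation SAMME introduces; with two classes the same error gives a weight of zero, as in AdaBoost.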

GTF: An Adaptive Network Anomaly Detection Method at the Network Edge

Security and Communication Networks ◽ 10.1155/2021/3017797 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-12

Author(s): Renjie Li ◽ Zhou Zhou ◽ Xuan Liu ◽ Da Li ◽ Wei Yang ◽ ...

Keyword(s): Anomaly Detection ◽ Input Data ◽ Rapid Development ◽ Poor Performance ◽ Gradient Boosting ◽ Data Format ◽ Detection Model ◽ Imbalanced Classes ◽ Public Datasets ◽ Network Anomaly Detection

Network Anomaly Detection (NAD) has become the foundation for network management and security due to the rapid development and adoption of edge computing technologies. There are two main characteristics of NAD tasks: tabular input data and imbalanced classes. Tabular input data format means NAD tasks take both sparse categorical features and dense numerical features as input. In order to achieve good performance, the detection model needs to handle both types of features efficiently. Among all widely used models, Gradient Boosting Decision Tree (GBDT) and Neural Network (NN) are the two most popular ones. However, each method has its limitation: GBDT is inefficient when dealing with sparse categorical features, while NN cannot yield satisfactory performance for dense numerical features. Imbalanced classes may degrade the classifier’s performance and bias results towards the majority classes, an issue often neglected by existing NAD studies. Most of the existing solutions addressing imbalance suffer from poor performance, high computational consumption, or loss of vital information under such a scenario. In this paper, we propose an adaptive ensemble-based method, named GTF, which combines TabTransformer and GBDT to leverage categorical and numerical features effectively and introduces Focal Loss to mitigate the imbalanced classification. Our comprehensive experiments on two public datasets demonstrate that GTF can outperform other well-known methods in both multiclass and binary cases. Our implementation also shows that GTF has limited complexity, making it a good candidate for deployment at the network edge.
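
The Focal Loss mentioned in the abstract can be sketched for a single sample as below. This is the commonly cited form, -alpha * (1 - p)^gamma * log(p), where p is the predicted probability of the true class; the values of `gamma` and `alpha` are illustrative defaults, and how GTF configures or integrates the loss is not stated in the abstract.

```python
import math

def cross_entropy(p_true):
    """Standard cross-entropy for one sample."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample.

    p_true is the predicted probability of the correct class. The factor
    (1 - p_true) ** gamma shrinks the loss of well-classified samples.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# An easy sample (confident and correct) and a hard one (made-up values).
easy, hard = 0.9, 0.1

easy_ratio = focal_loss(easy) / cross_entropy(easy)  # about 0.01
hard_ratio = focal_loss(hard) / cross_entropy(hard)  # about 0.81
```

The easy sample keeps about 1% of its cross-entropy loss while the hard one keeps about 81%, so training effort shifts towards hard examples, which in imbalanced data are often minority-class samples. Setting `gamma=0` recovers plain cross-entropy.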

Optimization Based Undersampling for Imbalanced Classes

Adıyaman University Journal of Science ◽ 10.37094/adyujsci.884120 ◽ 2021

Author(s): Fatih SAĞLAM

Keyword(s): Imbalanced Classes

Matrix sketching for supervised classification with imbalanced classes

Data Mining and Knowledge Discovery ◽ 10.1007/s10618-021-00791-3 ◽ 2021

Author(s): Roberta Falcone ◽ Laura Anderlucci ◽ Angela Montanari

Keyword(s): Supervised Classification ◽ Dimensional Space ◽ Real Data ◽ Support Vector ◽ Compression Technique ◽ Linear Discriminant ◽ Practical Applications ◽ Under Sampling ◽ Matrix Sketching ◽ Imbalanced Classes

The presence of imbalanced classes is increasingly common in practical applications and is known to heavily compromise the learning process. In this paper we propose a new method aimed at addressing this issue in binary supervised classification. Re-balancing the class sizes has turned out to be a fruitful strategy to overcome this problem. Our proposal performs re-balancing through matrix sketching. Matrix sketching is a recently developed data compression technique that is characterized by the property of preserving most of the linear information that is present in the data. This property is guaranteed by the Johnson-Lindenstrauss Lemma (1984) and allows an n-dimensional space to be embedded into a reduced one without distorting, within an $\epsilon$-size interval, the distances between any pair of points. We propose to use matrix sketching as an alternative to the standard re-balancing strategies that are based on randomly under-sampling the majority class or randomly over-sampling the minority one. We assess the properties of our method when combined with linear discriminant analysis (LDA), classification trees (C4.5) and Support Vector Machines (SVM) on simulated and real data. Results show that sketching can represent a sound alternative to the most widely used re-balancing methods.
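
As a rough illustration of re-balancing by sketching, the Python sketch below compresses the rows of a majority-class data matrix with a Gaussian random matrix. The Gaussian choice, the function name `gaussian_sketch`, and the made-up data are assumptions made for this example; the abstract does not say which sketching matrix the authors use or exactly how it is applied.

```python
import random

def gaussian_sketch(rows, k, seed=0):
    """Compress n rows into k rows.

    Multiplies the n x d data by a k x n matrix whose entries are
    independent N(0, 1/k) draws (one common sketching choice).
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    scale = (1.0 / k) ** 0.5  # standard deviation of each entry
    sketched = []
    for _ in range(k):
        coeffs = [rng.gauss(0.0, scale) for _ in range(n)]
        sketched.append([sum(c * row[j] for c, row in zip(coeffs, rows))
                         for j in range(d)])
    return sketched

# Made-up data: 1000 majority rows and 50 minority rows, 3 features each.
data_rng = random.Random(1)
majority = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(1000)]
minority = [[data_rng.gauss(2, 1) for _ in range(3)] for _ in range(50)]

# Shrink the majority class to the size of the minority class.
majority_sketch = gaussian_sketch(majority, k=len(minority))
```

Unlike random under-sampling, every original majority row contributes to the sketched rows. With this scaling, the cross-product matrix of the sketched data equals that of the original data in expectation, which is one sense in which linear information is preserved.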

A financial statement fraud model based on synthesized attribute selection and a dataset with missing values and imbalanced classes

Applied Soft Computing ◽ 10.1016/j.asoc.2021.107487 ◽ 2021 ◽ Vol 108 ◽ pp. 107487

Author(s): Ching-Hsue Cheng ◽ Yung-Fu Kao ◽ Hsien-Ping Lin

Keyword(s): Missing Values ◽ Attribute Selection ◽ Financial Statement ◽ Financial Statement Fraud ◽ Model Based ◽ Imbalanced Classes

Machine Learning Screening of COVID-19 Patients Based on X-ray Images for Imbalanced Classes

2021 9th European Workshop on Visual Information Processing (EUVIP) ◽ 10.1109/euvip50544.2021.9484001 ◽ 2021

Author(s): Ilyes Mrad ◽ Ridha Hamila ◽ Aiman Erbad ◽ Tahir Hamid ◽ Rashid Mazhar ◽ ...

Keyword(s): Machine Learning ◽ X Ray ◽ Imbalanced Classes

Improving the Performance of Machine Learning-Based Network Intrusion Detection Systems on the UNSW-NB15 Dataset

Computational Intelligence and Neuroscience ◽ 10.1155/2021/5557577 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-13

Author(s): Soulaiman Moualla ◽ Khaldoun Khorzom ◽ Assef Jafar

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ False Alarm ◽ Supervised Machine Learning ◽ Intrusion Detection Systems ◽ Detection Rates ◽ Detection Systems ◽ Network Intrusion ◽ Imbalanced Classes ◽ Learning Machine

Networks are exposed to an increasing number of cyberattacks due to their vulnerabilities. Cybersecurity therefore strives to make networks as safe as possible by introducing defense systems to detect suspicious activities. However, firewalls and classical intrusion detection systems (IDSs) must continuously update their defined databases to detect threats. New directions in IDS research aim to leverage machine learning models to design more robust systems with higher detection rates and lower false alarm rates. This research presents a novel network IDS, which plays an important role in network security and addresses current cyberattacks on networks, using the UNSW-NB15 benchmark dataset. Our proposed system is a dynamically scalable multiclass machine learning-based network IDS. It consists of several stages based on supervised machine learning. It starts with the Synthetic Minority Oversampling Technique (SMOTE) to solve the imbalanced classes problem in the dataset and then selects the important features for each class in the dataset by the Gini Impurity criterion using the Extremely Randomized Trees Classifier (Extra Trees Classifier). After that, a pretrained extreme learning machine (ELM) model is responsible for detecting each attack separately, acting as a “One-Versus-All” binary classifier for each of them. Finally, the ELM classifier outputs become the inputs to a fully connected layer in order to learn from all their combinations, followed by a logistic regression layer to make soft decisions for all classes. Results show that our proposed system performs better than related works in terms of accuracy, false alarm rate, Receiver Operating Characteristic (ROC), and Precision-Recall Curves (PRCs).
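
The core idea behind SMOTE, the first stage of the pipeline described above, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that idea only; the function name `smote_like_samples`, the toy points, and the choice of `k` are hypothetical, and the paper's actual SMOTE configuration is not given in the abstract.

```python
import random

def smote_like_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority points.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Made-up minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_samples(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region spanned by the minority class rather than duplicating existing rows.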

Deep Neural Network to Identify Patients with Alcohol Use Disorder

Studies in Health Technology and Informatics - Public Health and Informatics ◽ 10.3233/shti210156 ◽ 2021

Author(s): Ali Ebrahimi ◽ Uffe Kock Wiil ◽ Marjan Mansourvar ◽ Amin Naemi ◽ Kjeld Andersen ◽ ...

Keyword(s): Alcohol Use ◽ Alcohol Use Disorder ◽ Deep Neural Network ◽ Model Development ◽ Categorical Variables ◽ Learning Approaches ◽ Validation Data ◽ Imbalanced Classes ◽ Fully Connected ◽ Development And Validation

This paper presents an application of deep neural networks (DNN) to identify patients with Alcohol Use Disorder based on historical electronic health records. Our methodology consists of four stages including data collection, preprocessing, predictive model development, and validation. Data are collected from two sources and labeled into three classes including Normal, Hazardous, and Harmful drinkers. Moreover, problems such as imbalanced classes, noise, and categorical variables were handled. A four-layer fully-connected feedforward DNN architecture was designed and developed to predict Normal, Hazardous, and Harmful drinkers. Results show that our proposed method could successfully classify about 96%, 82%, and 89% of Normal, Hazardous, and Harmful drinkers, respectively, which is better than classical machine learning approaches.

Athlete Health Prediction Using Machine Learning Methods

10.32920/ryerson.14653605 ◽ 2021

Author(s): MD Raihan Sharif

Keyword(s): Machine Learning ◽ Model Performance ◽ Imbalanced Data ◽ Well Being ◽ Classification Performance ◽ Sport Organizations ◽ Bootstrap Sampling ◽ Important Research Topic ◽ Imbalanced Classes ◽ Traditional Approaches

Due to an increase in sports activities, the prediction of athletes’ health (AH) has recently become an important research topic. However, it is a challenging task to predict AH because of the nature of the data and the limitations of predictive models. The main objective of this work is to develop appropriate models that can forecast AH using historical data. This work will enable sport organizations to monitor the well-being of their athletes. In this thesis, we explore the applicability of various machine learning (ML) methods for predicting AH. Traditional ML methods do not perform well for class-imbalanced data as these methods are biased towards the majority class. In this work, we propose to use ensemble-based methods which utilize downsampling, bootstrap sampling, and boosting techniques to improve the classification performance. Various metrics are used to evaluate and to compare the model performance. Our results show the superiority of ensemble-based methods over traditional approaches. The random forest and the RUSBoost classifier models in particular are found to produce the best performance in handling imbalanced classes.
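
The downsampling step that such ensemble methods rely on can be sketched as random undersampling of every class to the size of the smallest one. This is a minimal illustration with made-up labels ("healthy", "injured") and a hypothetical helper name, `random_undersample`; it is not the thesis implementation, and a method like RUSBoost would repeat a step of this kind inside each boosting round.

```python
import random
from collections import Counter, defaultdict

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class has as many as the rarest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Sample without replacement from each class.
        for x in rng.sample(xs, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Made-up data: 90 "healthy" and 10 "injured" records, identified by index.
samples = list(range(100))
labels = ["healthy"] * 90 + ["injured"] * 10

balanced_x, balanced_y = random_undersample(samples, labels)
class_counts = Counter(balanced_y)  # 10 of each class
```

Undersampling discards majority-class information in any single draw, which is one reason it is usually paired with an ensemble: different members see different random subsets of the majority class.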

imbalanced classes
Recently Published Documents

Intrusion Detection System on KDD’99 Dataset with Imbalanced Classes

The SAMME.C2 Algorithm for Severely Imbalanced Multi-class Classification

GTF: An Adaptive Network Anomaly Detection Method at the Network Edge

Optimization Based Undersampling for Imbalanced Classes

Matrix sketching for supervised classification with imbalanced classes

A financial statement fraud model based on synthesized attribute selection and a dataset with missing values and imbalanced classes

Machine Learning Screening of COVID-19 Patients Based on X-ray Images for Imbalanced Classes

Improving the Performance of Machine Learning-Based Network Intrusion Detection Systems on the UNSW-NB15 Dataset

Deep Neural Network to Identify Patients with Alcohol Use Disorder

Athlete Health Prediction Using Machine Learning Methods
