imbalanced classes
Recently Published Documents



Author(s): Anupam Agrawal

The paper describes a method of intrusion detection based on machine learning algorithms. The experiments were conducted on the KDD'99 Cup dataset, which is imbalanced; as a result, recall for some classes is drastically low because the dataset contains too few instances of them. For preprocessing, One Hot Encoding and Label Encoding are used to make the dataset machine readable. The dimensionality of the dataset is reduced using Principal Component Analysis, and classification into two classes, attack and normal, is done by a Naïve Bayes classifier. Because of the imbalance, the focus shifted to per-class recall and overall recall, which are compared with those of other models that have achieved high accuracy. Based on the results, using a self-optimizing loop, the model achieved a better geometric mean accuracy.
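To illustrate why this abstract looks at recall and a geometric mean rather than plain accuracy, here is a minimal sketch. It assumes the common definition of the geometric mean score as the geometric mean of per-class recalls; the abstract does not spell out its exact formula, and the function names and toy labels are hypothetical.

```python
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that appears in y_true."""
    totals, hits = {}, {}
    for truth, guess in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == guess:
            hits[truth] = hits.get(truth, 0) + 1
    return {label: hits.get(label, 0) / n for label, n in totals.items()}


def geometric_mean_recall(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed G-mean definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return prod(recalls.values()) ** (1.0 / len(recalls))


# Toy data: 8 "normal" records and 2 "attack" records (hypothetical).
y_true = ["normal"] * 8 + ["attack"] * 2

# A model that always predicts the majority class is 80% accurate,
# but its geometric mean is zero because attack recall is zero.
always_normal = ["normal"] * 10

# A model that finds one of the two attacks at the cost of two false alarms.
mixed = ["normal"] * 6 + ["attack"] * 2 + ["attack", "normal"]
```

Under this definition, a classifier that ignores a minority class entirely scores zero no matter how high its accuracy is, which is the behaviour an imbalanced intrusion dataset calls for.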


Author(s): Banghee So, Emiliano A. Valdez

Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. A growing number of real-world classification problems have severely imbalanced class distributions. In this case, minority classes have far fewer observations to learn from than majority classes. Despite this sparsity, a minority class is often considered the more interesting class, yet developing a scientific learning algorithm suitable for these observations presents countless challenges. In this article, we suggest a novel multi-class classification algorithm specialized to handle severely imbalanced classes, based on the method we refer to as SAMME.C2. It blends the flexible mechanics of the boosting techniques from the SAMME algorithm, a multi-class classifier, and the Ada.C2 algorithm, a cost-sensitive binary classifier designed to address severe class imbalance. Not only do we provide the resulting algorithm, but we also establish a scientific and statistical formulation of our proposed SAMME.C2 algorithm. Through numerical experiments examining various degrees of classifier difficulty, we demonstrate consistently superior performance of our proposed model.


2021, Vol 2021, pp. 1-12
Author(s): Renjie Li, Zhou Zhou, Xuan Liu, Da Li, Wei Yang, ...

Network Anomaly Detection (NAD) has become the foundation for network management and security due to the rapid development and adoption of edge computing technologies. There are two main characteristics of NAD tasks: tabular input data and imbalanced classes. The tabular input format means NAD tasks take both sparse categorical features and dense numerical features as input. In order to achieve good performance, the detection model needs to handle both types of features efficiently. Among all widely used models, Gradient Boosting Decision Tree (GBDT) and Neural Network (NN) are the two most popular ones. However, each method has its limitation: GBDT is inefficient when dealing with sparse categorical features, while NN cannot yield satisfactory performance for dense numerical features. Imbalanced classes may degrade the classifier's performance and bias results towards the majority classes, an issue often neglected by existing NAD studies. Most of the existing solutions addressing imbalance suffer from poor performance, high computational consumption, or loss of vital information under such a scenario. In this paper, we propose an adaptive ensemble-based method, named GTF, which combines TabTransformer and GBDT to leverage categorical and numerical features effectively and introduces Focal Loss to mitigate class imbalance. Our comprehensive experiments on two public datasets demonstrate that GTF can outperform other well-known methods in both multiclass and binary cases. Our implementation also shows that GTF has limited complexity, making it a good candidate for deployment at the network edge.
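The Focal Loss mentioned in this abstract can be sketched for a single example as follows. This is the standard form, -alpha * (1 - p)^gamma * log(p), where p is the predicted probability of the true class; the default values gamma=2.0 and alpha=1.0 are illustrative assumptions, not settings reported for GTF.

```python
import math


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one example.

    p_true is the predicted probability of the correct class.
    gamma=0 and alpha=1 reduce this to ordinary cross-entropy.
    """
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


# An easy example (p = 0.9) is down-weighted by (1 - 0.9)^2 = 0.01,
# a hard example (p = 0.1) only by (1 - 0.1)^2 = 0.81.
easy = focal_loss(0.9)
hard = focal_loss(0.1)
easy_cross_entropy = focal_loss(0.9, gamma=0.0)
```

Because well-classified examples, which mostly come from the majority class, contribute very little, the gradient is dominated by hard and often minority-class examples.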


Author(s): Roberta Falcone, Laura Anderlucci, Angela Montanari

The presence of imbalanced classes is more and more common in practical applications, and it is known to heavily compromise the learning process. In this paper we propose a new method aimed at addressing this issue in binary supervised classification. Re-balancing the class sizes has turned out to be a fruitful strategy to overcome this problem. Our proposal performs re-balancing through matrix sketching. Matrix sketching is a recently developed data compression technique that is characterized by the property of preserving most of the linear information that is present in the data. This property is guaranteed by the Johnson-Lindenstrauss lemma (1984) and allows an n-dimensional space to be embedded into a reduced one without distorting, within an ϵ-size interval, the distances between any pair of points. We propose to use matrix sketching as an alternative to the standard re-balancing strategies that are based on randomly under-sampling the majority class or randomly over-sampling the minority one. We assess the properties of our method when combined with linear discriminant analysis (LDA), classification trees (C4.5) and Support Vector Machines (SVM) on simulated and real data. Results show that sketching can represent a sound alternative to the most widely used rebalancing methods.
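As a rough illustration of re-balancing by sketching, the snippet below compresses the n rows of a majority-class data matrix into k rows by multiplying with a random k x n Gaussian matrix whose entries have variance 1/k. The Gaussian sketch is one common choice and is an assumption here; the abstract does not say which sketching matrix the authors use, and the function name is hypothetical.

```python
import random


def gaussian_sketch(rows, k, seed=0):
    """Compress an n x p data matrix (list of rows) to k x p.

    Each output row is a random linear combination of all input rows,
    with weights drawn from N(0, 1/k), so that the expected value of
    the sketched cross-product matrix equals the original one.
    """
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    scale = (1.0 / k) ** 0.5  # standard deviation of each weight
    sketched = []
    for _ in range(k):
        weights = [rng.gauss(0.0, scale) for _ in range(n)]
        sketched.append(
            [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(p)]
        )
    return sketched


# Toy majority class: 50 observations, 3 features (hypothetical data).
data_rng = random.Random(1)
majority = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]

# Reduce the majority class to 10 pseudo-observations, e.g. the minority size.
majority_sketch = gaussian_sketch(majority, k=10)
```

Unlike random under-sampling, every sketched row mixes information from all original observations, so no single majority observation is simply discarded.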


2021, Vol 2021, pp. 1-13
Author(s): Soulaiman Moualla, Khaldoun Khorzom, Assef Jafar

Networks are exposed to an increasing number of cyberattacks due to their vulnerabilities. Cybersecurity therefore strives to make networks as safe as possible by introducing defense systems that detect suspicious activities. However, firewalls and classical intrusion detection systems (IDSs) depend on continuous updating of their defined databases to detect threats. New directions in IDS research aim to leverage machine learning models to design more robust systems with higher detection rates and lower false alarm rates. This research presents a novel network IDS, which plays an important role in network security and addresses current cyberattacks on networks, evaluated on the UNSW-NB15 benchmark dataset. Our proposed system is a dynamically scalable multiclass machine learning-based network IDS. It consists of several stages based on supervised machine learning. It starts with the Synthetic Minority Oversampling Technique (SMOTE) to address the imbalanced classes problem in the dataset and then selects the important features for each class in the dataset by the Gini Impurity criterion using the Extremely Randomized Trees Classifier (Extra Trees Classifier). After that, a pretrained extreme learning machine (ELM) model is responsible for detecting each attack separately, acting as a "One-Versus-All" binary classifier for each of them. Finally, the ELM classifier outputs become the inputs to a fully connected layer in order to learn from all their combinations, followed by a logistic regression layer to make soft decisions for all classes. Results show that our proposed system performs better than related works in terms of accuracy, false alarm rate, Receiver Operating Characteristic (ROC), and Precision-Recall Curves (PRCs).
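The SMOTE step named in this abstract can be sketched in a few lines: each synthetic point is placed at a random position on the segment between a minority observation and one of its k nearest minority neighbours. This is a simplified standard-library illustration, not the authors' pipeline; the function name, k=2, and the toy points are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by interpolation.

    minority is a list of feature vectors (at least two are required).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance.
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(base, x)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Four minority points on the corners of the unit square (hypothetical).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=6, k=2)
```

Because the new points are interpolations rather than copies, the minority class grows without exact duplicates, which is the property that distinguishes SMOTE from random over-sampling.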


Author(s): Ali Ebrahimi, Uffe Kock Wiil, Marjan Mansourvar, Amin Naemi, Kjeld Andersen, ...

This paper presents an application of deep neural networks (DNN) to identify patients with Alcohol Use Disorder based on historical electronic health records. Our methodology consists of four stages: data collection, preprocessing, predictive model development, and validation. Data are collected from two sources and labeled into three classes: Normal, Hazardous, and Harmful drinkers. Moreover, problems such as imbalanced classes, noise, and categorical variables were handled. A four-layer fully-connected feedforward DNN architecture was designed and developed to predict Normal, Hazardous, and Harmful drinkers. Results show that our proposed method could successfully classify about 96%, 82%, and 89% of Normal, Hazardous, and Harmful drinkers, respectively, which is better than classical machine learning approaches.


2021
Author(s): MD Raihan Sharif

Due to an increase in sports activities, the prediction of athletes' health (AH) has recently become an important research topic. However, it is a challenging task to predict AH because of the nature of the data and the limitations of predictive models. The main objective of this work is to develop appropriate models that can forecast AH using historical data. This work will enable sport organizations to monitor the well-being of their athletes. In this thesis, we explore the applicability of various machine learning (ML) methods for predicting AH. Traditional ML methods do not perform well for class-imbalanced data, as these methods are biased towards the majority class. In this work, we propose to use ensemble-based methods which utilize downsampling, bootstrap sampling, and boosting techniques to improve the classification performance. Various metrics are used to evaluate and compare model performance. Our results show the superiority of ensemble-based methods over traditional approaches. The random forest and the RUSBoost classifier models in particular are found to produce the best performance in handling imbalanced classes.
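The downsampling that this abstract refers to, and that RUSBoost applies before each boosting round, can be sketched as random under-sampling of every class to the size of the smallest one. The snippet covers only the sampling step, not the boosting itself; the function name and toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop observations so every class has as many
    observations as the smallest class. Returns (X_balanced, y_balanced)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)
    target = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, target))  # sample without replacement
    keep.sort()  # preserve the original ordering of the kept observations
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: 8 healthy (0) and 2 injured (1) athletes, one feature each.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_balanced, y_balanced = random_undersample(X, y)
```

In an ensemble, drawing a fresh balanced subset with a different seed for each base learner lets the model see most of the majority class overall while each learner trains on balanced data.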

