minority class
Recently Published Documents

TOTAL DOCUMENTS: 247 (FIVE YEARS: 172)
H-INDEX: 15 (FIVE YEARS: 5)

Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 644
Author(s):  
Hanqing Wang ◽  
Xiaoyuan Wang ◽  
Junyan Han ◽  
Hui Xiang ◽  
Hao Li ◽  
...  

Aggressive driving behavior (ADB) is one of the main causes of traffic accidents, and recognizing it accurately is a prerequisite for warning or intervening with the driver in a timely and effective manner. Previous data-driven ADB recognition methods suffer from high miss rates and low accuracy, largely because they handle class-imbalanced datasets improperly and rely on a single classifier. To address these shortcomings, this paper proposes an ensemble learning-based ADB recognition method. First, the majority class in the dataset is partitioned into groups with a self-organizing map (SOM), and each group is combined with the minority class to construct multiple class-balanced datasets. Second, three deep learning methods, convolutional neural networks (CNN), long short-term memory (LSTM), and gated recurrent units (GRU), are used to build base classifiers on these balanced datasets. Finally, ensemble classifiers are formed from the base classifiers according to 10 different combination rules, and then trained and verified on a multi-source naturalistic driving dataset acquired with the integrated experiment vehicle. The results suggest that, for ADB recognition, the proposed ensemble learning method achieves better accuracy, recall, and F1-score than the individual deep learning methods above. Among the ensemble classifiers, the one based on LSTM and the Product Rule performs best, followed by the one based on LSTM and the Sum Rule.
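
A minimal sketch of the balanced-ensemble idea described in this abstract, under stated assumptions: KMeans stands in for the SOM, logistic regression stands in for the CNN/LSTM/GRU base learners, and all names and parameters are illustrative rather than taken from the paper.

```python
# Sketch: partition the majority class, pair each partition with the full
# minority class to get balanced training sets, train one base classifier
# per set, and fuse their probabilities with a sum or product rule.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def balanced_ensemble(X, y, n_groups=5):
    X_maj, X_min = X[y == 0], X[y == 1]                      # 1 = aggressive (minority)
    groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(X_maj)
    models = []
    for g in range(n_groups):
        Xg = np.vstack([X_maj[groups == g], X_min])          # one class-balanced set
        yg = np.hstack([np.zeros((groups == g).sum()), np.ones(len(X_min))])
        models.append(LogisticRegression(max_iter=1000).fit(Xg, yg))
    return models

def combine(models, X, rule="sum"):
    probs = np.stack([m.predict_proba(X) for m in models])   # (n_models, n, 2)
    fused = probs.prod(axis=0) if rule == "product" else probs.sum(axis=0)
    return fused.argmax(axis=1)                              # fused class decision
```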


2022 ◽  
Author(s):  
Seunghwan Park ◽  
Hae-Wwan Lee ◽  
Jongho Im

We consider the binary classification of imbalanced data. A dataset is imbalanced if the class proportions are heavily skewed. Imbalanced data classification is often challenging, especially for high-dimensional data, because unequal classes deteriorate classifier performance. Undersampling the majority class or oversampling the minority class are popular ways to construct balanced samples and improve classification performance. However, many existing sampling methods cannot easily be extended to high-dimensional data or to mixed data that include categorical variables, because they often require approximating the attribute distributions, which becomes another critical issue. In this paper, we propose a new sampling strategy employing raking and relabeling procedures, such that the attribute values of the majority class are imputed for the values of the minority class when constructing balanced samples. The proposed algorithms perform comparably to existing popular methods but are more flexible with respect to data shape and attribute size. The sampling algorithm is attractive in practice because it does not require density estimation to generate synthetic data in oversampling and is not hampered by mixed-type variables. In addition, the proposed sampling strategy is robust to classifiers in the sense that classification performance is not sensitive to the choice of classifier.
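
For reference, a short sketch of the two baseline balancing strategies the abstract mentions (random undersampling of the majority class and random oversampling of the minority class); this is not the raking-and-relabeling procedure the paper proposes, and the function names are illustrative.

```python
# Two simple ways to construct balanced samples from an imbalanced dataset.
import numpy as np

rng = np.random.default_rng(0)

def undersample_majority(X, y, majority=0):
    idx_maj, idx_min = np.where(y == majority)[0], np.where(y != majority)[0]
    keep = rng.choice(idx_maj, size=len(idx_min), replace=False)  # drop majority rows
    sel = np.concatenate([keep, idx_min])
    return X[sel], y[sel]

def oversample_minority(X, y, majority=0):
    idx_maj, idx_min = np.where(y == majority)[0], np.where(y != majority)[0]
    extra = rng.choice(idx_min, size=len(idx_maj), replace=True)  # duplicate minority rows
    sel = np.concatenate([idx_maj, extra])
    return X[sel], y[sel]
```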


2022 ◽  
Author(s):  
Bens Pardamean ◽  
Arif Budiarto ◽  
Bharuno Mahesworo ◽  
Alam Ahmad Hidayat ◽  
Digdo Sudigyo

Abstract

Background: Sleep is commonly associated with physical and mental health status. Sleep quality can be determined from the dynamics of sleep stages during the night. Data from a wearable device can potentially be used as predictors to classify the sleep stage. A robust machine learning (ML) model is needed to learn the pattern within wearable data and associate it with sleep-wake classification, especially to handle the imbalanced proportion between wake and sleep stages. In this study, we used a publicly available dataset consisting of three features captured from a consumer wearable device and sleep stages labelled from a polysomnogram. We implemented Random Forest, Support Vector Machine, Extreme Gradient Boosting (XGB), Dense Neural Network (DNN), and Long Short-Term Memory (LSTM) models, complemented by three strategies to handle the imbalanced data problem.

Results: In total, we included more than 24,815 rows of preprocessed data from 31 samples. The minority-to-majority ratio is 1:10. In classifying this extremely imbalanced data, the DNN model performed best, outperforming the previous best model, which was based on a basic Multi-Layer Perceptron. Our best model achieved a 12% higher specificity score (prediction score for the minority class) and a 1% improvement in the sensitivity score (prediction score for the majority class) when all features were included in the model. This gain came from the implementation of a custom class weight and an oversampling strategy. In contrast, when we used only two features, XGB improved specificity by only 1%, while keeping sensitivity at the same level.

Conclusions: The non-linear operations within the DNN model could successfully learn the hidden pattern from the combination of the three features. Additionally, the class weight parameter kept the model from ignoring the minority class by giving this class more weight in the loss function. The feature engineering process seemed to obscure the time-series characteristics in the data, which is why LSTM, one of the best methods for time-series data, failed to perform well in this classification task.
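
A hedged sketch of the class-weighting idea: a small dense network on three wearable-derived features (feature names are not given in the abstract), with the minority class (assumed here to be wake) up-weighted in the loss roughly in proportion to the reported 1:10 imbalance. Layer sizes and the exact weights are illustrative, not the authors' configuration.

```python
# Class-weighted binary sleep-wake classifier on three input features.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(3,)),               # three wearable-derived features
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # 1 = minority class (wake)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.Recall(), keras.metrics.Precision()])

# class_weight makes each minority example count ~10x in the loss,
# mirroring the 1:10 minority-to-majority ratio reported above.
# model.fit(X_train, y_train, epochs=20, batch_size=256,
#           class_weight={0: 1.0, 1: 10.0})
```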


Author(s):  
Mohammad Zoynul Abedin ◽  
Chi Guotai ◽  
Petr Hajek ◽  
Tong Zhang

Abstract

In small business credit risk assessment, the default and nondefault classes are highly imbalanced. To overcome this problem, this study proposes an extended ensemble approach rooted in the weighted synthetic minority oversampling technique (WSMOTE), called WSMOTE-ensemble. The proposed ensemble classifier hybridizes WSMOTE and Bagging with sampling composite mixtures to guarantee the robustness and variability of the generated synthetic instances and thus minimize the class-skew constraints linked to default and nondefault small business instances. The original small business dataset used in this study consists of 3111 records from a Chinese commercial bank. Through a thorough experimental study of extensively skewed data-modeling scenarios, a multilevel experimental setting was established for this rare-event domain. Based on the proper evaluation measures, this study finds that the random forest classifier used in the WSMOTE-ensemble model provides a good trade-off between performance on the default class and performance on the nondefault class. The ensemble solution improved the accuracy of the minority class by 15.16% compared with its competitors. This study also shows that sampling methods outperform nonsampling algorithms. With these contributions, this study fills a noteworthy knowledge gap and adds several unique insights regarding the prediction of small business credit risk.
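
A minimal sketch of the general SMOTE-plus-Bagging idea behind an approach like WSMOTE-ensemble (without the instance weighting that WSMOTE adds): each bag draws a bootstrap sample, oversamples the minority (default) class with plain SMOTE, and fits a random forest; probabilities are averaged at prediction time. Function names and parameters are illustrative assumptions, not the paper's implementation.

```python
# SMOTE inside a simple bagging loop, with random forests as base learners.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

def smote_bagging_fit(X, y, n_bags=10, seed=42):
    models = []
    for b in range(n_bags):
        Xb, yb = resample(X, y, random_state=seed + b)               # bootstrap sample
        Xs, ys = SMOTE(random_state=seed + b).fit_resample(Xb, yb)   # rebalance it
        models.append(RandomForestClassifier(n_estimators=100,
                                             random_state=seed + b).fit(Xs, ys))
    return models

def smote_bagging_predict(models, X):
    # average the predicted probability of class 1 (assumed: default) across bags
    p = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (p >= 0.5).astype(int)
```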


2022 ◽  
pp. 683-702
Author(s):  
Ramazan Ünlü

Manually detecting abnormalities in control data is tedious work that requires a specialist; automatic detection can be simpler and more effective. Various methodologies such as ANN, SVM, and fuzzy logic have been applied to control chart patterns to detect abnormal patterns in real time. In general, control chart data are imbalanced, meaning that the rate of the minority class (abnormal patterns) is much lower than that of the majority class (normal patterns). To take this fact into account, the authors implemented a weighting strategy in conjunction with an ANN, investigated the performance of the weighted ANN for several abnormal patterns, and compared it with a regular ANN. This comparison was also made under different conditions, for example with abnormal and normal patterns being separable, partially separable, or inseparable, and with the data length fixed at 10, 20, and 30 for each. Based on the numerical results, the weighting policy can, in some cases, better assign samples belonging to the minority class to the correct class.
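
A short sketch of the weighting strategy described above: an ordinary feed-forward ANN whose loss gives the abnormal-pattern (minority) class a larger weight. The window length of 20 and the 10x weight are assumed, illustrative values.

```python
# Feed-forward ANN with a class-weighted cross-entropy loss.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(20, 32), nn.ReLU(),   # 20-point control-chart window as input
    nn.Linear(32, 2),               # logits: [normal, abnormal]
)
# up-weight the minority (abnormal) class in the loss
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

# One illustrative training step on a batch (x: float tensor, y: long tensor):
# logits = net(x); loss = loss_fn(logits, y); loss.backward(); optimizer.step()
```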


Author(s):  
Banghee So ◽  
Emiliano A. Valdez

Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. Real-world classification problems with severely imbalanced class distributions are increasingly common. In this case, minority classes have far fewer observations to learn from than majority classes. Despite this sparsity, a minority class is often considered the more interesting class, yet developing a learning algorithm suited to such observations presents countless challenges. In this article, we propose a novel multi-class classification algorithm specialized to handle severely imbalanced classes, based on the method we refer to as SAMME.C2. It blends the flexible boosting mechanics of the SAMME algorithm, a multi-class classifier, and the Ada.C2 algorithm, a cost-sensitive binary classifier designed to address highly imbalanced classes. We not only provide the resulting algorithm but also establish a scientific and statistical formulation of the proposed SAMME.C2 algorithm. Through numerical experiments examining various degrees of classification difficulty, we demonstrate the consistently superior performance of the proposed model.
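
To make the combination concrete, here is a heavily hedged sketch of a cost-sensitive, SAMME-style multi-class booster: it keeps the standard SAMME weight update and simply scales example weights by a class-dependent cost so that minority-class mistakes grow faster. This illustrates the general idea only and is not the authors' exact SAMME.C2 formulation; all names and the cost scheme are assumptions.

```python
# Cost-sensitive multi-class boosting with decision stumps (SAMME-style sketch).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_samme(X, y, costs, n_rounds=50):
    # costs: dict mapping class label -> cost, e.g. {0: 1.0, 1: 5.0} for a rare class 1
    classes = np.unique(y)
    K, n = len(classes), len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(K - 1)        # SAMME step size
        # cost-sensitive update: misclassified, high-cost examples grow fastest
        w = w * np.exp(alpha * miss) * np.array([costs[c] for c in y])
        w = w / w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas, classes

def samme_predict(learners, alphas, classes, X):
    votes = np.zeros((len(X), len(classes)))
    for stump, a in zip(learners, alphas):
        votes[np.arange(len(X)), np.searchsorted(classes, stump.predict(X))] += a
    return classes[votes.argmax(axis=1)]
```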


Electronics ◽  
2021 ◽  
Vol 10 (24) ◽  
pp. 3124
Author(s):  
Jun Guan ◽  
Xu Jiang ◽  
Baolei Mao

More and more Android application developers are adopting methods to resist reverse engineering, such as adding a shell, so that certain features cannot be obtained through decompilation; this causes a serious sample imbalance in machine learning-based Android malware detection. Researchers have therefore focused on how to resolve class imbalance to improve the performance of Android malware detection. However, the main disadvantages of existing class-imbalance learning are the loss of valuable samples and the computational cost. In this paper, we propose a Class-Imbalance Learning (CIL) method, which first selects representative features and then uses K-Means clustering with undersampling to retain the important majority-class samples while reducing their number. After that, we use the Synthetic Minority Over-Sampling Technique (SMOTE) to generate minority-class samples for data balance, and finally use the Random Forest (RF) algorithm to build a malware detection model. The experimental results indicate that CIL effectively improves the performance of machine learning-based Android malware detection, especially under class imbalance. Compared with existing class-imbalance learning methods, CIL is also effective on datasets from the Machine Learning Repository of the University of California, Irvine (UCI) and performs better on several of them.
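
A rough sketch of a pipeline in the spirit of CIL as described above: K-Means-based undersampling keeps the majority samples nearest each cluster centre, SMOTE then synthesizes minority samples, and a random forest is trained on the balanced set. The cluster count, samples kept per cluster, and forest size are assumptions, not values from the paper.

```python
# K-Means undersampling of the majority class + SMOTE + Random Forest.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

def cil_fit(X, y, majority=0, n_clusters=50, keep_per_cluster=20):
    X_maj, X_min = X[y == majority], X[y != majority]
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X_maj)
    kept = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(X_maj[members] - km.cluster_centers_[c], axis=1)
        kept.append(members[np.argsort(d)[:keep_per_cluster]])   # closest to centre
    X_maj_red = X_maj[np.concatenate(kept)]
    X_bal = np.vstack([X_maj_red, X_min])
    y_bal = np.hstack([np.zeros(len(X_maj_red)), np.ones(len(X_min))])  # 1 = malware
    X_res, y_res = SMOTE().fit_resample(X_bal, y_bal)             # oversample minority
    return RandomForestClassifier(n_estimators=200).fit(X_res, y_res)
```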

