Robustness of learning techniques in handling class noise in imbalanced datasets

Author(s):  
D. Anyfantis ◽  
M. Karagiannopoulos ◽  
S. Kotsiantis ◽  
P. Pintelas
Author(s):  
Nicksson Ckayo Arrais de Freitas ◽  
Ticiana L. Coelho Da Silva ◽  
José Antônio Fernandes De Macêdo ◽  
Leopoldo Melo Júnior

Deep learning has gained much popularity in recent years thanks to GPU advancements, cloud computing improvements, and the superior accuracy it achieves when trained on massive datasets. As with other machine learning models, deep learning models may perform poorly on imbalanced datasets. In this paper, we focus on the trajectory classification problem and examine deep learning techniques for coping with imbalanced class data. We extend a deep learning model, called DeepeST (Deep Learning for Sub-Trajectory classification), to predict the class or label of sub-trajectories from imbalanced datasets. To the best of the authors' knowledge, DeepeST is the first deep learning model for trajectory classification that provides approaches for coping with imbalanced datasets. We perform experiments on three real datasets of LBSN (Location-Based Social Network) trajectories to identify the user of a sub-trajectory (similar to the Trajectory-User Linking problem). We show that DeepeST outperforms other state-of-the-art deep learning approaches in accuracy, precision, recall, and F1-score.
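The abstract above reports accuracy alongside precision, recall, and F1-score. A minimal sketch of why the extra metrics matter on imbalanced data follows. It is not taken from the paper: the function name `binary_metrics`, the toy labels, and the simplification to a binary problem (the paper's task is multi-class) are all assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; undefined ratios are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy imbalanced data (assumed): 95 majority examples, 5 minority examples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the always-majority classifier scores high accuracy while its recall and F1 on the minority class are zero, which is the usual reason to report all four metrics together.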


2021 ◽  
pp. 1-16
Author(s):  
Deepika Singh ◽  
Anju Saha ◽  
Anjana Gosain

Imbalanced dataset classification is challenging because of the severely skewed class distribution. Traditional machine learning algorithms show degraded performance on these skewed datasets. However, there are additional characteristics of a classification dataset that are not only challenging for traditional machine learning algorithms but also increase the difficulty of constructing a model for imbalanced datasets. Data complexity metrics identify these intrinsic characteristics, which cause substantial deterioration of the learning algorithms’ performance. Though many research efforts have been made to deal with class noise, none of them has focused on imbalanced datasets coupled with other intrinsic factors. This paper presents a novel hybrid pre-processing algorithm that treats class-label noise in imbalanced datasets that also suffer from other intrinsic factors such as class overlapping, non-linear class boundaries, small disjuncts, and borderline examples. The algorithm uses the wCM complexity metric (proposed for imbalanced datasets) to identify noisy, borderline, and other difficult instances of the dataset and then intelligently handles these instances. Experiments on synthetic and real-world datasets with different levels of imbalance, noise, small disjuncts, class overlapping, and borderline examples are conducted to check the effectiveness of the proposed algorithm. The experimental results show that the proposed algorithm offers an interesting alternative to popular state-of-the-art pre-processing algorithms for effectively handling imbalanced datasets along with noise and other difficulties.


2013 ◽  
Vol 20 (3) ◽  
pp. 327-359 ◽  
Author(s):  
ROSA DEL GAUDIO ◽  
GUSTAVO BATISTA ◽  
ANTÓNIO BRANCO

Abstract. This paper addresses the task of automatic extraction of definitions by thoroughly exploring an approach that relies solely on machine learning techniques, and by focusing on the issue of the imbalance of relevant datasets. We obtained a breakthrough in the automatic extraction of definitions by extensively and systematically experimenting with different sampling techniques and their combinations, as well as a range of different types of classifiers. Performance consistently scored in the range of 0.95–0.99 of area under the receiver operating characteristic curve, a notable improvement of between 17 and 22 percentage points over the baseline of 0.73–0.77, for datasets with different rates of imbalance. Thus, the present paper also represents a contribution to the seminal work in natural language processing that points toward the importance of exploring the research path of applying sampling techniques to mitigate the bias induced by highly imbalanced datasets, and thus greatly improving the performance of a large range of tools that rely on them.
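The abstract above does not say which sampling techniques were used. As an illustration only, the sketch below shows one of the simplest, random oversampling, where minority-class examples are duplicated at random until every class matches the size of the largest one. The function name `random_oversample` and the toy data are assumptions, not the authors' method.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for y, items in by_class.items():
        # Draw with replacement to fill the gap to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy data (assumed): 8 non-definitions (label 0), 2 definitions (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling should be applied to the training split only. Duplicating examples before the train/test split leaks copies of test items into training and inflates the reported scores.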


2015 ◽  
Vol 20 (3) ◽  
pp. 155-166 ◽  
Author(s):  
Larissa J. Maier ◽  
Michael P. Schaub

Abstract. Pharmacological neuroenhancement, defined as the misuse of prescription drugs, illicit drugs, or alcohol for the purpose of enhancing cognition, mood, or prosocial behavior, is not widespread in Europe – nevertheless, it does occur. Thus far, no drug has been proven as safe and effective for cognitive enhancement in otherwise healthy individuals. European studies have investigated the misuse of prescription and illicit stimulants to increase cognitive performance as well as the use of tranquilizers, alcohol, and cannabis to cope with stress related to work or education. Young people in educational settings report pharmacological neuroenhancement more frequently than those in other settings. Although the regular use of drugs for neuroenhancement is not common in Europe, the irregular and low-dose usage of neuroenhancers might cause adverse reactions. Previous studies have revealed that obtaining adequate amounts of sleep and using successful learning techniques effectively improve mental performance, whereas pharmacological neuroenhancement is associated with ambiguous effects. Therefore, non-substance-related alternatives should be promoted to cope with stressful situations. This paper reviews the recent research on pharmacological neuroenhancement in Europe, develops a clear definition of the substances used, and formulates recommendations for practitioners regarding how to react to requests for neuroenhancement drug prescriptions. We conclude that monitoring the future development of pharmacological neuroenhancement in Europe is important to provide effective preventive measures when required. Furthermore, substance use to cope with stress related to work or education should be studied in depth because it is likely more prevalent and dangerous than direct neuroenhancement.


2006 ◽  
Author(s):  
Christopher Schreiner ◽  
Kari Torkkola ◽  
Mike Gardner ◽  
Keshu Zhang

2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating the close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.


Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 389-P
Author(s):  
SATORU KODAMA ◽  
MAYUKO H. YAMADA ◽  
YUTA YAGUCHI ◽  
MASARU KITAZAWA ◽  
MASANORI KANEKO ◽  
...  

Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical: it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems because of their accurate prediction performance. Different techniques may achieve different accuracies, so it is imperative to use the most suitable method, the one that provides the best results. This research provides a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.

