A binary PSO-based ensemble under-sampling model for rebalancing imbalanced training data

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, because the phonetic variations in English as pronounced by Korean speakers must be taken into account, we sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and applied language model (LM) adaptation to counter the bias toward Korean caused by the imbalanced training data. In the experiments, the training data were AI Hub (1033 h) for Korean and Librispeech (960 h) for English. Compared with the baseline, the proposed method achieved an error reduction rate (ERR) of up to 11.6% with phonetic variant modeling and 17.3% when semantically similar sentences were applied to the LM adaptation. When only English words were considered, the word correction rate improved by up to 24.2% over the baseline. The proposed method appears to be very effective for CS speech recognition.

An Investigation of Imbalanced Ensemble Learning Methods for Cross-Project Defect Prediction

International Journal of Pattern Recognition and Artificial Intelligence ◽ 10.1142/s0218001419590377 ◽ 2019 ◽ Vol 33 (12) ◽ pp. 1959037 ◽ Cited By ~ 5

Author(s): Shaojian Qiu ◽ Lu Lu ◽ Siyu Jiang ◽ Yang Guo

Keyword(s): Ensemble Learning ◽ Class Imbalance ◽ Training Data ◽ Defect Prediction ◽ Class Imbalance Problem ◽ Learning Methods ◽ Imbalance Problem ◽ Intelligent Software ◽ Under Sampling ◽ Cross Project

Machine-learning-based software defect prediction (SDP) methods are receiving great attention from researchers in intelligent software engineering. Most existing SDP methods operate in a within-project setting. However, a new SDP task usually has little or no within-project training data from which to learn a usable supervised prediction model. Cross-project defect prediction (CPDP), which uses labeled data from source projects to learn a defect predictor for a target project, was therefore proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and has a great impact on the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, with particular attention to the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods through extensive experiments on 31 open-source projects derived from five datasets. By analyzing a total of 37504 results, we found that, in most cases, the IEL method that combines under-sampling and bagging is more effective than the other investigated methods.
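The combination of under-sampling and bagging mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the nearest-centroid base learner, the 0/1 label encoding, and the toy data are all assumptions chosen to keep the example dependency-free. Each ensemble member is trained on a balanced bag that keeps every minority sample and a random, equally sized subset of the majority class, and predictions are combined by majority vote.

```python
import random
from collections import Counter

def balanced_bag(X, y, rng):
    """Keep every minority sample and draw an equal number of majority samples."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    chosen = minority + rng.sample(majority, len(minority))
    return [X[i] for i in chosen], [y[i] for i in chosen]

def fit_centroids(X, y):
    """Toy base learner (an assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroids(centroids, x):
    """Return the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def fit_under_bagging(X, y, n_estimators=11, seed=0):
    """Train one base learner per balanced, under-sampled bag."""
    rng = random.Random(seed)
    return [fit_centroids(*balanced_bag(X, y, rng)) for _ in range(n_estimators)]

def predict_under_bagging(models, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 50 majority (label 0) points and 3 minority (label 1) points.
X_demo = [[0.1 * (i % 5), 0.1 * (i // 5)] for i in range(50)] + [[3.0, 3.0], [3.2, 2.9], [2.9, 3.1]]
y_demo = [0] * 50 + [1] * 3
models_demo = fit_under_bagging(X_demo, y_demo)
print(predict_under_bagging(models_demo, [3.0, 3.0]), predict_under_bagging(models_demo, [0.2, 0.2]))
```

Because every bag is balanced, no single member is dominated by the majority class, while the ensemble as a whole still sees many different majority samples across bags.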

Coping with imbalanced training data for improved terrain prediction in autonomous outdoor robot navigation

2010 IEEE International Conference on Robotics and Automation ◽ 10.1109/robot.2010.5509634 ◽ 2010 ◽ Cited By ~ 4

Author(s): Michael J Procopio ◽ Jane Mulligan ◽ Greg Grudic

Keyword(s): Robot Navigation ◽ Training Data ◽ Imbalanced Training Data

Modeling of Cu-Au prospectivity in the Carajás mineral province (Brazil) through machine learning: Dealing with imbalanced training data

Ore Geology Reviews ◽ 10.1016/j.oregeorev.2020.103611 ◽ 2020 ◽ Vol 124 ◽ pp. 103611 ◽ Cited By ~ 1

Author(s): Elias Martins Guerra Prado ◽ Carlos Roberto de Souza Filho ◽ Emmanuel John M. Carranza ◽ João Gabriel Motta

Keyword(s): Machine Learning ◽ Training Data ◽ Carajás Mineral Province ◽ Imbalanced Training Data

Fuzzy Asymmetric Support Vector Machines

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.433-440.7479 ◽ 2012 ◽ Vol 433-440 ◽ pp. 7479-7486

Author(s): Rui Kong ◽ Qiong Wang ◽ Gu Yu Hu ◽ Zhi Song Pan

Keyword(s): Support Vector Machines ◽ Medical Diagnosis ◽ Credit Card ◽ Training Data ◽ Support Vector ◽ Imbalanced Datasets ◽ Credit Card Fraud ◽ Vector Machines ◽ Imbalanced Training Data ◽ Error Costs

Support vector machines (SVMs) have been extensively studied and have shown remarkable success in many applications. However, the success of SVMs is very limited when they are applied to learning from imbalanced datasets in which negative instances heavily outnumber positive instances (e.g., in medical diagnosis and credit card fraud detection). In this paper, we propose a fuzzy asymmetric algorithm, called FASVM, that augments SVMs to deal with imbalanced training-data problems; it is based on fuzzy memberships combined with the different error costs (DEC) algorithm. We compare the performance of our algorithm against these two algorithms, along with different error costs and the regular SVM, and show that our algorithm outperforms all of them.
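The abstract does not say how FASVM combines its two ingredients, so the sketch below shows only one plausible reading and should not be taken as the authors' algorithm. It scales each sample's hinge loss by a class-dependent error cost (the DEC idea) multiplied by a fuzzy membership. The membership function (linear decay with distance from the class centroid), the subgradient-descent trainer, the +1/-1 label encoding, and all names and hyperparameters are assumptions made for illustration.

```python
import math

def centroid(rows):
    """Mean vector of a list of points."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fuzzy_memberships(X, y, floor=0.1):
    """Assumed membership: 1 at the class centroid, decaying linearly
    to `floor` at the class member farthest from that centroid."""
    cents = {c: centroid([x for x, t in zip(X, y) if t == c]) for c in set(y)}
    dists = [math.dist(x, cents[t]) for x, t in zip(X, y)]
    radius = {c: max(d for d, t in zip(dists, y) if t == c) or 1.0 for c in cents}
    return [1.0 - (1.0 - floor) * d / radius[t] for d, t in zip(dists, y)]

def fit_weighted_svm(X, y, cost_pos=1.0, cost_neg=1.0, memberships=None,
                     epochs=200, lr=0.01, lam=0.01):
    """Linear soft-margin SVM trained by subgradient descent; labels are +1 / -1.
    Each sample's hinge loss is scaled by (class cost) * (fuzzy membership)."""
    memberships = memberships or [1.0] * len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t, s in zip(X, y, memberships):
            weight = (cost_pos if t == 1 else cost_neg) * s
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 shrinkage on w, then a hinge step only if the margin is violated.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                w = [wi + lr * weight * t * xi for wi, xi in zip(w, x)]
                b += lr * weight * t
    return w, b

def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical toy data: 40 negatives near the origin, 4 positives near (3, 3).
X_svm = [[0.1 * (i % 8), 0.1 * (i // 8)] for i in range(40)] + \
        [[3.0, 3.0], [3.2, 2.8], [2.8, 3.2], [3.1, 3.1]]
y_svm = [-1] * 40 + [1] * 4
s_svm = fuzzy_memberships(X_svm, y_svm)
# Assumed cost ratio: positive errors cost 10x, matching the 40:4 class ratio.
w_svm, b_svm = fit_weighted_svm(X_svm, y_svm, cost_pos=10.0, cost_neg=1.0, memberships=s_svm)
print(predict(w_svm, b_svm, [3.0, 3.0]), predict(w_svm, b_svm, [0.0, 0.0]))
```

Setting `cost_pos` above `cost_neg` makes errors on the rare positive class more expensive, while the memberships reduce the influence of points far from their class centroid, which are more likely to be noise or outliers.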
