A binary PSO-based ensemble under-sampling model for rebalancing imbalanced training data

Author(s):  
Jinyan Li ◽  
Yaoyang Wu ◽  
Simon Fong ◽  
Antonio J. Tallón-Ballesteros ◽  
Xin-she Yang ◽  
...  
2021 ◽  
Vol 11 (6) ◽  
pp. 2866
Author(s):  
Damheo Lee ◽  
Donghyun Kim ◽  
Seung Yun ◽  
Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, because the phonetic variations in English as pronounced by Korean speakers must be taken into account, we sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and applied language model (LM) adaptation to counter the modeling bias toward Korean caused by the imbalanced training data. The training data were AI Hub (1033 h) in Korean and Librispeech (960 h) in English. Compared with the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were applied to the LM adaptation. When only English words were considered, the word correction rate improved by up to 24.2% over the baseline. These results suggest that the proposed method is effective for CS speech recognition.


Author(s):  
Shaojian Qiu ◽  
Lu Lu ◽  
Siyu Jiang ◽  
Yang Guo

Machine-learning-based software defect prediction (SDP) methods are receiving great attention from researchers in intelligent software engineering. Most existing SDP methods operate in a within-project setting. However, a new SDP task usually has little or no within-project training data from which to learn a usable supervised prediction model. Cross-project defect prediction (CPDP), which uses labeled data from source projects to learn a defect predictor for a target project, was therefore proposed as a practical SDP solution. In real CPDP tasks, the class imbalance problem is ubiquitous and strongly affects the performance of CPDP models. Unlike previous studies that focus on subsampling and individual methods, this study investigated 15 imbalanced learning methods for CPDP tasks, with particular attention to the effectiveness of imbalanced ensemble learning (IEL) methods. We evaluated the 15 methods in extensive experiments on 31 open-source projects derived from five datasets. By analyzing a total of 37,504 results, we found that in most cases the IEL method that combines under-sampling and bagging is more effective than the other investigated methods.
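
The combination of under-sampling and bagging mentioned in the abstract above can be illustrated with a minimal sketch. It is not the implementation evaluated in the study: the function names, the nearest-centroid base learner, the toy data, and the choice to keep every minority sample in each bag are all assumptions made for illustration, and the code assumes a binary problem.

```python
import random
from collections import Counter

def balanced_bags(X, y, n_bags=5, seed=0):
    """Yield balanced subsets: all minority samples plus an equally
    sized random draw (without replacement) of majority samples.
    Assumes two classes; keeping all minority samples is a choice
    made for this sketch, not something taken from the paper."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    for _ in range(n_bags):
        chosen = min_idx + rng.sample(maj_idx, len(min_idx))
        yield [X[i] for i in chosen], [y[i] for i in chosen]

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, row):
    """Return the label of the closest class centroid."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def fit_ensemble(X, y, n_bags=5, seed=0):
    """Train one base learner per balanced bag."""
    return [fit_centroids(Xb, yb) for Xb, yb in balanced_bags(X, y, n_bags, seed)]

def predict_ensemble(models, row):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy imbalanced data: 20 majority samples (label 0), 3 minority samples (label 1).
X = [[i * 0.1, (i % 5) * 0.1] for i in range(20)] + [[3.0, 3.0], [3.2, 2.9], [2.9, 3.1]]
y = [0] * 20 + [1] * 3
models = fit_ensemble(X, y)
```

Each base learner sees a balanced subset, so no single model is dominated by the majority class, while the vote across bags recovers information from majority samples that any one bag discards.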


2020 ◽  
Vol 124 ◽  
pp. 103611 ◽  
Author(s):  
Elias Martins Guerra Prado ◽  
Carlos Roberto de Souza Filho ◽  
Emmanuel John M. Carranza ◽  
João Gabriel Motta

2012 ◽  
Vol 433-440 ◽  
pp. 7479-7486
Author(s):  
Rui Kong ◽  
Qiong Wang ◽  
Gu Yu Hu ◽  
Zhi Song Pan

Support vector machines (SVMs) have been extensively studied and have shown remarkable success in many applications. However, their success is very limited when they are applied to learning from imbalanced datasets in which negative instances heavily outnumber positive instances (e.g., in medical diagnosis and credit card fraud detection). In this paper, we propose a fuzzy asymmetric algorithm, called FASVM, that augments SVMs to deal with imbalanced training data; it is based on fuzzy memberships combined with the different error costs (DEC) algorithm. We compare the performance of our algorithm against the DEC algorithm and a regular SVM, and show that our algorithm outperforms them.
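
The idea of different error costs can be sketched as a hinge-loss classifier in which each sample's misclassification penalty depends on its class and, optionally, on a per-sample membership weight. This is only an illustration under stated assumptions, not the FASVM algorithm from the paper: the abstract does not specify the membership function, so memberships default to 1 here, and the linear model, the batch subgradient solver, the function names, and the toy data are all hypothetical.

```python
def dec_costs(y, c_pos, c_neg, membership=None):
    """Per-sample cost: class-dependent cost times a membership in (0, 1].
    Memberships default to 1.0 (an assumption; the source does not
    specify how they are computed)."""
    if membership is None:
        membership = [1.0] * len(y)
    return [(c_pos if yi > 0 else c_neg) * si for yi, si in zip(y, membership)]

def train_weighted_linear_svm(X, y, cost, epochs=200, lr=0.1, lam=0.01):
    """Batch subgradient descent on
    lam/2 * ||w||^2 + (1/n) * sum_i cost_i * max(0, 1 - y_i * (w.x_i + b)).
    Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, cost):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin, so it contributes
                for j in range(d):
                    grad_w[j] -= ci * yi * xi[j] / n
                grad_b -= ci * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy 1-D data: 8 negatives at x = -2, 2 positives at x = +2.
X = [[-2.0]] * 8 + [[2.0]] * 2
y = [-1] * 8 + [1] * 2

# Equal costs versus a positive-class cost scaled by the imbalance ratio (8 / 2 = 4).
w_plain, b_plain = train_weighted_linear_svm(X, y, dec_costs(y, 1.0, 1.0))
w_dec, b_dec = train_weighted_linear_svm(X, y, dec_costs(y, 4.0, 1.0))
```

On this toy set both models separate the classes, but with equal costs the bias term is pulled negative by the majority class, which moves the decision boundary toward the positive samples; scaling the positive-class cost by the imbalance ratio keeps the boundary centered between the two groups.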


Author(s):  
Amir Laadhar ◽  
Faiza Ghozzi ◽  
Imen Megdiche ◽  
Franck Ravat ◽  
Olivier Teste ◽  
...  

Author(s):  
Yoshihiko Kawai ◽  
Hideki Sumiyoshi ◽  
Mahito Fujii ◽  
Masahiro Shibata ◽  
Noboru Babaguchi
