A new instance density-based synthetic minority oversampling method for imbalanced classification problems

2021 ◽  
pp. 1-15
Author(s):  
Chung-Kang Ma ◽  
You-Jin Park
2020 ◽  
Vol 203 ◽  
pp. 106116
Author(s):  
Jianan Wei ◽  
Haisong Huang ◽  
Liguo Yao ◽  
Yao Hu ◽  
Qingsong Fan ◽  
...  

2011 ◽  
Vol 8 (4) ◽  
pp. 199-211
Author(s):  
Hernán Ahumada ◽  
Guillermo L. Grinblat ◽  
Lucas C. Uzal ◽  
Alejandro Ceccatto ◽  
Pablo M. Granitto

2016 ◽  
Vol 6 (3) ◽  
pp. 173-188
Author(s):  
Vladimir Stanovov ◽  
Eugene Semenkin ◽  
Olga Semenkina

Abstract: A novel approach to instance selection in classification problems is presented. This adaptive instance selection is designed to reduce the computational resources required while simultaneously improving the classification quality achieved. The approach generates new training samples during the evolutionary process and thereby changes the training set used by the algorithm. The selection is guided by probabilities that change over time, so that the algorithm concentrates on problematic examples that are difficult to classify. A hybrid fuzzy classification algorithm with a self-configuration procedure is used as the problem solver. Classification quality is tested on nine data sets from the KEEL repository. A special balancing strategy is used within the instance selection approach to improve classification quality on imbalanced data sets. The results demonstrate the usefulness of the proposed approach compared with other classification methods.
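The abstract above does not give the update rule behind its "changing probabilities", so the following is only a minimal sketch of the general idea, under stated assumptions. Each training instance carries a selection weight, subsets are drawn class by class so that minority classes are not starved, and weights rise for misclassified instances and fall for correctly classified ones. The function names and the `boost`, `decay`, and `floor` parameters are hypothetical and do not come from the paper.

```python
import random


def select_instances(labels, weights, subset_size, rng):
    """Draw a class-balanced subset of indices, favouring high-weight instances."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Assumed balancing strategy: the same number of draws from every class.
    per_class = max(1, subset_size // len(by_class))
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        class_weights = [weights[i] for i in indices]
        # Sampling with replacement keeps the sketch short; small classes may repeat.
        chosen.extend(rng.choices(indices, weights=class_weights, k=per_class))
    return chosen


def update_weights(weights, chosen, correct, boost=2.0, decay=0.5, floor=0.05):
    """Raise weights of misclassified instances and lower those classified correctly."""
    updated = list(weights)
    for idx, ok in zip(chosen, correct):
        factor = decay if ok else boost
        # The floor keeps every instance selectable in later rounds.
        updated[idx] = max(floor, updated[idx] * factor)
    return updated
```

In a training loop, `select_instances` would be called once per generation, the current classifier evaluated on the chosen subset, and `update_weights` applied with the per-instance outcomes, so that later subsets lean toward examples the classifier still gets wrong.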


2008 ◽  
Author(s):  
Hernán Ahumada ◽  
Guillermo L. Grinblat ◽  
Lucas C. Uzal ◽  
Pablo M. Granitto ◽  
Alejandro Ceccatto

2018 ◽  
Author(s):  
Sebastian Bittrich ◽  
Marika Kaden ◽  
Christoph Leberecht ◽  
Florian Kaiser ◽  
Thomas Villmann ◽  
...  

Abstract
Background: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models.
Results: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method and provides comprehensive visualization capabilities not present in other classifiers which allow for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well-suited for imbalanced classification problems which are frequent in life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR) that have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic of 76.6% was achieved which is comparable to other state-of-the-art classifiers.
Conclusions: The application on EFR prediction demonstrates how an easy interpretation of classification models can promote the comprehension of biological mechanisms. The results shed light on the special features of EFR which were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and they participate in networks of hydrophobic residues. Visualization capabilities of GMLVQ are presented as we demonstrate how to interpret the results.
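To make the interpretability claim concrete, here is a minimal sketch of how a trained GMLVQ model classifies a point and exposes feature relevances. It assumes the standard GMLVQ formulation, in which the squared distance between a sample x and a prototype w is (x - w)^T Ω^T Ω (x - w) and the diagonal of Λ = Ω^T Ω is read as per-feature relevance. It is not the Weka plug-in's code, training of Ω and the prototypes is omitted, and all function names are hypothetical.

```python
def gmlvq_distance(x, prototype, omega):
    """Squared adaptive distance (x - w)^T Omega^T Omega (x - w)."""
    diff = [xi - wi for xi, wi in zip(x, prototype)]
    # Project the difference vector with Omega, then take the squared norm.
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in omega]
    return sum(p * p for p in projected)


def classify(x, prototypes, prototype_labels, omega):
    """Assign the label of the nearest prototype under the adaptive distance."""
    distances = [gmlvq_distance(x, w, omega) for w in prototypes]
    return prototype_labels[distances.index(min(distances))]


def relevance_diagonal(omega):
    """Diagonal of Lambda = Omega^T Omega, one relevance value per feature."""
    n_features = len(omega[0])
    return [sum(row[j] ** 2 for row in omega) for j in range(n_features)]
```

A larger diagonal entry means differences along that feature contribute more to the distance, which is the kind of per-feature reading the abstract refers to when it names the features most influential for the classification.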

