Undersampling Near Decision Boundary for Imbalance Problems

Author(s):  
Jianjun Zhang ◽  
Ting Wang ◽  
Wing W. Y. Ng ◽  
Shuai Zhang ◽  
Chris D. Nugent
2020 ◽  
pp. 147592172097970
Author(s):  
Liangliang Cheng ◽  
Vahid Yaghoubi ◽  
Wim Van Paepegem ◽  
Mathias Kersemans

The Mahalanobis–Taguchi system is considered a promising and powerful tool for handling binary classification cases. However, it has several restrictions in screening useful features and in determining the decision boundary in an optimal manner. In this article, an integrated Mahalanobis classification system is proposed which builds on the concept of Mahalanobis distance and its space. The integrated Mahalanobis classification system integrates the decision boundary searching process, based on a particle swarm optimizer, directly into the feature selection phase for constructing the Mahalanobis distance space. This integration (a) avoids the need for user-dependent input parameters and (b) improves the classification performance. For the feature selection phase, both a binary particle swarm optimizer and a binary gravitational search algorithm are investigated. To deal with possible overfitting problems in the case of sparse data sets, k-fold cross-validation is considered. The integrated Mahalanobis classification system procedure is benchmarked against the classical Mahalanobis–Taguchi system as well as the recently proposed two-stage Mahalanobis classification system in terms of classification performance. Results are presented on both an experimental case study of complex-shaped metallic turbine blades with various damage types and a synthetic case study of cylindrical dogbone samples with creep and microstructural damage. The results indicate that the proposed integrated Mahalanobis classification system shows good and robust classification performance.
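The core idea behind a Mahalanobis distance space can be illustrated with a minimal sketch. It is not the integrated system from the abstract: it omits the particle swarm feature selection and boundary search, and the function names, the division by the feature count, and the fixed threshold are assumptions made for illustration only.

```python
import numpy as np

def fit_reference(X_ref):
    """Estimate the mean and inverse covariance of the reference (e.g. healthy) group."""
    mean = X_ref.mean(axis=0)
    cov = np.cov(X_ref, rowvar=False)
    # The pseudo-inverse tolerates a near-singular covariance matrix.
    return mean, np.linalg.pinv(cov)

def mahalanobis_sq(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row of X, divided by the feature count."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff) / X.shape[1]

def classify(X, mean, cov_inv, threshold):
    """Flag a sample as abnormal when its scaled distance exceeds the threshold.

    The threshold is a hypothetical user-supplied value here; the abstract
    describes searching for the boundary automatically instead.
    """
    return mahalanobis_sq(X, mean, cov_inv) > threshold
```

With this scaling, samples from the reference group have an average distance close to one, so a threshold somewhat above one separates samples that lie far from the reference space.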


2018 ◽  
Vol 30 (12) ◽  
pp. 3151-3167 ◽  
Author(s):  
Dmitry Krotov ◽  
John Hopfield

Deep neural networks (DNNs) trained in a supervised way suffer from two known problems. First, the minima of the objective function used in learning correspond to data points (also known as rubbish examples or fooling images) that lack semantic similarity with the training data. Second, a clean input can be changed by a small perturbation, often imperceptible to human vision, so that the resulting deformed input is misclassified by the network. These findings emphasize the differences between the ways DNNs and humans classify patterns and raise the question of how to design learning algorithms that mimic human perception more accurately than existing methods. Our article examines these questions within the framework of dense associative memory (DAM) models. These models are defined by an energy function with higher-order (higher than quadratic) interactions between the neurons. We show that in the limit when the power of the interaction vertex in the energy function is sufficiently large, these models have the following three properties. First, the minima of the objective function are free from rubbish images, so that each minimum is a semantically meaningful pattern. Second, artificial patterns poised precisely at the decision boundary look ambiguous to human subjects and share aspects of both classes that are separated by that decision boundary. Third, adversarial images constructed by models with a small power of the interaction vertex, which are equivalent to DNNs with rectified linear units, fail to transfer to and fool the models with higher-order interactions. This opens up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks. The results we present suggest that DAMs with higher-order energy functions are more robust to adversarial and rubbish inputs than DNNs with rectified linear units.
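A small sketch may help make the notion of an energy function with a higher-order interaction vertex concrete. The abstract does not state the energy, so the form below is an assumption: E = -sum over stored patterns of F(pattern . state) with F(x) = x**n, applied to binary (+1/-1) neurons. The function names and the toy pattern sizes are likewise illustrative, and this is a pattern-retrieval toy, not the classification setup studied in the article.

```python
import numpy as np

def energy(sigma, patterns, n):
    """Assumed DAM energy: E = -sum_mu F(xi_mu . sigma), with F(x) = x**n."""
    return -np.sum((patterns @ sigma).astype(float) ** n)

def update(sigma, patterns, n):
    """One asynchronous sweep: set each neuron to the sign that gives the lower energy."""
    sigma = sigma.copy()
    for i in range(len(sigma)):
        # Overlap of every stored pattern with the state, excluding neuron i.
        rest = patterns @ sigma - patterns[:, i] * sigma[i]
        # Difference in sum_mu F between neuron i = +1 and neuron i = -1.
        gain = np.sum((rest + patterns[:, i]) ** n - (rest - patterns[:, i]) ** n)
        sigma[i] = 1 if gain >= 0 else -1
    return sigma

# Toy usage: store 5 random patterns of 64 neurons, corrupt one, and retrieve it.
rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(5, 64))
noisy = patterns[0].copy()
noisy[:5] *= -1                      # flip the first five neurons
recovered = update(noisy, patterns, n=3)
```

The power n plays the role of the interaction vertex power mentioned above: larger n makes the stored pattern with the largest overlap dominate the update more strongly.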


Risks ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 52
Author(s):  
Santosh Kumar Shrivastav ◽  
P. Janaki Ramudu

Banks play a vital role in strengthening the financial system of a country; hence, their survival is decisive for the stability of national economies. Therefore, analyzing the survival probability of banks is an essential and continuing research activity. However, the available literature indicates that research on quantifying banks' stress is limited in countries like India, where there have been fewer failed banks. The literature also indicates a lack of scientific and quantitative approaches that can be used to predict bank survival and failure probabilities. Against this backdrop, the present study attempts to establish a bankruptcy prediction model using a machine learning approach and to compute and compare the financial stress that the banks face. The study uses the data of failed and surviving private and public sector banks in India for the period January 2000 through December 2017. The explanatory features of bank failure are chosen by using a two-step feature selection technique. First, a relief algorithm is used for primary screening of useful features, and in the second step, important features are fed into the support vector machine to create a forecasting model. The threshold values of the features that separate failed banks from surviving banks are calculated using the decision boundary of the support vector machine with a linear kernel. The results reveal, inter alia, that a support vector machine with a linear kernel shows 92.86% forecasting accuracy, while a support vector machine with a radial basis function kernel shows 71.43% accuracy. The study helps to carry out comparative analyses of the financial stress of the banks and has significant implications for the decisions of various stakeholders such as shareholders, management of the banks, analysts, and policymakers.
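One way to read feature thresholds off a linear decision boundary is sketched below with scikit-learn. The abstract does not say how the thresholds were computed, so the rule used here (solve w . x + b = 0 for one feature while holding the others at their means) is an assumption, as are the helper name `feature_thresholds` and the synthetic two-feature data, which stand in for real bank indicators. The relief screening step is not shown.

```python
import numpy as np
from sklearn.svm import SVC

def feature_thresholds(clf, X):
    """Per-feature value at which a fitted linear SVM's decision function is zero.

    Assumption: every other feature is held at its sample mean.
    """
    w = clf.coef_.ravel()
    b = clf.intercept_[0]
    x_bar = X.mean(axis=0)
    # For each j: sum over k != j of w_k * mean_k.
    others = w @ x_bar - w * x_bar
    return -(b + others) / w

# Synthetic, clearly separated data (not real bank data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(4.0, 0.5, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

clf = SVC(kernel="linear").fit(X, y)
thresholds = feature_thresholds(clf, X)
```

Swapping `kernel="linear"` for `kernel="rbf"` gives the second model type compared in the abstract, but an RBF model has no `coef_` attribute, so thresholds cannot be read off in this way.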

