Image Classifying Based on Cost-Sensitive Layered Cascade Learning

2014 ◽  
Vol 701-702 ◽  
pp. 453-458
Author(s):  
Feng Huang ◽  
Yun Liang ◽  
Li Huang ◽  
Ji Ming Yao ◽  
Wen Feng Tian

Image classification is an important means of image processing. Traditional research on image classification usually rests on the following assumptions: the goal is overall classification accuracy, samples of different categories have the same importance in the data set, and all misclassifications incur the same cost. Unfortunately, class imbalance and cost sensitivity are ubiquitous in real-world classification: the sample size of a specific category in a data set may be much larger than that of the others, and misclassification costs differ sharply between categories. The high dimensionality of feature vectors caused by the diverse content of images, and the large gap in difficulty between distinguishing different categories of images, are common problems in image classification; therefore, a single machine learning algorithm is not sufficient for complex image classification with these characteristics. To address these problems, a layered cascade image classification method that accounts for cost sensitivity and class imbalance is proposed. A set of cascaded learners is built, the inner patterns of images of a specific category are learned in different stages, and a cost function is introduced, so that the method can respond effectively to the cost-sensitive and class-imbalance problems of image classification. Moreover, the structure of the method is flexible, as the number of cascade layers and the algorithm used in each stage can be readjusted according to the business requirements of the classification task. Results from an application to sensitive image classification for the smart grid indicate that image classification based on cost-sensitive layered cascade learning achieves better classification performance than existing methods.

2012 ◽  
Vol 466-467 ◽  
pp. 886-890
Author(s):  
Tian Yu Liu

Defects are one of the important factors leading to gear faults, so it is important to study defect diagnosis techniques for gears. The class imbalance problem arises in fault diagnosis and has a seriously negative effect on the performance of classifiers that assume a balanced distribution of classes. Although the problem is critical, few previous works have paid attention to it in the fault diagnosis of gears. In imbalanced problems, some features are redundant or even irrelevant, and these features hurt the generalization performance of learning machines. Here we propose PREE (Prediction Risk based feature selection for EasyEnsemble) to solve the class imbalance problem in the fault diagnosis of gears. Experimental results on UCI data sets and a gear data set show that PREE improves classification performance and prediction ability on imbalanced data.
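The EasyEnsemble idea named in this abstract is commonly described as repeatedly undersampling the majority class so that each base learner sees a balanced subset. The following is a minimal sketch of that sampling step only, under stated assumptions: the function name `balanced_subsets`, the toy records, and the subset count are hypothetical, and neither the PREE feature-selection step nor the base learners from the paper are reproduced here.

```python
import random

def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Build n_subsets training sets, each pairing all minority samples
    with an equally sized random draw (without replacement) from the majority."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled_majority = rng.sample(majority, len(minority))
        subsets.append(sampled_majority + list(minority))
    return subsets

# Toy data (illustrative only): 95 normal records and 5 defect records.
majority = [("normal", i) for i in range(95)]
minority = [("defect", i) for i in range(5)]

subsets = balanced_subsets(majority, minority, n_subsets=4)
for subset in subsets:
    n_defect = sum(1 for label, _ in subset if label == "defect")
    print(len(subset), "samples,", n_defect, "defect")
```

Each subset would then train one base classifier, with the ensemble combining their outputs; how that combination is done is left open in this sketch.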


2016 ◽  
Vol 2016 ◽  
pp. 1-9
Author(s):  
Zhenbing Liu ◽  
Chunyang Gao ◽  
Huihua Yang ◽  
Qijia He

Sparse representation has been successfully used in pattern recognition and machine learning. However, most existing sparse representation based classification (SRC) methods aim to achieve the highest classification accuracy, assuming the same loss for every misclassification. This assumption may not hold in many practical applications, as different types of misclassification can lead to different losses. In addition, many real-world data sets have imbalanced class distributions. To address these problems, we propose a cost-sensitive sparse representation based classification (CSSRC) method for the class-imbalance problem, using probabilistic modeling. Unlike traditional SRC methods, we predict the class label of a test sample by minimizing the misclassification loss, which is obtained by computing the posterior probabilities. Experimental results on the UCI databases validate the efficacy of the proposed approach in terms of average misclassification cost, positive class misclassification rate, and negative class misclassification rate. In addition, we sampled test and training samples with different imbalance ratios and used F-measure, G-mean, classification accuracy, and running time to evaluate the performance of the proposed method. The experiments show that the proposed method performs competitively compared with SRC, CSSVM, and CS4VM.
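Predicting a label by minimizing misclassification loss from posterior probabilities can be illustrated with the standard minimum-expected-cost decision rule. The sketch below assumes the posteriors are already available (in CSSRC they would come from the sparse-representation model, which is not reproduced here); the cost matrix values and the function name `min_cost_label` are illustrative assumptions.

```python
def min_cost_label(posteriors, cost):
    """Return the label with the lowest expected misclassification cost.

    posteriors[i] is P(true class = i | x).
    cost[i][j] is the loss of predicting class j when the true class is i.
    """
    n_classes = len(posteriors)
    expected = [
        sum(posteriors[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    return best, expected

# Class 0 = negative, class 1 = positive (illustrative values).
posteriors = [0.7, 0.3]
# Missing a positive is assumed to be 10x as costly as a false alarm.
cost = [[0.0, 1.0],
        [10.0, 0.0]]

label, expected = min_cost_label(posteriors, cost)
print(label, expected)
```

With these numbers the rule picks the positive class even though its posterior is only 0.3, because the expected cost of predicting negative (about 3.0) exceeds that of predicting positive (about 0.7). An accuracy-maximizing rule would have chosen negative.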


Energies ◽  
2021 ◽  
Vol 15 (1) ◽  
pp. 212
Author(s):  
Ajit Kumar ◽  
Neetesh Saxena ◽  
Souhwan Jung ◽  
Bong Jun Choi

Critical infrastructures have recently been integrated with digital controls to support intelligent decision making. Although this integration provides various benefits and improvements, it also exposes the system to new cyberattacks. In particular, the injection of false data and commands into communication is one of the most common and fatal cyberattacks in critical infrastructures. Hence, in this paper, we investigate the effectiveness of machine-learning algorithms in detecting False Data Injection Attacks (FDIAs). In particular, we focus on two of the most widely used critical infrastructures, namely power systems and water treatment plants. This study focuses on tackling two key technical issues: (1) finding the set of best features under a different combination of techniques and (2) resolving the class imbalance problem using oversampling methods. We evaluate the performance of each algorithm in terms of time complexity and detection accuracy to meet the time-critical requirements of critical infrastructures. Moreover, we address the inherent skewed distribution problem and the data imbalance problem commonly found in many critical infrastructure datasets. Our results show that the considered minority oversampling techniques can improve the Area Under Curve (AUC) of GradientBoosting, AdaBoost, and kNN by 10–12%.
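As a rough illustration of minority oversampling, the sketch below shows the simplest variant: random duplication of minority samples until every class matches the largest one. This is an assumption chosen for brevity, not necessarily one of the techniques evaluated in the paper; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class (with
    replacement) until every class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y

# Toy data (illustrative only): four normal readings, one attack reading.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["normal", "normal", "normal", "normal", "attack"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the data used for evaluation.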


2021 ◽  
Vol 12 (1) ◽  
pp. 1-17
Author(s):  
Swati V. Narwane ◽  
Sudhir D. Sawarkar

Class imbalance is a major hurdle for machine learning-based systems. The data set is the backbone of machine learning and must be studied in order to handle class imbalance. The purpose of this paper is to investigate the effect of class imbalance on data sets. The proposed methodology determines the model accuracy for a given class distribution. To find possible solutions, the behaviour of an imbalanced data set was investigated. The study considers two case studies in which the data set is divided into class distributions ranging from balanced to unbalanced. The data sets were tested with training and test data using standard machine learning algorithms. Model accuracy for each class distribution was measured on the training data set. The built model was then tested on each binary class individually. The results show that addressing class imbalance is essential for improving system performance. The study concludes that the system produces results biased towards the majority class. In the future, the multiclass imbalance problem can be studied using advanced algorithms.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Shujuan Wang ◽  
Yuntao Dai ◽  
Jihong Shen ◽  
Jingxue Xuan

With the development of artificial intelligence, big data classification technology provides valuable support for research on computer-aided medical diagnosis. However, because samples are collected under different conditions, medical big data is often imbalanced. The class-imbalance problem has been reported as a serious obstacle to the classification performance of many standard learning algorithms. The SMOTE algorithm can be used to generate sample points randomly to improve the imbalance rate, but its application is affected by the generation of marginal samples and by blind parameter selection. To address this problem, an improved SMOTE algorithm based on the normal distribution is proposed in this paper, so that new sample points are distributed closer to the center of the minority samples with higher probability, avoiding the marginalization of the expanded data. Experiments show that the classification performance is better when the proposed algorithm, rather than the original SMOTE algorithm, is used to expand the imbalanced Pima, WDBC, WPBC, Ionosphere, and Breast-cancer-wisconsin data sets. In addition, the parameter selection of the proposed algorithm is analyzed, and it is found that, in our designed experiments, the classification performance is best when the parameters are chosen so that the distribution characteristics of the original data are best maintained.
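The contrast with classic SMOTE can be sketched as follows. Classic SMOTE places a synthetic point at a uniformly random position on the segment between a minority sample and one of its minority neighbours. The sketch below is only one possible reading of "normal-distribution-based" generation and is not the paper's formula: it interpolates between a minority sample and the minority centroid, drawing the interpolation gap from a normal distribution clipped to [0, 1]. The function names and the parameters `mu` and `sigma` are hypothetical.

```python
import random

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def normal_gap_oversample(minority, n_new, mu=0.5, sigma=0.2, seed=0):
    """Create n_new synthetic points on segments from a random minority
    sample towards the minority centroid. The gap is drawn from N(mu, sigma)
    and clipped to [0, 1] (assumed scheme, not the published one)."""
    rng = random.Random(seed)
    center = centroid(minority)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        gap = min(1.0, max(0.0, rng.gauss(mu, sigma)))
        synthetic.append([b + gap * (c - b) for b, c in zip(base, center)])
    return synthetic

# Toy minority class (illustrative only): corners of a square, centroid (1, 1).
minority = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
new_points = normal_gap_oversample(minority, n_new=6)
for point in new_points:
    print(point)
```

Raising `mu` pulls synthetic points closer to the centroid, and lowering `sigma` concentrates them. That is the kind of parameter sensitivity the abstract says must be tuned to preserve the original data distribution.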


Author(s):  
D. Duarte ◽  
U. Andriolo ◽  
G. Gonçalves

Abstract. Unmanned Aerial Systems (UAS) have recently been used for mapping marine litter in beach-dune environments. Machine learning algorithms have been applied to UAS-derived images and orthophotos for automated detection of marine litter items. As sand and vegetation are predominant in the orthophoto, marine litter items constitute a small set of data and are thus a class much less represented in the image scene. This communication aims to analyse the class imbalance issue in orthophotos for automated marine litter item detection. In the dataset used, the percentage of patches containing marine litter is close to 1% of the total number of patches, which represents a clear class imbalance issue. This problem has previously been indicated as detrimental to machine learning frameworks. Three different approaches were tested to address this imbalance, namely class weighting, oversampling and classifier thresholding. Oversampling had the best performance, with an F1-score of 0.68, while the other methods had an F1-score of 0.56 on average. The results indicate that future work devoted to UAS-based automated marine litter detection should consider the use of the oversampling method, which helped to improve the results by about 7% in the specific case shown in this paper.
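Classifier thresholding and the F1-score can be illustrated with a short sketch: instead of labelling a patch as litter whenever its score reaches 0.5, the decision threshold is chosen to maximize F1 on held-out data. The scores, labels, and candidate thresholds below are invented for illustration and do not come from the paper; `f1_score` and `best_threshold` are hypothetical helper names.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def predict(scores, threshold):
    """Label a sample positive when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def best_threshold(y_true, scores, candidates):
    """Pick the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: f1_score(y_true, predict(scores, t)))

# Toy validation data (illustrative only): 2 litter patches out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.25, 0.10, 0.45, 0.40, 0.60]
candidates = [0.3, 0.4, 0.5]

chosen = best_threshold(y_true, scores, candidates)
print(chosen, f1_score(y_true, predict(scores, chosen)))
```

In this toy case the default threshold of 0.5 misses one of the two litter patches, while a lower threshold recovers it at the price of one false alarm and yields a higher F1. The threshold should be selected on validation data rather than on the test set.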

