misclassification cost
Recently Published Documents


TOTAL DOCUMENTS

63
(FIVE YEARS 24)

H-INDEX

8
(FIVE YEARS 2)

2021 ◽  
Vol 9 ◽  
Author(s):  
Mingzhu Tang ◽  
Qi Zhao ◽  
Huawei Wu ◽  
Zimin Wang

In practice, faulty samples of wind turbine (WT) gearboxes are far smaller than normal samples during operation, and most of the existing fault diagnosis methods for WT gearboxes only focus on the improvement of classification accuracy and ignore the decrease of missed alarms and the reduction of the average cost. To this end, a new framework is proposed through combining the Spearman rank correlation feature extraction and cost-sensitive LightGBM algorithm for WT gearbox’s fault detection. In this article, features from wind turbine supervisory control and data acquisition (SCADA) systems are firstly extracted. Then, the feature selection is employed by using the expert experience and Spearman rank correlation coefficient to analyze the correlation between the big data of WT gearboxes. Moreover, the cost-sensitive LightGBM fault detection framework is established by optimizing the misclassification cost. The false alarm rate and the missed detection rate of the WT gearbox under different working conditions are finally obtained. Experiments have verified that the proposed method can significantly improve the fault detection accuracy. Meanwhile, the proposed method can consistently outperform traditional classifiers such as AdaCost, cost-sensitive GBDT, and cost-sensitive XGBoost in terms of low false alarm rate and missed detection rate. Owing to its high Matthews correlation coefficient scores and low average misclassification cost, the cost-sensitive LightGBM (CS LightGBM) method is preferred for imbalanced WT gearbox fault detection in practice.


2021 ◽  
Vol 24 (67) ◽  
pp. 147-156
Author(s):  
Amin Rezaeipanah ◽  
Neda Boroumand

Nowadays, breast cancer is one of the leading causes of death women in the worldwide. If breast cancer is detected at the beginning stage, it can ensure long-term survival. Numerous methods have been proposed for the early prediction of this cancer, however, efforts are still ongoing given the importance of the problem. Artificial Neural Networks (ANN) have been established as some of the most dominant machine learning algorithms, where they are very popular for prediction and classification work. In this paper, an Intelligent Ensemble Classification method based on Multi-Layer Perceptron neural network (IEC-MLP) is proposed for breast cancer diagnosis. The proposed method is split into two stages, parameters optimization and ensemble classification. In the first stage, the MLP Neural Network (MLP-NN) parameters, including optimal features, hidden layers, hidden nodes and weights, are optimized with an Evolutionary Algorithm (EA) for maximize the classification accuracy. In the second stage, an ensemble classification algorithm of MLP-NN is applied to classify the patient with optimized parameters. Our proposed IEC-MLP method which can not only help to reduce the complexity of MLP-NN and effectively selection the optimal feature subset, but it can also obtain the minimum misclassification cost. The classification results were evaluated using the IEC-MLP for different breast cancer datasets and the prediction results obtained were very promising (98.74% accuracy on the WBCD dataset). Meanwhile, the proposed method outperforms the GAANN and CAFS algorithms and other state-of-the-art classifiers. In addition, IEC-MLP could also be applied to other cancer diagnosis.


Processes ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 1107
Author(s):  
Krzysztof Gajowniczek ◽  
Tomasz Ząbkowski

This paper presents two new R packages ImbTreeEntropy and ImbTreeAUC for building decision trees, including their interactive construction and analysis, which is a highly regarded feature for field experts who want to be involved in the learning process. ImbTreeEntropy functionality includes the application of generalized entropy functions, such as Renyi, Tsallis, Sharma-Mittal, Sharma-Taneja and Kapur, to measure the impurity of a node. ImbTreeAUC provides non-standard measures to choose an optimal split point for an attribute (as well the optimal attribute for splitting) by employing local, semi-global and global AUC measures. The contribution of both packages is that thanks to interactive learning, the user is able to construct a new tree from scratch or, if required, the learning phase enables making a decision regarding the optimal split in ambiguous situations, taking into account each attribute and its cut-off. The main difference with existing solutions is that our packages provide mechanisms that allow for analyzing the trees’ structures (several trees simultaneously) that are built after growing and/or pruning. Both packages support cost-sensitive learning by defining a misclassification cost matrix, as well as weight-sensitive learning. Additionally, the tree structure of the model can be represented as a rule-based model, along with the various quality measures, such as support, confidence, lift, conviction, addedValue, cosine, Jaccard and Laplace.


2021 ◽  
Vol 9 ◽  
Author(s):  
Mingzhu Tang ◽  
Yutao Chen ◽  
Huawei Wu ◽  
Qi Zhao ◽  
Wen Long ◽  
...  

The number of normal samples of wind turbine generators is much larger than the number of fault samples. To solve the problem of imbalanced classification in wind turbine generator fault detection, a cost-sensitive extremely randomized trees (CS-ERT) algorithm is proposed in this paper, in which the cost-sensitive learning method is introduced into an extremely randomized trees (ERT) algorithm. Based on the classification misclassification cost and class distribution, the misclassification cost gain (MCG) is proposed as the score measure of the CS-ERT model growth process to improve the classification accuracy of minority classes. The Hilbert-Schmidt independence criterion lasso (HSICLasso) feature selection method is used to select strongly correlated non-redundant features of doubly-fed wind turbine generators. The effectiveness of the method was verified by experiments on four different failure datasets of wind turbine generators. The experiment results show that average missing detection rate, average misclassification cost and gMean of the improved algorithm better than those of the ERT algorithm. In addition, compared with the CSForest, AdaCost and MetaCost methods, the proposed method has better real-time fault detection performance.


2021 ◽  
pp. 016555152199980
Author(s):  
Yuanyuan Lin ◽  
Chao Huang ◽  
Wei Yao ◽  
Yifei Shao

Attraction recommendation plays an important role in tourism, such as solving information overload problems and recommending proper attractions to users. Currently, most recommendation methods are dedicated to improving the accuracy of recommendations. However, recommendation methods only focusing on accuracy tend to recommend popular items that are often purchased by users, which results in a lack of diversity and low visibility of non-popular items. Hence, many studies have suggested the importance of recommendation diversity and proposed improved methods, but there is room for improvement. First, the definition of diversity for different items requires consideration for domain characteristics. Second, the existing algorithms for improving diversity sacrifice the accuracy of recommendations. Therefore, the article utilises the topic ‘features of attractions’ to define the calculation method of recommendation diversity. We developed a two-stage optimisation model to enhance recommendation diversity while maintaining the accuracy of recommendations. In the first stage, an optimisation model considering topic diversity is proposed to increase recommendation diversity and generate candidate attractions. In the second stage, we propose a minimisation misclassification cost optimisation model to balance recommendation diversity and accuracy. To assess the performance of the proposed method, experiments are conducted with real-world travel data. The results indicate that the proposed two-stage optimisation model can significantly improve the diversity and accuracy of recommendations.


Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 657
Author(s):  
Krzysztof Gajowniczek ◽  
Tomasz Ząbkowski

This paper presents two R packages ImbTreeEntropy and ImbTreeAUC to handle imbalanced data problems. ImbTreeEntropy functionality includes application of a generalized entropy functions, such as Rényi, Tsallis, Sharma–Mittal, Sharma–Taneja and Kapur, to measure impurity of a node. ImbTreeAUC provides non-standard measures to choose an optimal split point for an attribute (as well the optimal attribute for splitting) by employing local, semi-global and global AUC (Area Under the ROC curve) measures. Both packages are applicable for binary and multiclass problems and they support cost-sensitive learning, by defining a misclassification cost matrix, and weighted-sensitive learning. The packages accept all types of attributes, including continuous, ordered and nominal, where the latter type is simplified for multiclass problems to reduce the computational overheads. Both applications enable optimization of the thresholds where posterior probabilities determine final class labels in a way that misclassification costs are minimized. Model overfitting can be managed either during the growing phase or at the end using post-pruning. The packages are mainly implemented in R, however some computationally demanding functions are written in plain C++. In order to speed up learning time, parallel processing is supported as well.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Qi Zhenya ◽  
Zuoru Zhang

Abstract Background Heart disease is the primary cause of morbidity and mortality in the world. It includes numerous problems and symptoms. The diagnosis of heart disease is difficult because there are too many factors to analyze. What’s more, the misclassification cost could be very high. Methods A cost-sensitive ensemble method was proposed to improve the efficiency of diagnosis and reduce the misclassification cost. The proposed method contains five heterogeneous classifiers: random forest, logistic regression, support vector machine, extreme learning machine and k-nearest neighbor. T-test was used to investigate if the performance of the ensemble was better than individual classifiers and the contribution of Relief algorithm. Results The best performance was achieved by the proposed method according to ten-fold cross validation. The statistical tests demonstrated that the performance of the proposed ensemble was significantly superior to individual classifiers, and the efficiency of classification was distinctively improved by Relief algorithm. Conclusions The proposed ensemble gained significantly better results compared with individual classifiers and previous studies, which implies that it can be used as a promising alternative tool in medical decision making for heart disease diagnosis.


2021 ◽  
Author(s):  
Zhenya Qi ◽  
Zuoru Zhang

Abstract Background: Heart disease is the primary cause of morbidity and mortality in the world. It includes numerous problems and symptoms. The diagnosis of heart disease is difficult because there are too many factors to analyze. What's more, the misclassification cost could be very high. Methods: A cost-sensitive ensemble method was proposed to improve the efficiency of diagnosis and reduce the misclassification cost. The proposed method contains five heterogeneous classifiers: random forest, logistic regression, support vector machine, extreme learning machine and k-nearest neighbor. T-test was used to investigate if the performance of the ensemble was better than individual classifiers and the contribution of Relief algorithm. Results: The best performance was achieved by the proposed method according to ten-fold cross validation. The statistical tests demonstrated that the performance of the proposed ensemble was significantly superior to individual classifiers, and the efficiency of classification was distinctively improved by Relief algorithm. Conclusions: The proposed ensemble gained significantly better results compared with individual classifiers and previous studies, which implies that it can be used as a promising alternative tool in medical decision making for heart disease diagnosis.


2020 ◽  
Vol 0 (0) ◽  
pp. 1-24
Author(s):  
Yufei Xia ◽  
Lingyun He ◽  
Yinguo Li ◽  
Yating Fu ◽  
Yixin Xu

Credit scoring, which is typically transformed into a classification problem, is a powerful tool to manage credit risk since it forecasts the probability of default (PD) of a loan application. However, there is a growing trend of integrating survival analysis into credit scoring to provide a dynamic prediction on PD over time and a clear explanation on censoring. A novel dynamic credit scoring model (i.e., SurvXGBoost) is proposed based on survival gradient boosting decision tree (GBDT) approach. Our proposal, which combines survival analysis and GBDT approach, is expected to enhance predictability relative to statistical survival models. The proposed method is compared with several common benchmark models on a real-world consumer loan dataset. The results of out-of-sample and out-of-time validation indicate that SurvXGBoost outperform the benchmarks in terms of predictability and misclassification cost. The incorporation of macroeconomic variables can further enhance performance of survival models. The proposed SurvXGBoost meanwhile maintains some interpretability since it provides information on feature importance.


Sign in / Sign up

Export Citation Format

Share Document