Weight-Selected Attribute Bagging for Credit Scoring

2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Jianwu Li ◽  
Haizhou Wei ◽  
Wangli Hao

Assessment of credit risk is of great importance in financial risk management. In this paper, we propose an improved attribute bagging method, weight-selected attribute bagging (WSAB), to evaluate credit risk. Weights of attributes are first computed using an attribute evaluation method such as linear support vector machine (LSVM) and principal component analysis (PCA). Subsets of attributes are then constructed according to these weights: for each attribute subset, the larger the weight of an attribute, the higher the probability that it is selected into the subset. Next, training samples and test samples are projected onto each attribute subset. A scoring model is then constructed from each set of newly produced training samples. Finally, all scoring models vote on the test instances. An individual model that uses only the selected attributes will be more accurate because some redundant and uninformative attributes are eliminated. In addition, selecting attributes by probability can also guarantee the diversity of the scoring models. Experimental results on two credit benchmark databases show that the proposed method, WSAB, outperforms analogous methods in both prediction accuracy and stability.
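The weight-driven subset selection and voting described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the attribute weights are taken as given (for example, absolute LSVM coefficients), a simple nearest-centroid classifier stands in for the scoring model, labels are assumed to be 0/1, and the names `sample_attribute_subsets`, `NearestCentroid`, and `wsab_fit_predict` are hypothetical.

```python
import numpy as np

def sample_attribute_subsets(weights, subset_size, n_subsets, rng):
    """Draw attribute subsets; higher-weight attributes are selected more often."""
    p = np.abs(weights) / np.abs(weights).sum()
    return [rng.choice(len(p), size=subset_size, replace=False, p=p)
            for _ in range(n_subsets)]

class NearestCentroid:
    """Stand-in base scoring model (assumption: the abstract does not fix one)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every sample to every class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def wsab_fit_predict(X_train, y_train, X_test, weights, subset_size, n_models, seed=0):
    """Train one model per sampled attribute subset and majority-vote (0/1 labels)."""
    rng = np.random.default_rng(seed)
    subsets = sample_attribute_subsets(weights, subset_size, n_models, rng)
    votes = np.array([
        # Project training and test samples onto the same attribute subset.
        NearestCentroid().fit(X_train[:, s], y_train).predict(X_test[:, s])
        for s in subsets
    ])
    # Use an odd n_models to avoid ties in the vote.
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Sampling without replacement keeps each subset free of duplicate attributes, while the probabilities still favour highly weighted attributes across the ensemble.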


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8051
Author(s):  
Chunwang Dong ◽  
Chongshan Yang ◽  
Zhongyuan Liu ◽  
Rentian Zhang ◽  
Peng Yan ◽  
...  

Catechin is a major reactive substance involved in black tea fermentation. It has a determinant effect on the final quality and taste of made teas. In this study, we applied hyperspectral technology with the chemometrics method and used different pretreatment and variable filtering algorithms to reduce noise interference. After reduction of the spectral data dimensions by principal component analysis (PCA), an optimal prediction model for catechin content was constructed, followed by visual analysis of catechin content in leaves fermented for different periods of time. The results showed that zero mean normalization (Z-score), multiplicative scatter correction (MSC), and standard normal variate (SNV) can effectively improve model accuracy, while the shuffled frog leaping algorithm (SFLA), the variable combination population analysis genetic algorithm (VCPA-GA), and variable combination population analysis iteratively retaining informative variables (VCPA-IRIV) can significantly reduce the amount of spectral data and enhance the calculation speed of the model. We found that nonlinear models performed better than linear ones. The prediction accuracy of the extreme learning machine (ELM) models, based on optimal variables, reached 0.989 for the total amount of catechins and 0.994 for epicatechin gallate (ECG), and the prediction accuracy of the support vector regression (SVR) models for EGC, C, EC, and EGCG content reached 0.972, 0.993, 0.990, and 0.994, respectively. The optimal model offers accurate prediction, and visual analysis can determine the distribution of the catechin content in leaves at different fermentation periods. The findings provide significant reference material for intelligent digital assessment of black tea during processing.
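Two of the steps named in this abstract, SNV pretreatment and PCA dimension reduction, can be illustrated with a minimal NumPy sketch. It assumes spectra are stored one per row; the function names `snv` and `pca_scores` are hypothetical, and the abstract does not say how many components were retained.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def pca_scores(X, n_components):
    """Project column-centred data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained_ratio = (S ** 2) / (S ** 2).sum()
    return scores, explained_ratio[:n_components]
```

The returned scores would then serve as the reduced inputs to a regression model such as ELM or SVR.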



2020 ◽  
Vol 16 (1) ◽  
pp. 155014772090363 ◽  
Author(s):  
Ying Liu ◽  
Lihua Huang

Recently, support vector machines, a supervised learning algorithm, have been widely used in credit risk management. However, noise may increase the complexity of building the algorithm and degrade the performance of the classifier. In our work, we propose an ensemble support vector machine model, combined with a noise reduction method, to solve the risk assessment problem of supply chain finance. The main characteristics of this approach are that (1) a novel noise filtering scheme based on fuzzy clustering and principal component analysis is proposed to remove both attribute noise and class noise and thereby achieve an optimal clean set, and (2) support vector machine classifiers, based on the improved particle swarm optimization algorithm, serve as component classifiers. We then obtain the final classification results by combining the individual predictions through the AdaBoost algorithm on the new sample set. Experiments are conducted on supply chain financial analysis of China’s listed companies. Results indicate that the credit assessment accuracy can be increased by applying this approach.



2011 ◽  
Vol 109 ◽  
pp. 636-640
Author(s):  
Bo Tang ◽  
Min Xia

With China's rapid economic development, credit scoring has become very important. This paper presents a new fuzzy support vector machine algorithm for solving credit scoring problems. The empirical results show that the proposed fuzzy membership model is valid and that the algorithm has good prediction accuracy and anti-noise ability.



2011 ◽  
Vol 271-273 ◽  
pp. 1286-1290
Author(s):  
Yan Feng Guo ◽  
Na Sun ◽  
Yuan Yao

Credit risk is an essential problem in financial management. Personal credit scoring is commonly used to avoid financial risk. Although many methods have been proposed for personal credit scoring and have obtained good results, most of them are single-model methods, which can be disturbed by the model's own parameters, data noise, and other external factors. To overcome this weakness of single models, we believe one of the best ways is to construct an ensemble model. In this paper, we propose a new style of ensemble model and employ two public credit datasets to verify its validity. The experimental results show that the ensemble SOM-SVM model can overcome the weakness of single models and improve classification accuracy, which is useful for constructing a better credit scoring system in the future.



2006 ◽  
Vol 18 (6) ◽  
pp. 744-750
Author(s):  
Ryouta Nakano ◽  
Kazuhiro Hotta ◽  
Haruhisa Takahashi

This paper presents an object detection method using independent local feature extractors. Since objects are composed of a combination of characteristic parts, a good object detector could be developed if local parts specialized for a detection target are derived automatically from training samples. To do this, we use Independent Component Analysis (ICA), which decomposes a signal into independent elementary signals. We then use the basis vectors derived by ICA as independent local feature extractors specialized for a detection target. These feature extractors are applied to a candidate area, and their outputs are used in classification. However, the dimensionality of the extracted independent local features is very high. To reduce the extracted independent local features efficiently, we use Higher-order Local AutoCorrelation (HLAC) features to extract the information that relates neighboring features. This may be more effective for object detection than simple independent local features. To classify detection targets and non-targets, we use a Support Vector Machine (SVM). The proposed method is applied to a car detection problem. Superior performance is obtained by comparison with Principal Component Analysis (PCA).



Energies ◽  
2021 ◽  
Vol 14 (24) ◽  
pp. 8583
Author(s):  
Lei Wu ◽  
Zhenzhen Dong ◽  
Weirong Li ◽  
Cheng Jing ◽  
Bochao Qu

Well-logging is an important formation characterization and resource evaluation method in oil and gas exploration and development. However, well-logging data are often in short supply because they can only be acquired through expensive and time-consuming field tests. In this study, we aimed to find effective machine learning techniques for well-logging data prediction, considering the temporal and spatial characteristics of well-logging data. To achieve this goal, the convolutional neural network (CNN) and the long short-term memory (LSTM) neural networks were combined to extract the spatial and temporal features of well-logging data, and the particle swarm optimization (PSO) algorithm was used to determine the hyperparameters of the optimal CNN-LSTM architecture for predicting logging curves. We applied the proposed CNN-LSTM-PSO model, along with support vector regression, gradient-boosting regression, CNN-PSO, and LSTM-PSO models, to forecast photoelectric effect (PE) logs from other logs of the target well, and from logs of adjacent wells. Among the applied algorithms, the proposed CNN-LSTM-PSO model generated the best prediction of PE logs because it fully considers the spatio-temporal information of other well-logging curves. The prediction accuracy of the PE log using logs of the adjacent wells was not as good as that using the other well-logging data of the target well itself, due to geological uncertainties between the target well and adjacent wells. The results also show that the prediction accuracy of the models can be significantly improved with the PSO algorithm. The proposed CNN-LSTM-PSO model was found to enable reliable and efficient well-logging prediction for existing and newly drilled wells; further, as the reservoir complexity increases, the proxy model should be able to reduce the optimization time dramatically.
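The role PSO plays in this abstract, searching a bounded hyperparameter space for the setting with the lowest validation error, can be sketched with a generic textbook particle swarm optimizer. This is an illustration under assumptions rather than the authors' setup: the function name `pso_minimize`, the inertia and acceleration coefficients, and the swarm size are hypothetical defaults, and `objective` stands in for whatever would train a CNN-LSTM and return its validation error.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=30, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest_val.argmin()
    gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Pull each particle toward its own best and the swarm's best position.
        vel = inertia * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep candidates inside the bounds
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
    return gbest_pos, gbest_val
```

In a hyperparameter search, each coordinate of a particle would encode one setting (for instance a learning rate or a layer width, rounded where an integer is required), so every objective evaluation corresponds to one training run.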



2013 ◽  
Vol 734-737 ◽  
pp. 2978-2982 ◽  
Author(s):  
Xin Lei Zhang ◽  
Meng Gang Li ◽  
Zuo Quan Zhang

Based on the basic theories of Logit regression analysis and the support vector machine, this article proposes an improved multi-classification combination algorithm. The model involves several innovations. First, an optimized composite indicator is chosen as a variable through principal component analysis, so that more information is retained. Second, the Logit parameter model is introduced to the quadratic to increase prediction accuracy. Third, a multi-classification combination model of the improved Logit model with SVM is put forward to increase prediction accuracy.



2009 ◽  
Vol 23 (06n07) ◽  
pp. 1099-1104 ◽  
Author(s):  
XUEXIA XU ◽  
BINGZHE BAI ◽  
WEI YOU

The principal component analysis-artificial neural network (PCA-ANN) model was developed to predict the martensite transformation start temperature (Ms) of steels. Training samples were processed by principal component analysis, which reduced the number of input variables from 6 to 4; the scores of the principal components were then used to establish a new sample database to train the ANN model. The Ms of steels was predicted by the PCA-ANN model. The predicted and measured Ms values are distributed along the 0-45° diagonal in the scatter diagram, and the statistical errors are MSE = 16.0256, MSRE = 4.49%, and VOF = 1.97790, respectively. A comparison of the prediction results of different models shows that the accuracy of the PCA-ANN model was the highest, which indicates that principal component analysis helps to improve the prediction accuracy of the ANN model.



2011 ◽  
Vol 354-355 ◽  
pp. 216-221 ◽  
Author(s):  
Jian Guo Yang ◽  
Xiao Long Zhang ◽  
Hong Zhao

Knowing the ash fusibility of coal in advance is important for safe operation and energy saving. Ash fusibility of coal was divided into three levels according to softening temperature. The fusibility level was correlated with coal properties by a nonlinear classification model built using a support vector machine. The model receives coal properties as input variables and outputs a judgment of the fusibility level. Validation of the nonlinear classification model on 62 training samples yielded 100% accuracy. The prediction accuracy on 15 testing samples was 86.7%. Results indicate that the level of ash fusibility can be accurately predicted from coal properties with the nonlinear classification model.



2012 ◽  
Vol 461 ◽  
pp. 753-756
Author(s):  
Chong Xing ◽  
Yao Wang ◽  
You Zhou ◽  
Yan Chun Liang

Recently, non-coding RNA prediction has become one of the most important research topics in bioinformatics. In this paper, on the basis of principal component analysis, we present a tRNA prediction strategy using the least squares support vector machine (LS-SVM). Appearance frequencies of single nucleotides and dinucleotides, together with (G-C)% and (A-T)%, were chosen as input features. Test results showed that the prediction accuracy was 90.51% on a prokaryotic tRNA dataset. Experimental results indicate that the method is effective for prokaryotic ncRNA prediction.
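The input features listed in this abstract can be computed along the following lines. This is a small sketch under assumptions: the function name `trna_features` is hypothetical, sequences are treated in the A/C/G/T alphabet (any U is mapped to T), and dinucleotide frequencies are counted over overlapping pairs, which the abstract does not specify.

```python
from itertools import product

def trna_features(seq):
    """Single-nucleotide and dinucleotide frequencies plus (G-C)% and (A-T)%."""
    seq = seq.upper().replace("U", "T")  # assumption: work in the DNA alphabet
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    feats = {base: seq.count(base) / n for base in "ACGT"}
    # Count overlapping dinucleotides; pairs with other symbols are skipped.
    pair_counts = {a + b: 0 for a, b in product("ACGT", repeat=2)}
    for i in range(n - 1):
        pair = seq[i:i + 2]
        if pair in pair_counts:
            pair_counts[pair] += 1
    n_pairs = max(n - 1, 1)
    for pair, count in pair_counts.items():
        feats[pair] = count / n_pairs
    feats["GC%"] = 100.0 * (feats["G"] + feats["C"])
    feats["AT%"] = 100.0 * (feats["A"] + feats["T"])
    return feats
```

This yields 22 values per sequence (4 + 16 + 2), which could then be passed through PCA before classification.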


