The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in Construction of Decision Tree Models for Credit Scoring

2013 ◽  
Vol 5 (4) ◽  
pp. 13-32 ◽  
Author(s):  
Mohammad Khanbabaei ◽  
Mahmood Alborzi

The analysis of cancer data and normal data to predict somatic mutation occurrences plays an important role, and several challenges persist in detecting somatic mutations, chiefly the complexity of handling large volumes of data while classifying them with good accuracy. In many situations the dataset contains redundant and less significant features, and these insignificant features need to be removed to improve classification performance. Feature selection techniques are useful for dimensionality reduction. PCA is one such technique for identifying significant attributes and is adopted in this paper. A novel technique, a PCA-based regression decision tree, is proposed for classifying somatic mutation data. The performance of this classification process in detecting somatic mutations is compared with existing algorithms, and satisfactory results are obtained with the proposed model.
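As a rough illustration of the dimensionality-reduction step described above, the sketch below computes the first principal component of a small table by power iteration and projects each row onto it. The data values, the function name `first_principal_component`, and the choice of power iteration are assumptions made for this example; the abstract does not say how the paper computes PCA or how the tree is trained on the result.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the leading principal component of rows."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered row onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data: columns 0 and 1 are strongly correlated, column 2 is nearly constant.
rows = [
    [1.0, 1.1, 5.0],
    [2.0, 1.9, 5.1],
    [3.0, 3.2, 4.9],
    [4.0, 3.9, 5.0],
]
direction, scores = first_principal_component(rows)
```

Here the component loads almost entirely on the two correlated columns, so the three original features collapse to one score per row, which a decision tree could then split on.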


2021 ◽  
pp. 2796-2812
Author(s):  
Nishath Ansari

Feature selection, a method of dimensionality reduction, is the process of choosing appropriate feature subsets from the full set of features. This paper presents a point-by-point review of feature selection, the issues it raises, and the techniques used to evaluate it. I begin with a straightforward approach to handling features and selection problems using meta-heuristic strategies, which help in obtaining the best feature subsets. The paper then discusses models inspired by nature and the computations they perform to address feature selection in complex and massive data. I also discuss algorithms such as the genetic algorithm (GA), the Non-Dominated Sorting Genetic Algorithm (NSGA-II), Particle Swarm Optimization (PSO), and some other meta-heuristic strategies for tackling these problems. A comparison of these algorithms has been performed; the results show that feature selection benefits machine learning algorithms by improving their performance. This paper also presents various real-world applications of feature selection.
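To make the meta-heuristic idea concrete, here is a minimal sketch of a genetic algorithm that searches over feature masks (one bit per feature). Everything specific in it is an assumption for illustration: the function `ga_select`, the tournament selection, one-point crossover, elitism, and the toy fitness function are not taken from the paper. In practice the fitness would be a classifier's validation score on the selected features.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Return the best 0/1 feature mask found by a simple genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the current best always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best

# Hypothetical toy fitness: features 0, 2 and 5 are informative,
# and every selected feature carries a small cost.
USEFUL = {0, 2, 5}

def toy_fitness(mask):
    gain = sum(1 for i, g in enumerate(mask) if g and i in USEFUL)
    return gain - 0.1 * sum(mask)

best_mask = ga_select(8, toy_fitness)
```

The per-feature penalty in `toy_fitness` is what pushes the search toward small subsets; without it, selecting every feature would score as well as selecting only the useful ones.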


Author(s):  
Rahul Hans ◽  
Harjot Kaur

The literature indicates that high breast tissue density is a root cause of the rise in breast cancer among women, which plays a prime role in cancer deaths among women. Moreover, in an era where computer-aided diagnosis systems have become the radiologist's right hand, researchers still find room for improvement in feature selection techniques. This research proposes hybrid versions of Biogeography-Based Optimization and the Genetic Algorithm for feature selection in breast density classification, to remove redundant and irrelevant features from the dataset and either achieve superior classification accuracy or maintain the same accuracy with fewer features. For experimentation, 322 mammogram images from the mini-MIAS database are chosen, and Regions of Interest (ROIs) of seven different sizes are extracted, from each of which a set of 45 texture features is computed. The proposed algorithms are then used to select an optimal subset of features from the large feature set corresponding to each ROI. The results indicate that the proposed algorithms outperform some other nature-inspired metaheuristic algorithms when compared using various parameters.


2008 ◽  
Vol 12 (3) ◽  
Author(s):  
Jozef Zurada ◽  
Peng C. Lam

For many years lenders have been using traditional statistical techniques such as logistic regression and discriminant analysis to more precisely distinguish between creditworthy customers who are granted loans and non-creditworthy customers who are denied loans. More recently new machine learning techniques such as neural networks, decision trees, and support vector machines have been successfully employed to classify loan applicants into those who are likely to pay a loan off or default upon a loan. Accurate classification is beneficial to lenders in terms of increased financial profits or reduced losses and to loan applicants who can avoid overcommitment. This paper examines a historical data set from consumer loans issued by a German bank to individuals whom the bank considered to be qualified customers. The data set consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off or defaulted upon. The paper examines and compares the classification accuracy rates of three decision tree techniques as well as analyzes their ability to generate easy to understand rules.
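The abstract does not name the three decision tree techniques it compares. As a generic illustration of how a tree derives an easy-to-read rule, the sketch below finds the threshold on one numeric attribute that minimizes weighted Gini impurity, the criterion used by CART-style trees. The loan durations and default labels are invented for the example and are not from the German data set.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)  # equals 1 - p**2 - (1 - p)**2

def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split: impurity of the whole node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical data: loan duration in months, 1 = defaulted, 0 = paid off.
durations = [6, 12, 18, 24, 36, 48]
defaulted = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(durations, defaulted)
print(threshold, impurity)  # 21.0 0.0
```

The split reads directly as a rule: "if duration is at most 21 months, predict paid off; otherwise predict default". A full tree repeats this search recursively on each side of the split.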


2021 ◽  
pp. 1-15
Author(s):  
Jianrong Yao ◽  
Zhongyi Wang ◽  
Lu Wang ◽  
Zhebin Zhang ◽  
Hui Jiang ◽  
...  

With the in-depth application of artificial intelligence technology in the financial field, credit scoring models constructed by machine learning algorithms have become mainstream. However, the high-dimensional and complex attribute features of the borrower pose challenges to the predictive competence of the model. This paper proposes a hybrid model with a novel feature selection method and an enhanced voting method for credit scoring. First, a novel feature selection combined method based on a genetic algorithm (FSCM-GA) is proposed, in which different classifiers are used to select features in combination with a genetic algorithm and combine them to generate an optimal feature subset. Furthermore, an enhanced voting method (EVM) is proposed to integrate classifiers, with the aim of improving the classification results in which the prediction probability values are close to the threshold. Finally, the predictive competence of the proposed model was validated on three public datasets and five evaluation metrics (accuracy, AUC, F-score, Log loss and Brier score). The comparative experiment and significance test results confirmed the good performance and robustness of the proposed model.
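Three of the evaluation metrics named above (accuracy, log loss and Brier score) have standard definitions that are easy to state in code. The sketch below implements them for binary labels and applies them to a plain average of two classifiers' probabilities. The probabilities are invented, and the averaging in `soft_vote` is ordinary soft voting used as a stand-in; it is not the paper's enhanced voting method (EVM), whose details the abstract does not give.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases where the thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def soft_vote(*prob_lists):
    """Average the predicted probabilities of several classifiers."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

# Hypothetical outputs of two classifiers on four borrowers (1 = default).
y_true = [1, 0, 1, 0]
probs_a = [0.9, 0.2, 0.45, 0.1]
probs_b = [0.7, 0.4, 0.75, 0.3]
ensemble = soft_vote(probs_a, probs_b)

acc_a = accuracy(y_true, probs_a)
acc_ensemble = accuracy(y_true, ensemble)
brier_ensemble = brier_score(y_true, ensemble)
loss_ensemble = log_loss(y_true, ensemble)
```

The third borrower shows why predictions near the threshold matter: classifier A gives 0.45 and misclassifies the case, while the averaged probability of 0.6 lands on the correct side.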


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Chun-Wei Tung ◽  
Ming-Tsang Wu ◽  
Yu-Kuei Chen ◽  
Chun-Chieh Wu ◽  
Wei-Chung Chen ◽  
...  

Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze the expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas.
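Sequential forward selection itself is a short greedy loop: start with no features, add the one that most improves a score, and stop when nothing helps. The sketch below shows that loop under stated assumptions. The function name, the stopping rule, and the toy scoring function are hypothetical; in the study the score would come from a decision tree's classification accuracy on gene expression data.

```python
def sequential_forward_selection(n_features, score, max_features):
    """Greedily add the feature that most improves score(subset)."""
    selected = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        top_score, top_feature = max(
            (score(selected + [f]), f) for f in candidates
        )
        if top_score <= best_score:
            break  # no candidate improves the score: stop early
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score

# Hypothetical score: features 1 and 3 are informative,
# and each selected feature carries a small size penalty.
def toy_score(subset):
    s = set(subset)
    return 0.5 + 0.25 * (1 in s) + 0.2 * (3 in s) - 0.01 * len(s)

selected, final_score = sequential_forward_selection(5, toy_score, max_features=5)
```

With this toy score the loop picks feature 1, then feature 3, and stops because any third feature lowers the score. The result is a two-feature subset, in the same spirit as the two-biomarker models described above.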

