feature discretization
Recently Published Documents



Electronics ◽ 2021 ◽ Vol 10 (17) ◽ pp. 2099
Author(s): Paweł Ziemba ◽ Jarosław Becker ◽ Aneta Becker ◽ Aleksandra Radomska-Zalas ◽ Mateusz Pawluk ◽ ...

One important research problem for financial institutions is assessing credit risk and deciding whether to grant or refuse a loan. Machine learning methods are increasingly employed to solve such problems. However, selecting an appropriate feature selection technique, sampling mechanism, and/or classifier for credit decision support is very challenging and can affect the quality of the loan recommendations. To address this task, this article examines the effectiveness of various data science techniques in credit decision support. In particular, a processing pipeline was designed that consists of methods for data resampling, feature discretization, feature selection, and binary classification. We suggest building decision models from pertinent methods for each of these stages. The feasibility of the selected models was analyzed through rigorous experiments on real data describing clients' ability to repay loans. During the experiments, we analyzed the impact of feature selection on the results of binary classification, and the impact of data resampling with feature discretization on the results of feature selection and binary classification. After experimental evaluation, we found that the correlation-based feature selection technique and the random forest classifier yield superior performance on the underlying problem.
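The feature discretization stage named in this abstract can be illustrated with a minimal sketch. The abstract does not say which discretization method the authors used, so equal-width binning is assumed here purely for illustration; the function names and the `incomes` data are hypothetical.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points of an equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, edges):
    """Map each continuous value to the index of the bin it falls into."""
    # bisect_right counts how many cut points are <= v, which is the bin index.
    return [bisect_right(edges, v) for v in values]


# Hypothetical continuous feature (not taken from the paper's data).
incomes = [1200.0, 1850.0, 2400.0, 3100.0, 5200.0, 9200.0]
edges = equal_width_edges(incomes, n_bins=4)
codes = discretize(incomes, edges)
```

In a pipeline of the kind described, the resulting integer codes would replace the continuous column before feature selection and classification. Equal-width bins are sensitive to outliers, so an equal-frequency or supervised scheme may be a better fit for skewed financial data.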



Author(s): Qiong Chen ◽ Mengxing Huang

Abstract: Feature discretization is an important preprocessing technology for massive data in industrial control. It improves the efficiency of edge-cloud computing by transforming continuous features into discrete ones, so as to meet the requirements of high-quality cloud services. Compared with other discretization methods, discretization based on rough sets has achieved good results in many applications because it can make full use of the known knowledge base without any prior information. However, the equivalence class of a rough set is an ordinary set, which makes it difficult to describe the fuzzy components in the data, and accuracy is low for some complex data types in big data environments. Therefore, we propose a rough fuzzy model based discretization algorithm (RFMD). First, we use fuzzy c-means clustering to obtain the membership of each sample in each category. Then, we fuzzify the equivalence class of the rough set using the obtained memberships, and establish the fitness function of a genetic algorithm based on the rough fuzzy model to select the optimal discrete breakpoints on the continuous features. Finally, we compare the proposed method with discretization algorithms based on rough sets, information entropy, and the chi-square test on remote sensing datasets. The experimental results verify the effectiveness of our method.
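The first step of the method, obtaining fuzzy c-means memberships, can be sketched as follows. This is not the authors' RFMD implementation: it shows only the standard fuzzy c-means membership formula for a one-dimensional sample with fixed, hypothetical cluster centers, and omits the center updates, the rough-set fuzzification, and the genetic search for breakpoints. The function name and the fuzzifier default `m=2.0` are assumptions.

```python
def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of scalar x in each cluster center.

    Uses u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    dists = [abs(x - c) for c in centers]
    # A sample lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with it).
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical centers; a sample at 1.0 sits closer to the first one.
u = fcm_memberships(1.0, [0.0, 4.0])
```

The memberships for a sample always sum to 1, which is what allows them to replace the crisp 0/1 membership of an ordinary equivalence class.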



2014 ◽ Vol 123 ◽ pp. 60-74
Author(s): Artur J. Ferreira ◽ Mário A.T. Figueiredo

