Analysis of Distinct Feature Groups in the Credit Scoring Problem

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Luiz F. V. Vercosa ◽  
Rodrigo C. Lira ◽  
Rodrigo P. Monteiro ◽  
Kleber D. M. Silva ◽  
Jailson O. L. Magalhaes ◽  
...  

Registration and financial data have traditionally been used for the credit scoring problem. However, even slight improvements in the reliability of the scores positively impact financial companies. Therefore, exploring new features is a strategic task. This work analyzes the importance of new feature groups not commonly employed for the credit scoring task, as well as others already in use. We categorized features from open credit scoring datasets, such as German and Australian, and compared their groups with those of a company dataset used in this work. Our dataset contains unusual feature groups, such as historical, geolocation, web behavior, and demographic data. In our analyses, we first conducted bivariate tests with each feature-pair to assess their individual importance. Secondly, we ran an XGBoost machine learning model with each feature group to evaluate each group's importance. We also applied feature selection with binary Particle Swarm Optimization to assess the groups' importance when combined. Next, we employed correlation tests to find inner and inter-correlation among the feature groups. Finally, we used the company dataset and employed AdaBoost, Multilayer Perceptron, and XGBoost algorithms to find the best model for the task. Some of our main findings were that the unusual features added a slight improvement to registration features. We also detected reasonable inner correlation among some feature groups and found that all groups were relevant for the task, with the Historical Group as the most promising. Lastly, XGBoost obtained the best performance over AdaBoost and Multilayer Perceptron for the task.
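The inner and inter-correlation analysis mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which correlation test was used, so the Pearson coefficient here is an assumption, and the column names and values are hypothetical, not taken from the company dataset.

```python
from itertools import combinations, product
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric columns.
    Raises ZeroDivisionError if either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_abs_corr(pairs, data):
    """Average absolute correlation over a list of column-name pairs."""
    values = [abs(pearson(data[a], data[b])) for a, b in pairs]
    return sum(values) / len(values)

def inner_correlation(group, data):
    """Correlation among features inside one group."""
    return mean_abs_corr(list(combinations(group, 2)), data)

def inter_correlation(group_a, group_b, data):
    """Correlation between features of two different groups."""
    return mean_abs_corr(list(product(group_a, group_b)), data)

# Hypothetical columns (illustrative only).
data = {
    "hist_late_payments": [0, 1, 2, 3, 4],
    "hist_defaults": [0, 2, 4, 6, 8],
    "geo_region_code": [2, 0, 1, 0, 2],
}
historical = ["hist_late_payments", "hist_defaults"]
geolocation = ["geo_region_code"]

inner = inner_correlation(historical, data)
inter = inter_correlation(historical, geolocation, data)
```

A high inner value suggests redundancy inside a group, while a low inter value suggests that two groups carry complementary information.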

2021 ◽  
Vol 40 (5) ◽  
pp. 9471-9484
Author(s):  
Yilun Jin ◽  
Yanan Liu ◽  
Wenyu Zhang ◽  
Shuai Zhang ◽  
Yu Lou

With the advancement of machine learning, credit scoring can be performed better. As one of the widely recognized machine learning methods, ensemble learning has demonstrated significant improvements in the predictive accuracy over individual machine learning models for credit scoring. This study proposes a novel multi-stage ensemble model with multiple K-means-based selective undersampling for credit scoring. First, a new multiple K-means-based undersampling method is proposed to deal with the imbalanced data. Then, a new selective sampling mechanism is proposed to select the better-performing base classifiers adaptively. Finally, a new feature-enhanced stacking method is proposed to construct an effective ensemble model by composing the shortlisted base classifiers. In the experiments, four datasets with four evaluation indicators are used to evaluate the performance of the proposed model, and the experimental results prove the superiority of the proposed model over other benchmark models.
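The general idea of K-means-based undersampling can be sketched as follows. This is not the multiple K-means selective method the abstract proposes, whose details are not given here. It is a plain single-pass variant under assumed settings: cluster the majority class, then keep a share of each cluster proportional to its size. The function names, the value of `k`, and the sample points are all hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return labels

def kmeans_undersample(majority, target_size, k=3, seed=0):
    """Keep roughly target_size majority points, spread across clusters."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        # Quota proportional to cluster size; rounding makes the total
        # approximate rather than exact.
        quota = round(target_size * len(members) / len(majority))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept

# Hypothetical majority class: 60 points in three loose blobs.
majority = [(x + 0.1 * i, y + 0.1 * j)
            for x, y in [(0, 0), (5, 5), (10, 0)]
            for i in range(4) for j in range(5)]
kept = kmeans_undersample(majority, target_size=12)
```

Sampling per cluster, rather than uniformly, is meant to preserve the shape of the majority class while shrinking it toward the minority class size.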


2020 ◽  
Author(s):  
Luiz Felipe Vercosa ◽  
Rodrigo Lira ◽  
Rodrigo Monteiro ◽  
Kleber Silva ◽  
Jailson Magalhaes ◽  
...  

Standard features used for credit scoring mainly include registration and financial data from customers. However, exploring new features is of great interest to financial companies, since slight improvements in a person's score directly impact company revenue. In this work, we categorize features from open credit scoring datasets and compare them with the features found in a real company dataset. The company dataset contains unusual feature groups such as historical, geolocation, web behavior, and demographic data. We performed bivariate tests on the features using the Kolmogorov-Smirnov metric to assess the performance of particular feature groups. We also generated a good-payer score by using AdaBoost, Multilayer Perceptron, and XGBoost algorithms. Then, we analyzed the results with different metrics and compared them with the real company results. Our main finding was that these features added a small improvement to current datasets. We also identified the most promising feature groups and noticed that the tuned XGBoost performed better than the company solution in three out of four deployed metrics.
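As a rough illustration of the Kolmogorov-Smirnov metric used in such bivariate tests, the sketch below computes the two-sample KS statistic for one feature split by payer class. The sample values are hypothetical and the function name is an assumption, not something taken from the paper.

```python
from bisect import bisect_right

def ks_statistic(good_values, bad_values):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    good = sorted(good_values)
    bad = sorted(bad_values)
    if not good or not bad:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap only changes at observed values, so checking those suffices.
    for x in set(good) | set(bad):
        cdf_good = bisect_right(good, x) / len(good)
        cdf_bad = bisect_right(bad, x) / len(bad)
        max_gap = max(max_gap, abs(cdf_good - cdf_bad))
    return max_gap

# Hypothetical feature values for good and bad payers (illustrative only).
good_payers = [0.62, 0.70, 0.75, 0.81, 0.90]
bad_payers = [0.20, 0.35, 0.41, 0.66, 0.72]
ks = ks_statistic(good_payers, bad_payers)
```

A larger statistic means the feature separates the two classes more strongly, which is why it is a common screening measure in credit scoring.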


Author(s):  
Tiffany Jiang

Unprecedented access to data (“big data,” or high-dimensional data), cloud computing, and innovative technology have increased the applications of artificial intelligence in finance and numerous other industries. Machine learning is used in process automation, security, underwriting and credit scoring, algorithmic trading, and robo-advisory. In fact, machine learning AI applications are purported to save banks an estimated $447 billion by 2023. Given the advantages that AI brings to finance, we focused on applying supervised machine learning to an investment problem. 10-K SEC filings are routinely used by investors to determine the worth and status of a company; Warren Buffett is frequently cited as reading a 10-K a day. We sought to answer: “Can machine learning analyze thousands of companies and spot patterns? Can machine learning automate the process of human analysis in predicting whether a company is fit to merge? Can machine learning spot something that humans cannot?” Amid rising antitrust discussion of growing market concentration and concern about decreasing competition, we analyzed merger activity using text as a data set. Merger activity has traditionally been hard to predict. We took advantage of the large number of publicly available filings through the Securities and Exchange Commission, which give a comprehensive summary of a company, and used text as an innovative way to analyze a company. To verify existing theory and measure harder-to-observe variables, we examined the text of a firm’s 10-K SEC filing. To minimize over-fitting, the L2 LASSO regularization technique is used. We came up with a model that has 85% accuracy, compared with 35% accuracy using the “bag-of-words” method, in predicting a company’s likelihood of merging from words alone on the same period’s test data set. 
These steps are the beginnings of tackling more complicated questions, such as “Which section or topic of words is the most predictive?” and “What is the difference between being acquired and acquiring?” Using product descriptions to characterize mergers further into horizontal and vertical mergers could eventually assist with the causal estimates that are of interest to economists. More importantly, using language and words to categorize companies could be useful in predicting counterfactual scenarios and answering policy questions, and could have different applications ranging from detecting fraud to better trading.


Author(s):  
Reza Firsandaya Malik ◽  
Hermawan Hermawan

Credit scoring is a procedure that exists in every financial institution. It is a way to predict whether a debtor qualifies for a loan, and it has been a major concern throughout the loan process. Almost all banks and other financial institutions have their own credit scoring methods. Nowadays, the data mining approach is accepted as one of the well-known methods. Accuracy is also a major issue in this approach. This research proposes a hybrid method using the CART algorithm and Binary Particle Swarm Optimization. The performance indicators used in this research are classification accuracy, error rate, sensitivity, specificity, and precision. Experimental results on the public dataset showed that the proposed method's accuracy is 78%. Compared with several popular algorithms, such as neural networks, logistic regression, and support vector machines, the proposed method showed outstanding performance.
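Binary Particle Swarm Optimization for feature selection can be sketched in a few lines. Each particle is a 0/1 mask over the features, and a sigmoid of the velocity gives the probability that a bit is set. In the hybrid described above the fitness would presumably come from a CART classifier's accuracy. Here `toy_fitness`, the `INFORMATIVE` set, and all hyperparameter values are hypothetical stand-ins so that the example stays self-contained.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=10, iterations=30,
               inertia=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 masks; returns (best_mask, score)."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    best = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[best][:], personal_score[best]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * velocities[i][d]
                     + c1 * r1 * (personal_best[i][d] - positions[i][d])
                     + c2 * r2 * (global_best[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                velocities[i][d] = v
                # Sigmoid transfer: velocity -> probability the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob else 0
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score

# Hypothetical fitness: reward informative features, penalize the rest.
INFORMATIVE = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    extras = sum(1 for i, bit in enumerate(mask)
                 if bit and i not in INFORMATIVE)
    return hits - 0.5 * extras

best_mask, best_score = binary_pso(toy_fitness, n_features=8)
```

Replacing `toy_fitness` with a function that trains a classifier on the masked columns and returns validation accuracy would turn this into a wrapper-style feature selector.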


Author(s):  
A. Oliart Ros ◽  
T. González Cacho ◽  
D. Sol Martínez ◽  
D. Clavijo Plourde

Abstract. This work aims to create a methodology to automate the classification of public spaces using a perception test and data obtained from city census information, in our case from the Mexican National Institute of Statistics and Geography (INEGI). There is currently no well-defined decision-making process for planning the creation or development of public spaces. For this reason, a study was conducted to gather data on how people perceive five variables: architectural beauty, pollution, fun, wealth, and safety. The information obtained was used to create a machine learning model that could find a relation between the perceptions gathered and the census dataset. This first attempt aims to find the key insights needed to develop a more complex methodology for classifying public places at a greater scale in terms of their safety or architectural value, and to identify which socio-demographic data define this perception.


Author(s):  
Andrej Kovačič ◽  
Andrej Raspor ◽  
Janez Kolar ◽  
Janez Žezlina

The main research question is: How do Slovenian employers assess the level of absenteeism in their companies, and what measures do they take to control it? We collected the research data from 155 Slovenian companies in 2019 using a questionnaire (closed- and open-ended questions) that was answered by the people responsible for staffing (human resource managers) or by managers in small companies. In addition to questions on demographic data (region, activity, organisation’s size, ownership), we also included variables with the following descriptive answers: the range of absenteeism in a company; the reasons for absenteeism in a company; absenteeism management in a company; and actions for absenteeism management in a company. Absenteeism is not perceived as critical, but it is more often present among production workers. Various diseases are still the most common reason for absenteeism. The employees are mainly committed to work and do not take advantage of sick leave. Employers should establish systems for managing absenteeism within their companies. Such a system should also consider how committed to work the worker is, how efficiently the worker uses the elements of work, and the quality of the worker's output. On the basis of these findings, it would be advisable to prepare a research instrument with which we could identify what is happening in the field of absenteeism and which actions should be taken at the national, regional, or company level, especially in the period of the Covid-19 pandemic.


2021 ◽  
Vol 14 (11) ◽  
pp. 565
Author(s):  
Joseph L. Breeden ◽  
Eugenia Leonova

Unintended bias against protected groups has become a key obstacle to the widespread adoption of machine learning methods. This work presents a modeling procedure that carefully builds models around protected class information in order to make sure that the final machine learning model is independent of protected class status, even in a nonlinear sense. This procedure works for any machine learning method. The procedure was tested on subprime credit card data combined with demographic data by zip code from the US Census. The census data serves as an imperfect proxy for borrower demographics but serves to illustrate the procedure.


2020 ◽  
Vol 13 (8) ◽  
pp. 180 ◽  
Author(s):  
Bernard Dushimimana ◽  
Yvonne Wambui ◽  
Timothy Lubega ◽  
Patrick E. McSharry

Airtime lending default rates are typically lower than those experienced by banks and microfinance institutions (MFIs) but are likely to grow as the service is offered more widely. In this paper, credit scoring techniques are reviewed, and that knowledge is built upon to create an appropriate machine learning model for airtime lending. Over three million loans belonging to more than 41 thousand customers with a repayment period of three months are analysed. Logistic Regression, Decision Trees and Random Forest are evaluated for their ability to classify defaulters using several cross-validation approaches and the latter model performed best. When the default rate is below 2%, it is better to offer everyone a loan. For higher default rates, the model substantially enhances profitability. The model quadruples the tolerable level of default rate for breaking even from 8% to 32%. Nonlinear classification models offer considerable potential for credit scoring, coping with higher levels of default and therefore allowing for larger volumes of customers.
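The break-even reasoning in this abstract can be made concrete with a simplified profit model. The model below is an assumption for illustration, not the paper's: a repaid loan earns a fee (as a fraction of principal), a default loses the full principal, and a classifier rejects a share of defaulters (`recall`) along with a share of good payers (`false_reject`). The numbers are hypothetical and do not reproduce the 8% and 32% figures reported.

```python
def profit_per_applicant(default_rate, fee, recall=0.0, false_reject=0.0):
    """Expected profit per applicant, in units of the loan principal.

    fee          : income from a repaid loan, as a fraction of principal
    recall       : share of defaulters the model rejects
    false_reject : share of good payers the model wrongly rejects
    """
    gain = (1 - default_rate) * (1 - false_reject) * fee
    loss = default_rate * (1 - recall)  # assumes the full principal is lost
    return gain - loss

def break_even_default_rate(fee, recall=0.0, false_reject=0.0):
    """Default rate d at which expected profit is zero.

    Solves (1 - d) * g = d * l, with g = (1 - false_reject) * fee and
    l = 1 - recall, giving d = g / (g + l).
    """
    gain = (1 - false_reject) * fee
    loss = 1 - recall
    return gain / (gain + loss)

# Hypothetical inputs (illustrative only).
without_model = break_even_default_rate(fee=0.10)
with_model = break_even_default_rate(fee=0.10, recall=0.8, false_reject=0.1)
```

Under these assumptions, a model that screens out most defaulters raises the tolerable default rate, which matches the qualitative claim in the abstract.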


2018 ◽  
Vol 28 (1) ◽  
pp. 209-216
Author(s):  
Snezana Ristevska – Jovanovska ◽  
Marija Magdincheva – Shopova

Branding is the marketing practice of creating a name, symbol, or design that identifies and differentiates a product from other products. Your marketing and branding clearly influence that perception, but your brand exists whether you actively market your business or not. If you’re out there and people are interacting with your business, you have a brand. A brand is the known identity of a company in terms of the products and services it offers, but also the essence of what the company stands for in terms of service and other emotional, intangible consumer concerns. To brand something is to make descriptive and evocative communications, subtle and overt statements that describe what the company stands for. The digitization of the media has fundamentally changed the relationship that brands have with people. Influence marketers use the mobile phone in the marketing communication process. In this regard, consumers' acceptance of the mobile device as a new way of thinking is critical to implementing a successful marketing campaign. For marketers to increase user engagement, add value, and ultimately increase their return on investment in marketing, it is essential that they understand customers' attitudes and intentions toward mobile marketing. For the purpose of this paper, quantitative, descriptive research was conducted. The purpose of this research is to determine the attitudes of smartphone users in the country toward mobile marketing by analyzing their habits of using mobile devices. The survey was conducted using an online questionnaire, created and distributed only to smartphone users, in the period June-October 2018. The survey used an undisguised structured questionnaire administered to 260 respondents. The questionnaire consisted of ten questions, most of which were structured, closed-ended questions. In the initial part of the survey, the focus was on basic demographic data (sex and age). 
The next questions related to the activities for which most respondents use the smartphone, as well as preferences among the activities for which users often use smartphones.

