Tax Default Prediction using Feature Transformation-Based Machine Learning

IEEE Access ◽  
2020 ◽  
pp. 1-1
Author(s):  
Mohammad Zoynul Abedin ◽  
Guotai Chi ◽  
Mohammed Mohi Uddin ◽  
Md. Shahriare Satu ◽  
Md. Imran Khan ◽  
...  
Author(s):  
Maroua Bahri ◽  
Albert Bifet ◽  
Silviu Maniu ◽  
Heitor Murilo Gomes

Mining high-dimensional data streams poses a fundamental challenge to machine learning, as a large number of attributes can markedly degrade the performance of any mining task. In the past several years, dimension reduction (DR) approaches have been successfully applied for different purposes (e.g., visualization). Because of their high computational cost and the multiple passes they make over large data, these approaches become a hindrance when processing infinite data streams that are potentially high-dimensional. High dimensionality in turn increases the resource usage of algorithms, which may suffer from the curse of dimensionality. To cope with these issues, some techniques for incremental DR have been proposed. In this paper, we provide a survey of reduction approaches designed to handle data streams and highlight the key benefits of using these approaches for stream mining algorithms.
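As an illustration of what single-pass reduction can look like, here is a minimal sketch of a Gaussian random projection applied to each instance as it arrives. This is one simple technique of this kind, chosen for brevity; the abstract does not say which methods the survey covers. The function names, the dimensions (1000 to 20), and the seed are assumptions made for the example.

```python
import math
import random


def make_projection(n_in, n_out, seed=0):
    """Build a fixed Gaussian random matrix with n_out rows and n_in columns."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_out)  # common scaling for random projections
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)]
            for _ in range(n_out)]


def project(matrix, x):
    """Map one instance x (length n_in) to the reduced space (length n_out)."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def reduce_stream(stream, matrix):
    """Yield reduced instances one at a time; no instance is stored."""
    for x in stream:
        yield project(matrix, x)


# Hypothetical stream: five instances with 1000 attributes each.
matrix = make_projection(n_in=1000, n_out=20, seed=42)
stream = ([float(i + j) for j in range(1000)] for i in range(5))
reduced = list(reduce_stream(stream, matrix))
```

The matrix is drawn once and never updated, so each instance is seen exactly once and memory use depends only on the two dimensions, not on the length of the stream.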


2021 ◽  
Vol 16 (2) ◽  
pp. 35-49
Author(s):  
Adamaria Perrotta ◽  
Georgios Bliatsios ◽  

Peer-to-Peer (P2P) lending is an online lending process that allows individuals to obtain or grant loans without the involvement of traditional financial intermediaries. It has grown quickly in recent years, with some platforms reaching billions of dollars of loan principal in a short amount of time. Since each loan carries a probability of loss due to the borrower's failure to repay, this paper addresses the borrower default prediction problem in the P2P financial ecosystem. The main assumption, which distinguishes this study from the available literature, is that borrowers sharing the same homeownership status display similar risk profiles, so a separate model should be developed for each segment. We estimate the Probability of Default (PD) of a borrower using Logistic Regression (LR) coupled with Weight of Evidence encoding. The feature set is identified via Sequential Feature Selection (SFS). We compare forward and backward SFS in terms of the Area Under the Curve (AUC) and choose the one that maximizes this statistic. Finally, we compare the results of the chosen LR approach against two other popular Machine Learning (ML) techniques: k Nearest Neighbors (k-NN) and Random Forest (RF).
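Weight of Evidence encoding replaces each level of a categorical feature with a log-ratio of class shares. The sketch below is a minimal version under stated assumptions: the sign convention (non-defaults over defaults), the additive smoothing used to avoid division by zero, and the `purpose` feature with its values are all choices made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def woe_table(categories, defaults, smoothing=0.5):
    """Return {level: WoE}, WoE = ln(share of non-defaults / share of defaults).

    `smoothing` is added to every count so that a level with no defaults
    (or no non-defaults) still gets a finite value. This is an assumption.
    """
    good, bad = Counter(), Counter()
    for level, defaulted in zip(categories, defaults):
        if defaulted:
            bad[level] += 1
        else:
            good[level] += 1
    levels = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)
    return {
        level: math.log(((good[level] + smoothing) / total_good)
                        / ((bad[level] + smoothing) / total_bad))
        for level in levels
    }


# Hypothetical loan-purpose feature and default flags (1 = defaulted).
purpose = ["car", "car", "car", "debt", "debt", "debt"]
defaulted = [0, 0, 0, 1, 1, 0]
table = woe_table(purpose, defaulted)
encoded = [table[p] for p in purpose]  # numeric input for a logistic regression
```

Under this convention a positive value marks a level where non-defaults are over-represented, and a negative value marks one where defaults are over-represented.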


2020 ◽  
Vol 12 (16) ◽  
pp. 6325 ◽  
Author(s):  
Hyeongjun Kim ◽  
Hoon Cho ◽  
Doojin Ryu

Corporate default predictions play an essential role in each sector of the economy, as highlighted by the global financial crisis and the increase in credit risk. This study reviews the corporate default prediction literature from the perspectives of financial engineering and machine learning. We define three generations of statistical models: discriminant analyses, binary response models, and hazard models. In addition, we introduce three representative machine learning methodologies: support vector machines, decision trees, and artificial neural network algorithms. For both the statistical models and machine learning methodologies, we identify the key studies used in corporate default prediction. By comparing these methods with findings from the interdisciplinary literature, our review suggests some new tasks in the field of machine learning for predicting corporate defaults. First, a corporate default prediction model should be a multi-period model in which future outcomes are affected by past decisions. Second, the stock price and the corporate value determined by the stock market are important factors to use in default predictions. Finally, a corporate default prediction model should be able to suggest the cause of default.


Author(s):  
Upasana Mukherjee ◽  
Vandana Thakkar ◽  
Shawni Dutta ◽  
Utsab Mukherjee ◽  
Samir Kumar Bandyopadhyay

The growth of regularly generated data from many financial activities has significant implications for every corner of financial modelling. This study investigates the use of this continuously growing data by means of an automated process. Such a process can be built with machine learning techniques that analyze the data and learn from it. Many important financial domains, such as credit card fraud detection, bankruptcy detection, loan default prediction, investment prediction, and marketing, can be modelled with machine learning methods. Among the available techniques, this research considers both parametric and non-parametric methods. Two parametric models (Logistic Regression and Gaussian Naive Bayes) and two non-parametric models (Random Forest and Decision Tree) are implemented. All four models are applied to credit card fraud detection, bankruptcy detection, and loan default prediction. In each case, the classification techniques are compared and the best model is identified. The performance of each classifier in each domain is evaluated with several metrics, such as accuracy, F1-score, and mean squared error. In credit card fraud detection, the decision tree classifier performs best, with an accuracy of 99.1%; in loan default prediction and bankruptcy detection, the random forest classifier gives the best accuracy, 97% and 96.84% respectively.
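The three metrics named above can be computed directly from true and predicted labels. The following is a minimal sketch for binary 0/1 labels; the function names and the toy label vectors are illustrative assumptions and are unrelated to the paper's data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (1 = positive class, e.g. fraud or default).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)            # 0.75
f1 = f1_score(y_true, y_pred)             # 0.75
mse = mean_squared_error(y_true, y_pred)  # 0.25
```

For hard 0/1 predictions, mean squared error equals one minus accuracy, so the two carry the same information. F1 is the more informative of the three on imbalanced data such as fraud records, where a classifier can reach high accuracy by predicting the majority class.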

