Machine learning predictivity applied to consumer creditworthiness

2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Maisa Cardoso Aniceto ◽  
Flavio Barboza ◽  
Herbert Kimura

Abstract Credit risk evaluation plays an important role for financial institutions, since lending may result in real and immediate losses. In particular, default prediction is one of the most challenging activities in managing credit risk. This study analyzes the adequacy of borrower classification models using a Brazilian bank's loan database and explores machine learning techniques. We develop Support Vector Machine, Decision Tree, Bagging, AdaBoost and Random Forest models, and compare their predictive accuracy with a benchmark based on a Logistic Regression model. Comparisons are based on the usual classification performance metrics. Our results show that Random Forest and AdaBoost perform better than the other models. Moreover, Support Vector Machine models perform poorly with both linear and nonlinear kernels. Our findings suggest that there are value-creating opportunities for banks to improve default prediction models by exploring machine learning techniques.

RSC Advances ◽  
2014 ◽  
Vol 4 (106) ◽  
pp. 61624-61630 ◽  
Author(s):  
N. S. Hari Narayana Moorthy ◽  
Silvia A. Martins ◽  
Sergio F. Sousa ◽  
Maria J. Ramos ◽  
Pedro A. Fernandes

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.


Credit scoring analysis is an effective credit risk assessment technique and one of the major research fields in the banking sector. Machine learning has a variety of applications in the banking sector and has been widely used for data analysis. Modern techniques such as machine learning provide a self-regulating process for analyzing data using classification techniques. Classification is a supervised learning process in which the computer learns from the input data provided and uses this information to classify new data. This research paper presents a comparison of various machine learning techniques used to evaluate credit risk. Models that decide whether a credit transaction should be accepted or rejected are trained and evaluated using different machine learning algorithms. The techniques are implemented on the German credit dataset from the UCI repository, which has 1000 instances and 21 attributes, on the basis of which transactions are either accepted or rejected. This paper compares Support Vector Network, Neural Network, Logistic Regression, Naive Bayes, Random Forest, and Classification and Regression Trees (CART) algorithms, and the results show that the Random Forest algorithm was able to predict credit risk with higher accuracy.


2021 ◽  
Author(s):  
Roobaea Alroobaea ◽  
Seifeddine Mechti ◽  
Mariem Haoues ◽  
Saeed Rubaiee ◽  
Anas Ahmed ◽  
...  

Abstract Alzheimer's disease is the main cause of dementia, which frequently affects older adults. The disease is costly, especially in terms of treatment. In addition, Alzheimer's is one of the causes of death among older citizens. Early Alzheimer's detection helps medical staff diagnose the disease, which will certainly decrease the risk of death. This has made early detection of Alzheimer's disease a crucial problem in the healthcare industry. The objective of this research study is to introduce a computer-aided diagnosis system for Alzheimer's disease detection using machine learning techniques. We employed data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS) brain datasets. Common supervised machine learning techniques, such as logistic regression, support vector machine, random forest and linear discriminant analysis, have been applied for automatic Alzheimer's disease detection. On the ADNI dataset, the best accuracy values are 99.43% and 99.10%, given by logistic regression and support vector machine, respectively. On the OASIS dataset, we obtained 84.33% and 83.92%, given by logistic regression and random forest, respectively.


Author(s):  
Abdul Manan koli ◽  
Muqeem Ahmed

Background: Election prediction began long ago, when it commonly relied on traditional methods such as pundits and hereditary factors. In recent times, however, new methods such as data mining, data science, big data, and numerous machine learning techniques are being used for election forecasting. These computational techniques have changed the whole process of political forecasting, and poll predictions are now carried out through them. Method: The election prediction model is developed in the Jupyter Notebook web application using different supervised machine learning techniques. To obtain optimal results, we perform hyperparameter tuning of all the proposed classifiers. To measure the performance of the poll prediction system, we used the confusion matrix along with the AUROC curve, which indicate that these methods are well suited for political forecasting. An important contribution of this article is the design of a prediction system that can be used for making predictions in other fields, such as cardiovascular disease prediction and weather forecasting. Results: The model is trained and tested with a real-time dataset of the state of Jammu and Kashmir (India). We applied feature selection techniques such as Random Forest, Decision Tree Classifier, Gradient Boosting Classifier and Extra Gradient Boosting and obtained the eight most important parameters (Central Influence, Religion Followers, Party Wave, Party Abbreviations, Sensitive Areas, Vote Bank, Incumbent Party, and Caste Factor) for poll predictions, with their mean weightages. By applying different classifiers to obtain the mean weightage of each parameter, we observed that Party Wave received the maximum mean weightage of 0.82% compared to the other parameters. After obtaining the vital parameters for political forecasting, we applied various machine learning algorithms, such as Decision Tree, Random Forest, K-Nearest Neighbor and Support Vector Machine, for the early prediction of elections. Experimental results show that Support Vector Machine outperformed the other classifiers with a higher accuracy of 0.84%. Conclusion: This paper outlines election prediction models, their potential, techniques and parameters, as well as their limitations. We conclude that elections can indeed be forecast with significant parameters, but with caution because of the limitations outlined for developing nations, such as sensitive areas, social unrest and religion. This research work may be considered the first attempt to use multiple classifiers for forecasting the Assembly election results of the state of Jammu and Kashmir (India).
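The "mean weightage" idea in the abstract above can be illustrated with a minimal sketch: average each parameter's importance score across several feature-selection models and rank the result. The abstract does not give the per-model scores or the exact averaging procedure, so the numbers and the `mean_weightage` helper below are hypothetical and only show the arithmetic.

```python
from statistics import mean

# Hypothetical per-model importance scores (NOT values from the paper).
importances = {
    "random_forest":     {"Party Wave": 0.30, "Vote Bank": 0.20, "Caste Factor": 0.10},
    "decision_tree":     {"Party Wave": 0.40, "Vote Bank": 0.10, "Caste Factor": 0.20},
    "gradient_boosting": {"Party Wave": 0.20, "Vote Bank": 0.30, "Caste Factor": 0.30},
}


def mean_weightage(per_model):
    """Average each feature's importance over all models (assumed procedure)."""
    features = next(iter(per_model.values())).keys()
    return {f: mean(scores[f] for scores in per_model.values()) for f in features}


weights = mean_weightage(importances)
# Rank features from highest to lowest mean weightage.
ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking)
```

Averaging across models is one simple way to reduce dependence on any single selector; other aggregation rules (rank averaging, voting) would be equally plausible readings of the abstract.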


A computerized system can improve the disease-identifying abilities of doctors and also reduce the time needed for identification and decision-making in healthcare. Gliomas are brain tumors that can be labeled as benign (non-cancerous) or malignant (cancerous). Hence, identifying the stage of the tumor is extremely important for choosing appropriate medication. In this paper, a system is proposed to detect brain tumors of different stages from MR images. The proposed system uses Fuzzy C-Means (FCM) as a clustering technique for better outcomes. The main focus of this paper is to refine the required features in two steps with the help of the Discrete Wavelet Transform (DWT) and Independent Component Analysis (ICA), using three machine learning techniques, i.e. Random Forest (RF), Artificial Neural Network (ANN) and Support Vector Machine (SVM). The final outcome of our experiment indicates that the proposed computerized system identifies the brain tumor using RF, ANN and SVM with 100%, 91.6% and 95.8% accuracy, respectively. We have also calculated Sensitivity, Specificity, Matthews Correlation Coefficient and the AUC-ROC curve. Random Forest shows the highest accuracy compared to Support Vector Machine and Artificial Neural Network.
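As a reminder of how the metrics named above relate to a binary confusion matrix, here is a minimal sketch using the standard definitions of sensitivity, specificity, accuracy and the Matthews Correlation Coefficient. The function name and the counts are hypothetical and are not taken from the paper's experiments.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=8, fp=2, tn=7, fn=3)
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy on small or imbalanced medical datasets.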


2018 ◽  
Vol 77 (9) ◽  
pp. 2184-2189 ◽  
Author(s):  
Joshua Myrans ◽  
Zoran Kapelan ◽  
Richard Everson

Abstract This work presents a methodology for automatic detection of structural faults in sewers from CCTV footage, which has been improved by combining the outputs of different machine learning techniques. The predictions of support vector machine and random forest classifiers are combined using three distinct techniques: 'both', 'most likely' and 'stacking'. Each technique is tested on CCTV data taken from real surveys covering a range of pipes at locations in the south-west of the UK. The best tested technique, stacking, offers a 5% increase in accuracy with minimal impact on efficiency, proving useful for future development and implementation of the fault detection methodology.
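The abstract names the three combination techniques but does not define them, so the sketch below rests on assumed readings: 'both' flags a fault only when both classifiers do, 'most likely' follows the more confident classifier, and 'stacking' fits a small logistic-regression meta-model on the two base probabilities. All function names and the held-out data are hypothetical.

```python
import math


def combine_both(p_svm, p_rf, threshold=0.5):
    """Assumed 'both' rule: report a fault only if both classifiers agree."""
    return p_svm >= threshold and p_rf >= threshold


def combine_most_likely(p_svm, p_rf, threshold=0.5):
    """Assumed 'most likely' rule: trust the probability farther from 0.5."""
    chosen = p_svm if abs(p_svm - 0.5) >= abs(p_rf - 0.5) else p_rf
    return chosen >= threshold


def fit_stacker(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a tiny logistic-regression meta-model by batch gradient descent."""
    w, b, n = [0.0, 0.0], 0.0, len(labels)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for (p1, p2), y in zip(base_probs, labels):
            err = stacked_prob(w, b, p1, p2) - y  # gradient of the log-loss
            gw[0] += err * p1
            gw[1] += err * p2
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def stacked_prob(w, b, p_svm, p_rf):
    """Meta-model probability of a fault given the two base probabilities."""
    return 1.0 / (1.0 + math.exp(-(w[0] * p_svm + w[1] * p_rf + b)))


# Hypothetical held-out (SVM, RF) probabilities and true fault labels.
held_out = [(0.9, 0.8), (0.8, 0.4), (0.3, 0.7), (0.2, 0.1), (0.6, 0.9), (0.1, 0.3)]
labels = [1, 1, 0, 0, 1, 0]
w, b = fit_stacker(held_out, labels)
print(combine_both(0.9, 0.4), combine_most_likely(0.9, 0.4), stacked_prob(w, b, 0.9, 0.4))
```

In a stacking setup the meta-model should be fitted on predictions for data the base classifiers did not see during training; otherwise it learns from overconfident base outputs.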


2021 ◽  
Vol 297 ◽  
pp. 01073
Author(s):  
Sabyasachi Pramanik ◽  
K. Martin Sagayam ◽  
Om Prakash Jena

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and prognosis of cancer types have become essential in cancer research, since they may help to improve the clinical treatment of cancer patients. The importance of classifying cancer patients into higher- or lower-risk categories has prompted numerous research teams from the bioscience and genomics fields to investigate the use of machine learning (ML) algorithms in cancer diagnosis and treatment. These methods have therefore been used with the goal of modeling the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. They include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines, which have already been used extensively in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. This paper presents an overview of current machine learning approaches used in modeling cancer development. All of the supervised machine learning approaches described here are used to build prediction models from a variety of input characteristics and data samples. In light of the increasing trend towards the use of machine learning methods in biomedical research, we present the most recent papers that have used these approaches to predict cancer risk or patient outcomes in order to better understand cancer.


2020 ◽  
Vol 13 (1) ◽  
pp. 130-149
Author(s):  
Puneet Misra ◽  
Siddharth Chaurasia

Stock market movements are affected by numerous factors, making them one of the most challenging problems in forecasting. This article attempts to predict the direction of movement of stocks and stock indices. The study uses three classifiers, Artificial Neural Network, Random Forest and Support Vector Machine, with four different representations of the inputs. The first representation uses raw data (open, high, low, close and volume). The second uses ten features in the form of technical indicators generated by technical analysis. The third and fourth representations present two different ways of converting the indicator data into discrete trend data. Experimental results suggest that for raw data the support vector machine provides the best results. For the other representations there is no clear winner among the models applied, but the representation of data by the proposed approach gave the best overall results for all the models and financial series. The consistency of the results highlights the importance of feature generation and the right representation of the dataset for machine learning techniques.
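To make the step from raw prices to indicators to discrete trend data concrete, here is a minimal sketch. The abstract does not list the ten indicators or the two conversion rules, so the two indicators (simple moving average and momentum) and the +1/-1 rules below are assumptions chosen for illustration, and the closing prices are hypothetical.

```python
def simple_moving_average(closes, window):
    """Simple moving average of closing prices over a fixed window."""
    return [
        sum(closes[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(closes))
    ]


def sma_trend(closes, window):
    """Assumed rule: +1 when the close is above its moving average, else -1."""
    sma = simple_moving_average(closes, window)
    aligned = closes[window - 1 :]  # closes that have a full window behind them
    return [1 if c > s else -1 for c, s in zip(aligned, sma)]


def momentum_trend(closes, lag):
    """Assumed rule: +1 when the price rose over the last `lag` steps, else -1."""
    return [1 if closes[i] - closes[i - lag] > 0 else -1 for i in range(lag, len(closes))]


# Hypothetical closing prices for illustration only.
closes = [10.0, 11.0, 12.0, 11.0, 10.0, 12.0]
sma_signal = sma_trend(closes, window=3)
momentum_signal = momentum_trend(closes, lag=2)
print(sma_signal, momentum_signal)
```

Discretizing each indicator into a trend signal puts all features on the same scale, which is one plausible reason such a representation could help classifiers that are sensitive to feature magnitude.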


Advances in cyber-attack technologies have ushered in various new attacks that are difficult to detect using traditional intrusion detection systems (IDS). Existing IDS are trained to detect known patterns, so newer attacks bypass them and go undetected. In this paper, a two-level framework is proposed that can be used to detect unknown new attacks using machine learning techniques. In the first level, the known classes of attacks are determined using supervised machine learning algorithms such as Support Vector Machine (SVM) and Neural Networks (NN). The second level uses unsupervised machine learning algorithms such as K-means. The experimentation is carried out with four models on the NSL-KDD dataset in an OpenStack cloud environment. The model with Support Vector Machine for supervised machine learning, Gradual Feature Reduction (GFR) for feature selection and K-means as the unsupervised algorithm provided the optimum efficiency of 94.56%.

