Analysis of Loan Availability using Machine Learning Techniques

Banks offer a variety of products, but lending (credit lines) is their primary source of revenue: they profit from the interest earned on the loans they make. Whether customers repay or default on their loans therefore determines a bank's profit or loss, and forecasting loan defaulters helps reduce the bank's Non-Performing Assets. This makes further investigation of loan default essential. Because precise forecasts are essential for maximising profit, it is important to analyse and compare the available methodologies. The logistic regression model is an important predictive analytics tool for detecting loan defaulters. Data acquired from Kaggle are used for assessment and forecasting. Logistic regression models were used to calculate the various performance indicators, and the models are compared using performance metrics such as sensitivity and specificity. The model performs significantly better because, in addition to checking-account details (which indicate a customer's wealth), it includes variables that should be considered when calculating the probability of loan default, namely customers' personal attributes such as age, loan purpose, credit score, credit amount and credit period. As a result, a logistic regression approach makes it straightforward to identify the appropriate clients to target for loan issuance by evaluating their likelihood of default. The model implies that, rather than simply lending to wealthy borrowers, a bank should also assess a borrower's other attributes, which play a critical role in credit decisions and in forecasting loan defaulters.
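
The abstract contains no code, so the following is only a minimal sketch of the general technique it names: a logistic regression default classifier scored by sensitivity and specificity. It assumes standardized numeric features; the two columns (credit score and credit amount) and their values are hypothetical toy data, not the Kaggle dataset. The model is fitted by plain batch gradient descent rather than any particular library, and 1 means "defaulter".

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and an intercept by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one row is (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X: Sequence[Sequence[float]], w: Sequence[float], b: float,
            threshold: float = 0.5) -> List[int]:
    """Label a row 1 (predicted defaulter) when its default probability >= threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true, otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical standardized features: [credit_score, credit_amount]; 1 = defaulted.
X = [[-1.5, 1.2], [-1.0, 0.8], [-0.8, 1.5], [1.2, -0.5], [0.9, -1.0], [1.5, -0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(sensitivity_specificity(y, predict(X, w, b)))  # (1.0, 1.0) on this separable toy data
```

Sensitivity measures how many actual defaulters are caught and specificity how many good borrowers are correctly cleared; moving `threshold` trades one against the other.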

Predictive analytics for loan default in banking sector using machine learning techniques

2018 28th International Conference on Computer Theory and Applications (ICCTA), 2018
DOI: 10.1109/iccta45985.2018.9499147

Author(s): Salma Khaled Shaheen, Essam ElFakharany

Keyword(s): Machine Learning, Banking Sector, Predictive Analytics, Machine Learning Techniques, Loan Default, Learning Techniques

Machine learning techniques to derive bioclimatic classifications for Colombia

2021
DOI: 10.1101/2021.09.05.459033

Author(s): Richard Rios, Elkin A. Noguera-Urbano, Jairo Espinosa, Jose Manuael Ochoa

Keyword(s): Machine Learning, Principal Component Analysis, Logistic Regression, Regression Models, Principal Component, Component Analysis, Machine Learning Techniques, Study Region, Logistic Regression Models, Learning Techniques

Bioclimatic classifications seek to divide a study region into geographic areas with similar bioclimatic characteristics. In this study we proposed two bioclimatic classifications for Colombia using machine learning techniques. We first characterized the precipitation space of Colombia using principal component analysis. Based on the Lang classification, we then projected all background sites into the precipitation space with their corresponding categories. We sequentially fitted logistic regression models to reclassify all background sites in the precipitation space into six redefined Lang categories. The new categories were then used to define new modified Lang and Caldas-Lang classifications.
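
As a rough illustration of the first step only, the sketch below computes the first principal component of a small site-by-variable matrix using power iteration on the sample covariance matrix, then projects each site onto it. The rows, and the idea that the columns are precipitation variables, are hypothetical; the abstract gives neither the study's actual variables, nor the number of components, nor its software, and the sequential logistic regression step is not shown.

```python
import math
from typing import List, Sequence


def center(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract each column's mean so components describe variation, not level."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_principal_component(rows: Sequence[Sequence[float]],
                              iters: int = 200) -> List[float]:
    """Unit-length top eigenvector of the sample covariance matrix (power iteration).

    Assumes the starting vector of ones is not orthogonal to that eigenvector.
    """
    c = center(rows)
    n, d = len(c), len(c[0])
    cov = [[sum(r[i] * r[j] for r in c) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def scores(rows: Sequence[Sequence[float]], component: Sequence[float]) -> List[float]:
    """Coordinate of each centered row along the component."""
    return [sum(x * p for x, p in zip(r, component)) for r in center(rows)]


# Hypothetical sites x two precipitation variables (perfectly correlated for clarity).
sites = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc1 = first_principal_component(sites)
print(pc1)  # approximately [0.447, 0.894], i.e. (1, 2) / sqrt(5)
print(scores(sites, pc1))
```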

Cardiac Disease Prediction using Supervised Machine Learning Techniques

Journal of Physics: Conference Series, Vol 2161 (1), pp. 012013, 2022
DOI: 10.1088/1742-6596/2161/1/012013

Author(s): Chiradeep Gupta, Athina Saha, N V Subba Reddy, U Dinesh Acharya

Keyword(s): Machine Learning, Logistic Regression, Cardiac Disease, Performance Metrics, Confusion Matrix, Supervised Machine Learning, Machine Learning Techniques, Support Vector, Ensemble Techniques, Learning Techniques

Diagnosis of cardiac disease needs to be more accurate, precise, and reliable. The number of deaths due to cardiac attacks is increasing exponentially day by day, so practical approaches to earlier diagnosis of cardiac (heart) disease are needed to enable prompt management of the disease. Various supervised machine learning techniques, namely K-Nearest Neighbour, Decision Tree, Logistic Regression, Naïve Bayes, and Support Vector Machine (SVM), are used to predict cardiac disease on a dataset collected from the University of California, Irvine (UCI) repository. The results show that Logistic Regression outperformed all the other supervised classifiers on the performance metrics. The model is also less risky: according to the confusion matrices of all the models, it produces fewer false negatives than the others. In addition, ensemble techniques could be used to improve the classifier's accuracy. Jupyter Notebook, whose Python environment offers many libraries, is the best tool for implementing this work accurately and precisely.
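
The abstract's two evaluation ideas, comparing models by their false-negative counts and suggesting an ensemble to improve accuracy, can be sketched in a few lines. The model names and prediction vectors below are hypothetical placeholders, not the study's outputs, and simple majority (hard) voting is just one assumed ensemble choice among several.

```python
from typing import Dict, List, Sequence


def false_negatives(y_true: Sequence[int], y_pred: Sequence[int]) -> int:
    """Patients with disease (1) that the model labelled healthy (0)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard-voting ensemble: each sample takes the label most models predicted.

    Ties (possible with an even number of models) are broken towards 1, the
    cautious choice when a missed diagnosis is the costlier error.
    """
    n_models = len(predictions)
    return [int(2 * sum(col) >= n_models) for col in zip(*predictions)]


# Hypothetical labels and per-model predictions for six patients.
y_true = [1, 1, 1, 0, 0, 1]
preds: Dict[str, List[int]] = {
    "logistic_regression": [1, 1, 1, 0, 1, 1],
    "decision_tree":       [1, 0, 1, 0, 0, 0],
    "naive_bayes":         [0, 1, 1, 0, 0, 1],
}
for name, p in preds.items():
    print(name, "false negatives:", false_negatives(y_true, p))
ensemble = majority_vote(list(preds.values()))
print("ensemble:", ensemble, "false negatives:", false_negatives(y_true, ensemble))
```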

A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter

Future Internet, Vol 12 (11), pp. 187, 2020
DOI: 10.3390/fi12110187
Cited by ~1

Author(s): Amgad Muneer, Suliman Mohamed Fati

Keyword(s): Machine Learning, Logistic Regression, Performance Metrics, Machine Learning Techniques, Stochastic Gradient Descent, Gradient Boosting, Support Vector, Light Gradient, Global Issue, Cyberbullying Detection

The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, detection of cyberbullying without the involvement of the victims is necessary. In this study, we attempted to explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Moreover, seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each of these algorithms was evaluated using accuracy, precision, recall, and F1 score as the performance metrics to determine the classifiers’ recognition rates applied to the global dataset. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, logistic regression achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
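
Precision, recall, and F1 follow directly from confusion-matrix counts, and computing them side by side shows why a different classifier can come out "best" on each metric. The sketch assumes binary labels with 1 meaning "bullying"; the predictions are hypothetical. A classifier that flags every tweet, for example, reaches a perfect recall of 1.0 while its precision drops, which is why recall is normally read together with precision and F1.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0]
flag_everything = [1, 1, 1, 1, 1]  # hypothetical degenerate classifier
print(precision_recall_f1(y_true, flag_everything))  # about (0.6, 1.0, 0.75)
```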

Analyzing injury severity of motorcycle at-fault crashes using machine learning techniques, decision tree and logistic regression models

International Journal of Transportation Science and Technology, Vol 9 (2), pp. 89-99, 2020
DOI: 10.1016/j.ijtst.2019.10.002
Cited by ~7

Author(s): Mahdi Rezapour, Amirarsalan Mehrara Molan, Khaled Ksaibati

Keyword(s): Machine Learning, Logistic Regression, Decision Tree, Regression Models, Injury Severity, Machine Learning Techniques, Logistic Regression Models, Learning Techniques

Decision Support Credit Scoring Model to Improve Loan Default Prediction in Financial Institutions

Journal of Computational and Theoretical Nanoscience, Vol 16 (8), pp. 3514-3518, 2019
DOI: 10.1166/jctn.2019.8316

Author(s): Kamya Eria, Preethi Subramanian

Keyword(s): Logistic Regression, Credit Scoring, Vital Role, Quality Of Data, Loan Default, Scoring Model, Credit Score, Default Prediction, Credit Scoring Model

Credit scoring plays a vital role in assessing the creditworthiness of loan applicants, thus speeding up the approval process. Credit scoring models, however, rely on the accuracy of classification models for their performance. This accuracy depends not only on the choice of data mining process but is also heavily influenced by the quality of the data. Although no single technique can be favored over the others, logistic regression has been widely employed in industry because of its simplicity. This study proposes a SEMMA-based credit scoring model developed with an improved Logistic Regression (LR) model. The improvements come from excluding irrelevant features and adjusting the data partition ratios. Compared with the predominant models, the proposed model produced outstanding results with minimal credit decision errors.
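
The two improvements named here, dropping irrelevant features and changing the train/test partition ratio, are both simple data-preparation steps. The sketch below shows them on rows stored as dictionaries; the column names, the 70/30 default ratio, and the fixed random seed are hypothetical choices for illustration, and the SEMMA stages themselves are not reproduced.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, float]


def drop_features(rows: Sequence[Row], irrelevant: Iterable[str]) -> List[Row]:
    """Return copies of the rows without the listed columns."""
    skip = set(irrelevant)
    return [{k: v for k, v in row.items() if k not in skip} for row in rows]


def partition(rows: Sequence[Row], train_ratio: float = 0.7,
              seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle reproducibly, then split into (train, test) by train_ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical applicants; "applicant_id" carries no predictive signal.
applicants = [{"applicant_id": float(i), "income": 1000.0 + 50 * i,
               "default": float(i % 3 == 0)} for i in range(10)]
cleaned = drop_features(applicants, ["applicant_id"])
for ratio in (0.6, 0.7, 0.8):  # compare several partition ratios
    train, test = partition(cleaned, ratio)
    print(ratio, len(train), len(test))
```

Fixing the seed keeps each split reproducible, so differences between ratios reflect the ratio rather than a new random shuffle.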

Risk Factor Prediction by Naive Bayes Classifier, Logistic Regression Models, Various Classification and Regression Machine Learning Techniques

Proceedings of the National Academy of Sciences, India Section B: Biological Sciences, 2021
DOI: 10.1007/s40011-021-01278-3

Author(s): K. Kannan, A. Menaga

Keyword(s): Machine Learning, Logistic Regression, Risk Factor, Regression Models, Naive Bayes, Machine Learning Techniques, Bayes Classifier, Logistic Regression Models, Learning Techniques, Classification And Regression

Assessing damage severity of plant hopper and leaf folder in rice using hyperspectral remote sensing and multinomial logistic regression models

2016
DOI: 10.1603/ice.2016.93420

Author(s): Mathyam Prabhakar

Keyword(s): Remote Sensing, Logistic Regression, Regression Models, Multinomial Logistic Regression, Hyperspectral Remote Sensing, Logistic Regression Models

The impact of SPY angiography on intraoperative decision making and outcomes for post-mastectomy reconstruction

Journal of Cancer Science and Therapeutics, pp. 1-5, 2019
DOI: 10.36879/jcst.19.000109

Keyword(s): Decision Making, Logistic Regression, Skin Necrosis, Regression Models, Multivariable Analysis, P Value, Flap Necrosis, Single Institution, Logistic Regression Models, The Impact

Objective: While the use of intraoperative laser angiography (SPY) is increasing in mastectomy patients, its impact in the operating room on the type of reconstruction performed has not been well described. The purpose of this study is to investigate whether SPY angiography influences post-mastectomy reconstruction decisions and outcomes.

Methods and materials: A retrospective analysis of mastectomy patients with reconstruction at a single institution was performed for 2015-2017. All patients underwent intraoperative SPY after mastectomy but prior to reconstruction. SPY results were defined as ‘good’, ‘questionable’, ‘bad’, or ‘had skin excised’. Complications within 60 days of surgery were compared between patients whose SPY results did not change the type of reconstruction and those whose results did. Preoperative and intraoperative variables were entered into multivariable logistic regression models if significant at the univariate level. A p-value < 0.05 was considered significant.

Results: 267 mastectomies were identified; 42 underwent a change in the type of planned reconstruction due to intraoperative SPY results. Of the 42 breasts that underwent a change in reconstruction, 6 had a ‘good’ SPY result, 10 ‘questionable’, 25 ‘bad’, and 2 ‘had areas excised’ (p < 0.01). After multivariable analysis, predictors of skin necrosis included patients with ‘questionable’ SPY results (p < 0.01, OR: 8.1, 95% CI: 2.06-32.2) and smokers (p < 0.01, OR: 5.7, 95% CI: 1.5-21.2). Predictors of any complication included a change in reconstruction (p < 0.05, OR: 4.5, 95% CI: 1.4-14.9) and a ‘questionable’ SPY result (p < 0.01, OR: 4.4, 95% CI: 1.6-14.9).

Conclusion: SPY angiography results strongly influence intraoperative surgical decisions regarding the type of reconstruction performed. Patients most at risk for flap necrosis and complications post-mastectomy are those with questionable SPY results.
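
Odds ratios with 95% confidence intervals are the usual output of a multivariable logistic regression: OR = exp(β) for a coefficient β and, assuming a Wald interval (the abstract does not state the method), 95% CI = exp(β ± 1.96·SE). The sketch below encodes those relationships and, as a consistency check, recovers the implied point estimate and log-scale standard error from one interval quoted above (skin necrosis with a ‘questionable’ SPY result: OR 8.1, 95% CI 2.06-32.2). It does not reproduce the study's model.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta: float, se: float,
                  z: float = Z_95) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval from a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_from_ci(lower: float, upper: float,
                    z: float = Z_95) -> Tuple[float, float]:
    """Back out (odds ratio, SE of the log-odds) from an interval symmetric on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2 * z)
    return math.exp((log_lo + log_hi) / 2), se


or_mid, se = implied_from_ci(2.06, 32.2)
print(round(or_mid, 2), round(se, 3))  # 8.14 0.701
```

The small gap between the implied 8.14 and the reported 8.1 is what rounding of the published figures, or a non-Wald interval, would produce.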

Utilizing Twitter Data Analysis and Deep Learning to Identify Drug Use (Preprint)

2019
DOI: 10.2196/preprints.14681

Author(s): Joseph Tassone, Peizhi Yan, Mackenzie Simpson, Chetan Mendhe, Vijay Mago, ...

Keyword(s): Social Media, Logistic Regression, Deep Learning, Decision Tree, Semantic Meaning, Predictive Capability, Logistic Regression Models, Twitter Data, Data Points, Positive Classification

BACKGROUND: The collection and examination of social media data has become a useful mechanism for studying the mental activity and behavioral tendencies of users.

OBJECTIVE: Through the analysis of a collected set of Twitter data, a model is developed for predicting positively referenced, drug-related tweets, from which trends and correlations can be determined.

METHODS: Tweets and their attribute data were collected and processed using topic-related keywords, such as drug slang and use conditions (methods of drug consumption). Potential candidates were preprocessed, resulting in a dataset of 3,696,150 rows. The predictive classification power of multiple methods was compared, including regression, decision trees, and CNN-based classifiers. For the latter, a deep learning approach was implemented to screen and analyze the semantic meaning of the tweets.

RESULTS: The logistic regression and decision tree models used 12,142 data points for training and 1041 for testing. The logistic regression models displayed accuracies of 54.56% and 57.44%, respectively, and an AUC of 0.58. The decision tree was an improvement, with an accuracy of 63.40% and an AUC of 0.68. All these values implied low predictive capability with little to no discrimination. Conversely, the two CNN-based classifiers tested showed a substantial improvement. The first was trained with 2,661 manually labeled samples, while the other also included synthetically generated tweets, for a total of 12,142 samples. Their accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91. Using association rule mining in conjunction with the CNN-based classifier showed a high likelihood that keywords such as “smoke”, “cocaine”, and “marijuana” trigger a drug-positive classification.

CONCLUSIONS: Predictive analysis without a CNN is limited and possibly fruitless. Attribute-based models presented little predictive capability and were not suitable for analyzing this type of data. The semantic meaning of the tweets needed to be utilized, giving the CNN-based classifier an advantage over other solutions. Additionally, the most commonly mentioned drugs corresponded to some degree with frequently used illicit substances, demonstrating the practical usefulness of this system. Lastly, the synthetically generated set increased the scores, improving predictive capability.

CLINICALTRIAL: None
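
The AUC values reported here (0.58 for logistic regression versus 0.90 and 0.91 for the CNN-based classifiers) can be read through the rank interpretation of ROC AUC: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 means no discrimination. The pairwise sketch below is O(n·m), meant only for small illustrative inputs, and uses hypothetical scores rather than the study's.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one example of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))   # 0.5: constant scores carry no signal
```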
