Analysis of Loan Availability using Machine Learning Techniques

Author(s):  
Sharayu Dosalwar ◽  
Ketki Kinkar ◽  
Rahul Sannat ◽  
Dr Nitin Pise

Banks offer a variety of products, but credit lines are their primary source of revenue: they profit from the interest earned on the loans they make. Whether customers repay or default on their loans therefore affects a bank's profit or loss. Forecasting loan defaulters can reduce a bank's Non-Performing Assets, which makes further investigation into loan default essential. Because precise forecasts are essential for profit maximisation, it is crucial to analyse and compare the various methodologies. The logistic regression model is an important predictive analytics tool for detecting loan defaulters. Data acquired from Kaggle is used for assessment and forecasting. Logistic regression models were used to calculate the various performance indicators, and the models are compared using performance metrics such as sensitivity and specificity. The model performs significantly better when, in addition to checking account details (which indicate a customer's wealth), it includes customer personal attributes (such as age, objective, credit score, credit amount, and credit period) that should be considered when calculating the probability of loan default. As a result, using a logistic regression approach, the appropriate clients to target for loan issuance can be identified by evaluating their likelihood of loan default. The model implies that, rather than lending only to wealthy borrowers, a bank should also assess a borrower's other attributes, which play a critical role in credit decisions and in forecasting loan defaulters.
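The sensitivity and specificity used to compare the models in the abstract above can be computed directly from actual and predicted labels. The following is a minimal sketch, not the authors' code: the function name `sensitivity_specificity` and the toy labels are hypothetical, and the label 1 is assumed to mean "default".

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, assuming 1 = default."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good payers cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good payers flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

For a lender, low sensitivity means missed defaulters, while low specificity means creditworthy applicants are turned away, so the two are usually reported together.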

2021 ◽  
Author(s):  
Richard Rios ◽  
Elkin A. Noguera-Urbano ◽  
Jairo Espinosa ◽  
Jose Manuael Ochoa

Bioclimatic classifications seek to divide a study region into geographic areas with similar bioclimatic characteristics. In this study we proposed two bioclimatic classifications for Colombia using machine learning techniques. We first characterized the precipitation space of Colombia using principal component analysis. Based on the Lang classification, we then projected all background sites in the precipitation space with their corresponding categories. We sequentially fit logistic regression models to re-classify all background sites in the precipitation space with six redefined Lang categories. The new categories were then used to define new modified Lang and Caldas-Lang classifications.


2022 ◽  
Vol 2161 (1) ◽  
pp. 012013
Author(s):  
Chiradeep Gupta ◽  
Athina Saha ◽  
N V Subba Reddy ◽  
U Dinesh Acharya

Diagnosis of cardiac disease needs to be accurate, precise, and reliable. The number of deaths due to cardiac attacks is increasing exponentially day by day. Thus, practical approaches for earlier diagnosis of cardiac or heart disease are needed to achieve prompt management of the disease. Various supervised machine learning techniques, namely K-Nearest Neighbour, Decision Tree, Logistic Regression, Naïve Bayes, and Support Vector Machine (SVM), are used for predicting cardiac disease using a dataset collected from the repository of the University of California, Irvine (UCI). The results show that Logistic Regression was better than all other supervised classifiers in terms of the performance metrics. The model is also less risky, since its number of false negatives is low compared to the other models according to their confusion matrices. In addition, ensemble techniques can be explored to improve the accuracy of the classifier. Jupyter Notebook is a well-suited tool for implementing this work in Python, which offers many libraries for accurate and precise work.


2020 ◽  
Vol 12 (11) ◽  
pp. 187 ◽  
Author(s):  
Amgad Muneer ◽  
Suliman Mohamed Fati

The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, detection of cyberbullying without the involvement of the victims is necessary. In this study, we attempted to explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Moreover, seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each of these algorithms was evaluated using accuracy, precision, recall, and F1 score as the performance metrics to determine the classifiers’ recognition rates applied to the global dataset. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, logistic regression achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
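The abstract above reports the best precision, recall, and F1 score for three different classifiers. A short sketch of how the three metrics relate may help explain why a perfect recall does not guarantee the best F1. The function name `precision_recall_f1` and the counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only: every positive is found (recall = 1.0),
# but two false alarms pull precision, and hence F1, down.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=0)
print(f"precision={p:.3f}, recall={r:.3f}, f1={f1:.3f}")
```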


2019 ◽  
Vol 16 (8) ◽  
pp. 3514-3518
Author(s):  
Kamya Eria ◽  
Preethi Subramanian

Credit scoring plays a vital role in assessing the creditworthiness of loan applicants, thus speeding up the approval process. Credit scoring models, however, rely on the accuracy of classification models for their performance. This accuracy depends not only on the choice of data mining process; it is also heavily influenced by the quality of the data. Although no technique can be favored over the others, logistic regression has been widely employed in industry for its simplicity. This study proposes a SEMMA-based credit scoring model developed with an improved Logistic Regression (LR) model. The improvements come from excluding irrelevant features and adjusting the partition ratios. The model was compared with the predominant models and was shown to produce outstanding results with minimal credit decision errors.


Objective: While the use of intraoperative laser angiography (SPY) is increasing in mastectomy patients, its impact in the operating room on the type of reconstruction performed has not been well described. The purpose of this study is to investigate whether SPY angiography influences post-mastectomy reconstruction decisions and outcomes. Methods and materials: A retrospective analysis of mastectomy patients with reconstruction at a single institution was performed from 2015-2017. All patients underwent intraoperative SPY after mastectomy but prior to reconstruction. SPY results were defined as ‘good’, ‘questionable’, ‘bad’, or ‘had skin excised’. Complications within 60 days of surgery were compared between those whose SPY results did not change the type of reconstruction done and those whose results did. Preoperative and intraoperative variables were entered into multivariable logistic regression models if significant at the univariate level. A p-value <0.05 was considered significant. Results: 267 mastectomies were identified; 42 underwent a change in the type of planned reconstruction due to intraoperative SPY results. Of the 42 breasts that underwent a change in reconstruction, 6 had a ‘good’ SPY result, 10 ‘questionable’, 25 ‘bad’, and 2 ‘had areas excised’ (p<0.01). After multivariable analysis, predictors of skin necrosis included ‘questionable’ SPY results (p<0.01, OR: 8.1, 95%CI: 2.06 – 32.2) and smoking (p<0.01, OR: 5.7, 95%CI: 1.5 – 21.2). Predictors of any complication included a change in reconstruction (p<0.05, OR: 4.5, 95%CI: 1.4 – 14.9) and a ‘questionable’ SPY result (p<0.01, OR: 4.4, 95%CI: 1.6 – 14.9). Conclusion: SPY angiography results strongly influence intraoperative surgical decisions regarding the type of reconstruction performed. Patients most at risk for flap necrosis and complications post-mastectomy are those with questionable SPY results.
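Odds ratios with 95% confidence intervals, as reported in the abstract above, are conventionally obtained by exponentiating a logistic regression coefficient and its Wald interval. The sketch below illustrates that standard conversion under stated assumptions: the function name `odds_ratio_ci`, the coefficient, and the standard error are hypothetical and are not values from the study.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error into
    an odds ratio with a Wald confidence interval (z=1.96 for 95%)."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient: log-odds of ln(4) with standard error 0.5.
or_, lo, hi = odds_ratio_ci(beta=math.log(4.0), se=0.5)
print(f"OR={or_:.2f}, 95% CI: {lo:.2f} - {hi:.2f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, which is why reported upper bounds often sit much further from the estimate than lower bounds.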


2019 ◽  
Author(s):  
Joseph Tassone ◽  
Peizhi Yan ◽  
Mackenzie Simpson ◽  
Chetan Mendhe ◽  
Vijay Mago ◽  
...  

BACKGROUND The collection and examination of social media has become a useful mechanism for studying the mental activity and behavior tendencies of users. OBJECTIVE Through the analysis of a collected set of Twitter data, a model will be developed for predicting positively referenced, drug-related tweets. From this, trends and correlations can be determined. METHODS Twitter social media tweets and attribute data were collected and processed using topic-pertaining keywords, such as drug slang and use-conditions (methods of drug consumption). Potential candidates were preprocessed, resulting in a dataset of 3,696,150 rows. The predictive classification power of multiple methods was compared, including regression, decision trees, and CNN-based classifiers. For the latter, a deep learning approach was implemented to screen and analyze the semantic meaning of the tweets. RESULTS The logistic regression and decision tree models utilized 12,142 data points for training and 1,041 data points for testing. The results calculated from the logistic regression models respectively displayed an accuracy of 54.56% and 57.44%, and an AUC of 0.58. While an improvement, the decision tree concluded with an accuracy of 63.40% and an AUC of 0.68. All these values implied a low predictive capability with little to no discrimination. Conversely, the two CNN-based classifiers tested presented a substantial improvement. The first was trained with 2,661 manually labeled samples, while the other included synthetically generated tweets, culminating in 12,142 samples. The accuracy scores were 76.35% and 82.31%, with an AUC of 0.90 and 0.91. Using association rule mining in conjunction with the CNN-based classifier showed a high likelihood of keywords such as “smoke”, “cocaine”, and “marijuana” triggering a drug-positive classification. CONCLUSIONS Predictive analysis without a CNN is limited and possibly fruitless. Attribute-based models presented little predictive capability and were not suitable for analyzing this type of data. The semantic meaning of the tweets needed to be utilized, giving the CNN-based classifier an advantage over other solutions. Additionally, commonly mentioned drugs had a level of correspondence with frequently used illicit substances, proving the practical usefulness of this system. Lastly, the synthetically generated set provided increased scores, improving the predictive capability. CLINICALTRIAL None
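The AUC values quoted above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance and values near 0.58 indicate little discrimination. The sketch below computes AUC from that pairwise definition. It is an illustration only: `roc_auc` is a hypothetical helper, the scores are toy values, and the quadratic loop is meant for small inputs rather than for a dataset of the size described.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting tied scores as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores for illustration only (not data from the study).
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
auc = roc_auc(y_true, scores)
print(f"AUC={auc:.2f}")
```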

