Credit Risk Assessment of Loan Defaulters in Commercial Banks Using Voting Classifier Ensemble Learner Machine Learning Model

In the banking sector, the credit score is a very important factor. It is important to determine which customers are eligible for a loan and which are not. Customers are classified by their credit score, and based on this score the bank decides whether to approve a loan. Banks suffer major failures due to credit risk. Identifying loan defaulters can be automated using machine learning algorithms. To classify customers and predict defaults, machine learning techniques such as gradient boosting, random forest, and a feature selection technique combined with a decision tree are used. Using these algorithms, we accurately classify valid and invalid loan customers. The designed model classifies customers into good and bad applicants and is trained on the customer data to improve accuracy.
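
The title of this entry refers to a voting classifier ensemble. As a rough illustration of the idea, and not the authors' implementation, the sketch below combines the class labels predicted by three already-trained models through a simple majority (hard) vote. The function name `hard_vote` and the prediction lists are hypothetical; the abstract does not say whether hard or soft voting was used.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Return one majority-vote label per sample.

    predictions_per_model: list of equal-length label lists, one per model.
    """
    combined = []
    # zip(*...) regroups the labels so that each tuple holds all votes for one sample.
    for sample_votes in zip(*predictions_per_model):
        # most_common(1) yields the label with the highest vote count.
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical per-model predictions (1 = likely defaulter, 0 = good applicant).
gradient_boosting = [1, 0, 1, 0]
random_forest = [1, 1, 0, 0]
decision_tree = [0, 1, 1, 0]

votes = hard_vote([gradient_boosting, random_forest, decision_tree])
print(votes)
```

With an odd number of binary voters there are no ties, which is one reason three base models are a convenient choice for a sketch like this.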

Feasibility of Machine Learning Algorithms for Predicting the Deformation of Anodic Titanium Films by Modulating Anodization Processes

Materials, 2021, Vol 14 (5), pp. 1089

DOI: 10.3390/ma14051089

Author(s): Sung-Hee Kim, Chanyoung Jeong

Keyword(s): Machine Learning, Learning Algorithms, Multiclass Classification, Machine Learning Algorithms, Machine Learning Techniques, Smart Manufacturing, Gradient Boosting, Experimental Conditions, Learning Techniques, TiO2 Nanostructures

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to classify the surface characteristics of titanium oxide (TiO2) nanostructures produced with different anodization processes. We produced a total of 100 samples and assessed changes in the thickness of the TiO2 nanostructures after anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltages ranging from 10 V to 100 V and for various anodization durations. We found that the thickness of the TiO2 nanostructures depends on the anodization voltage across different anodization times. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed with the experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to both binary and multiclass classification. For binary classification, the random forest and gradient boosting algorithms had relatively high performance. However, all eight algorithms had scores higher than 0.93, which indicates high predictive performance in estimating the presence of pores. In contrast, the decision tree and three ensemble methods had relatively higher performance for multiclass classification, with accuracy greater than 0.79. The weakest algorithm was k-nearest neighbors for both binary and multiclass classification. We believe these results show that machine learning techniques can be applied to predict surface quality improvement, leading to smart manufacturing technology that better controls color appearance, super-hydrophobicity, super-hydrophilicity, or battery efficiency.

Detecting Cognitive Impairment Status Using Keystroke Patterns and Physical Activity Data among the Older Adults: A Machine Learning Approach

Journal of Healthcare Engineering, 2021, Vol 2021, pp. 1-16

DOI: 10.1155/2021/1302989

Author(s): Mohammad Nahid Hossain, Mohammad Helal Uddin, K. Thapa, Md Abdullah Al Zubaer, Md Shafiqul Islam, ...

Keyword(s): Machine Learning, Older Adults, Cognitive Impairment, Learning Algorithm, Negative Impact, Gradient Boosting, Feature Selection Technique, Activity Data, Physical Data, Machine Learning Model

Cognitive impairment has a significant negative impact on global healthcare and the community. Preserving cognition and mental retention in older adults becomes increasingly difficult with aging. Early detection of cognitive impairment can reduce the impact of prolonged disease progressing to permanent mental damage. This paper aims to develop a machine learning model to detect and differentiate cognitive impairment categories such as severe, moderate, mild, and normal by analyzing neurophysical and physical data. Keystroke patterns and a smartwatch were used to extract individuals’ neurophysical and physical data, respectively. An advanced ensemble learning algorithm, the Gradient Boosting Machine (GBM), is proposed to classify the cognitive severity level (absence, mild, moderate, and severe) based on Standardised Mini-Mental State Examination (SMMSE) questionnaire scores. Pearson’s correlation and a wrapper feature selection technique were used to analyze and select the best features. We then applied the proposed GBM algorithm to those features, and the results showed an accuracy of more than 94%. This paper adds a new dimension to the state of the art in predicting cognitive impairment by using neurophysical and physical data together.
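
The abstract mentions Pearson's correlation as one of the tools used to analyze and select features. The sketch below shows only that correlation step as a simple filter; it does not reproduce the wrapper selection or the GBM model. The feature names, values, and the 0.5 threshold are hypothetical and chosen purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, threshold):
    """Keep the names of features whose |r| with the target meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson_r(values, target)) >= threshold
    ]


# Hypothetical feature columns and target scores.
features = {
    "typing_speed": [1, 2, 3, 4, 5],
    "step_count": [2, 1, 4, 3, 5],
    "unrelated": [3, 1, 3, 1, 3],
}
target = [2, 4, 6, 8, 10]

selected = select_features(features, target, threshold=0.5)
print(selected)
```

A filter like this scores each feature independently of any model, whereas a wrapper method evaluates feature subsets by training a model on them, so the two are complementary.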

Supermarket Sales Prediction Using Regression

International Journal of Advanced Trends in Computer Science and Engineering, 2021, Vol 10 (2), pp. 1153-1157

DOI: 10.30534/ijatcse/2021/951022021

Keyword(s): Machine Learning, Performance Metrics, Low Cost, Machine Learning Algorithms, Machine Learning Techniques, Customer Data, Sales Data, Online Marketplace, Sales Prediction, The Future

Sales forecasting is important for companies engaged in retailing, logistics, manufacturing, marketing, and wholesaling. It allows companies to allocate resources efficiently, to estimate sales revenue, and to plan strategies that benefit the company’s future. In this paper, product sales from a particular store are predicted in a way that produces better performance compared with other machine learning algorithms. The dataset used for this project is the 2013 Big Mart sales data. Nowadays, shopping malls and supermarkets keep track of the sales data of every individual item in order to predict future customer demand. The data contains a large amount of customer data and item attributes. Frequent patterns are detected by mining the data from the data warehouse. The data can then be used to predict future sales with the help of several machine learning techniques for companies like Big Mart. In this project, we propose a model using the XGBoost algorithm for predicting the sales of companies like Big Mart and found that it produces better performance compared with other existing models. This model is compared with other models in terms of their performance metrics. Big Mart is an online marketplace where people can buy, sell, or advertise their merchandise at low cost. The goal of the paper is to make Big Mart a shopping paradise for buyers and a marketing solution for sellers. The ultimate aim is the complete satisfaction of the customers. The project “SUPERMARKET SALES PREDICTION” builds a predictive model and estimates the sales of each product at a particular store. Big Mart uses this model to understand the properties of the products that play a major role in increasing sales. This can also be done on the basis of hypotheses that should be formed before looking at the data.

Predicting Forest Fires using Supervised and Ensemble Machine Learning Algorithms

International Journal of Recent Technology and Engineering - 2, 2019, Vol 8 (2), pp. 3697-3705

DOI: 10.35940/ijrte.b2878.078219

Cited By ~ 1

Keyword(s): Machine Learning, Logistic Regression, Forest Fires, Principal Component, Climatic Conditions, Machine Learning Algorithms, Machine Learning Techniques, Gradient Boosting, Support Vector, Physical Factors

Forest fires have become one of the most frequently occurring disasters in recent years. Forest fires have a lasting impact on the environment, as they lead to deforestation and global warming, which is also one of their major causes. Forest fires are currently dealt with by collecting satellite images of forests, and if the fires cause an emergency, the authorities are notified to mitigate its effects. By the time the authorities learn about it, the fires have already caused a lot of damage. Data mining and machine learning techniques can provide an efficient prevention approach in which data associated with forests is used to predict the likelihood of forest fires. This paper uses the dataset in the UCI machine learning repository, which consists of physical factors and climatic conditions of the Montesinho park in Portugal. Various algorithms such as logistic regression, support vector machine, random forest, and k-nearest neighbors, in addition to bagging and boosting predictors, are used, both with and without Principal Component Analysis (PCA). Among the models in which PCA was applied, logistic regression gave the highest F1 score of 68.26, and among the models without PCA, gradient boosting gave the highest score of 68.36.
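
The models in this entry are compared by F1 score, which appears to be reported on a percentage scale. As a reminder of what that metric measures, the sketch below computes F1 for a binary problem as the harmonic mean of precision and recall. The labels are hypothetical, and the abstract does not state whether any averaging across classes was used.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = fire, 0 = no fire).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {100 * score:.2f}")
```

Because F1 ignores true negatives, it is often preferred over plain accuracy when the positive class (here, a fire) is the rarer and more important outcome.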

Machine learning techniques to predict daily rainfall amount

Journal Of Big Data, 2021, Vol 8 (1)

DOI: 10.1186/s40537-021-00545-4

Author(s): Chalachew Muluken Liyew, Haileyesus Amsaya Melese

Keyword(s): Machine Learning, Pearson Correlation, Daily Rainfall, Learning Model, Machine Learning Techniques, Gradient Boosting, Correlation Technique, Learning Techniques, Machine Learning Model, Extreme Gradient Boosting

Predicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several studies have been conducted using data mining and machine learning techniques on environmental datasets from different countries. An erratic rainfall distribution in the country affects the agriculture on which the country's economy depends. Wise use of rainwater should be planned and practiced in the country to minimize the problems of drought and flooding. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and to predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables, which were used as input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia, to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error (RMSE) and mean absolute error (MAE) were used to measure the performance of the machine learning models. The results of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than the others.
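
The study compares its models using root mean squared error and mean absolute error. The sketch below shows how the two metrics are computed from paired observed and predicted values. The rainfall numbers are hypothetical and are not taken from the paper.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average magnitude of the prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared error; penalizes large misses more."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


# Hypothetical daily rainfall amounts in millimetres.
observed = [0.0, 2.0, 5.0, 10.0]
predicted = [1.0, 2.0, 3.0, 14.0]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
print(f"MAE = {mae:.2f} mm, RMSE = {rmse:.2f} mm")
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few predictions miss badly, so reporting both gives a sense of how the errors are distributed.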

Clinical Data Analysis for Prediction of Cardiovascular Disease Using Machine Learning Techniques

Computational Intelligence and Neuroscience, 2022, Vol 2022, pp. 1-13

DOI: 10.1155/2022/2973324

Author(s): Rajkumar Gangappa Nadakinamani, A. Reyana, Sandeep Kautish, A. S. Vibith, Yogita Gupta, ...

Keyword(s): Machine Learning, Cardiovascular Disease, Cardiac Risk, Machine Learning Algorithms, Random Tree, Machine Learning Techniques, Learning Technology, Tree Model, Learning Techniques, Machine Learning Model

Cardiovascular disease is difficult to detect due to several risk factors, including high blood pressure, cholesterol, and an abnormal pulse rate. Accurate decision-making and optimal treatment are required to address cardiac risk. As machine learning technology advances, the healthcare industry’s clinical practice is likely to change. As a result, researchers and clinicians must recognize the importance of machine learning techniques. The main objective of this research is to recommend a highly accurate machine learning-based cardiovascular disease prediction system (CDPS). To this end, machine learning algorithms such as REP Tree, M5P Tree, Random Tree, Linear Regression, Naive Bayes, J48, and JRIP are used to classify popular cardiovascular datasets. The proposed CDPS’s performance was evaluated using a variety of metrics to identify the most suitable machine learning model. In predicting cardiovascular disease patients, the Random Tree model performed best, with the highest accuracy of 100%, the lowest MAE of 0.0011, the lowest RMSE of 0.0231, and the fastest prediction time of 0.01 seconds.

A Highly Sensitive Pressure-Sensing Array for Blood Pressure Estimation Assisted by Machine-Learning Techniques

Sensors, 2019, Vol 19 (4), pp. 848

DOI: 10.3390/s19040848

Cited By ~ 5

Author(s): Kuan-Hua Huang, Fu Tan, Tzung-Dau Wang, Yao-Joe Yang

Keyword(s): Machine Learning, Blood Pressure, Polymer Film, Pulse Wave, Conductive Polymer, Machine Learning Algorithms, Machine Learning Techniques, Gradient Boosting, Pressure Sensing, Adaptive Boosting

This work describes the development of a pressure-sensing array for noninvasive continuous blood pulse-wave monitoring. The sensing elements comprise a conductive polymer film and interdigital electrodes patterned on a flexible Parylene C substrate. The polymer film was patterned with microdome structures to enhance the acuteness of pressure sensing. The proposed device uses three pressure-sensing elements in a linear array, which greatly facilitates the blood pulse-wave measurement. The device exhibits high sensitivity (−0.533 kPa⁻¹) and a fast dynamic response. Furthermore, various machine-learning algorithms, including random forest regression (RFR), gradient-boosting regression (GBR), and adaptive boosting regression (ABR), were employed for estimating systolic blood pressure (SBP) and diastolic blood pressure (DBP) from the measured pulse-wave signals. Among these algorithms, the RFR-based method gave the best performance, with the coefficients of determination for the reference and estimated blood pressures being R² = 0.871 for SBP and R² = 0.794 for DBP, respectively.
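
The regression models in this entry are compared by their coefficient of determination. The sketch below uses the common definition R² = 1 − SS_res / SS_tot, which is an assumption here because the abstract does not say which formulation the authors used. The blood pressure values are hypothetical.

```python
def r_squared(reference, estimated):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares.

    Assumes the reference values are not all identical (otherwise SS_tot is zero).
    """
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimated))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1 - ss_res / ss_tot


# Hypothetical systolic pressures in mmHg: cuff reference vs. model estimate.
reference_sbp = [110, 120, 130, 140]
estimated_sbp = [112, 118, 133, 139]

r2 = r_squared(reference_sbp, estimated_sbp)
print(f"R^2 = {r2:.3f}")
```

A value of 1 means the estimates match the reference exactly, while a value of 0 means the model does no better than always predicting the mean of the reference values.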

Improving Sports Outcome Prediction Process Using Integrating Adaptive Weighted Features and Machine Learning Techniques

Processes, 2021, Vol 9 (9), pp. 1563

DOI: 10.3390/pr9091563

Author(s): Chi-Jie Lu, Tian-Shyug Lee, Chien-Chih Wang, Wei-Jen Chen

Keyword(s): Machine Learning, Outcome Prediction, Prediction Models, Machine Learning Algorithms, Machine Learning Techniques, Gradient Boosting, Adaptive Weighting, Stochastic Gradient Boosting, Basketball Game, Extreme Gradient Boosting

Developing an effective sports performance analysis process is an attractive issue in sports team management. This study proposes an improved sports outcome prediction process that integrates adaptive weighted features and machine learning algorithms for basketball game score prediction. In the proposed prediction process, feature engineering is used to construct designed features based on game-lag information and adaptive weighting of variables. These designed features are then applied to five machine learning methods, namely classification and regression trees (CART), random forest (RF), stochastic gradient boosting (SGB), eXtreme gradient boosting (XGBoost), and extreme learning machine (ELM), to construct effective prediction models. The empirical results from National Basketball Association (NBA) data revealed that the proposed sports outcome prediction process generated promising prediction results compared with the competing models without adaptive weighting features. Our results also showed that the machine learning models using information from four game lags and adaptive weighting of power generated better prediction performance.

Prediction of Mean Wave Overtopping Discharge Using Gradient Boosting Decision Trees

Water, 2020, Vol 12 (6), pp. 1703

DOI: 10.3390/w12061703

Cited By ~ 3

Author(s): Joost P. den Bieman, Josefine M. Wilms, Henk F. P. van den Boogaard, Marcel R. A. van Gent

Keyword(s): Machine Learning, Decision Trees, Numerical Models, Input Parameter, Design Criterion, Machine Learning Techniques, Gradient Boosting, Wave Overtopping, Learning Techniques, Machine Learning Model

Wave overtopping is an important design criterion for coastal structures such as dikes, breakwaters and promenades. Hence, the prediction of the expected wave overtopping discharge is an important research topic. Existing prediction tools consist of empirical overtopping formulae, machine learning techniques like neural networks, and numerical models. In this paper, an innovative machine learning method—gradient boosting decision trees—is applied to the prediction of mean wave overtopping discharges. This new machine learning model is trained using the CLASH wave overtopping database. Optimizations to its performance are realized by using feature engineering and hyperparameter tuning. The model is shown to outperform an existing neural network model by reducing the error on the prediction of the CLASH database by a factor of 2.8. The model predictions follow physically realistic trends for variations of important features, and behave regularly in regions of the input parameter space with little or no data coverage.

Machine learning associated with respiratory oscillometry: a computer-aided diagnosis system for the detection of respiratory abnormalities in systemic sclerosis

BioMedical Engineering OnLine, 2021, Vol 20 (1)

DOI: 10.1186/s12938-021-00865-9

Author(s): Domingos S. M. Andrade, Luigi Maciel Ribeiro, Agnaldo J. Lopes, Jorge L. M. Amaral, Pedro L. Melo

Keyword(s): Machine Learning, Systemic Sclerosis, Diagnostic Accuracy, Group Versus, Machine Learning Algorithms, Supervised Machine Learning, Machine Learning Techniques, Gradient Boosting, Control Group, Extreme Gradient Boosting

Introduction: The use of machine learning (ML) methods would improve the diagnosis of respiratory changes in systemic sclerosis (SSc). This paper evaluates the performance of several ML algorithms associated with respiratory oscillometry analysis to aid in the diagnosis of respiratory changes in SSc. We also identify the best configuration for this task.

Methods: Oscillometric and spirometric exams were performed in 82 individuals, including controls (n = 30) and patients with systemic sclerosis with normal (n = 22) and abnormal (n = 30) spirometry. Multiple instance classifiers and different supervised machine learning techniques were investigated, including k-Nearest Neighbors (KNN), Random Forests (RF), AdaBoost with decision trees (ADAB), and Extreme Gradient Boosting (XGB).

Results and discussion: The first experiment of this study showed that the best oscillometric parameter (BOP) was dynamic compliance, which provided moderate accuracy (AUC = 0.77) in the scenario control group versus patients with sclerosis and normal spirometry (CGvsPSNS). In the scenario control group versus patients with sclerosis and altered spirometry (CGvsPSAS), the BOP obtained high accuracy (AUC = 0.94). In the second experiment, the ML techniques were used. In CGvsPSNS, KNN achieved the best result (AUC = 0.90), significantly improving the accuracy in comparison with the BOP (p < 0.01), while in CGvsPSAS, RF obtained the best results (AUC = 0.97), also significantly improving the diagnostic accuracy (p < 0.05). In the third, fourth, fifth, and sixth experiments, different feature selection techniques allowed us to spot the best oscillometric parameters. They resulted in a small increase in diagnostic accuracy in CGvsPSNS (respectively, 0.87, 0.86, 0.82, and 0.84), while in CGvsPSAS, the best classifier's performance remained the same (AUC = 0.97).

Conclusions: Oscillometric principles combined with machine learning algorithms provide a new method for diagnosing respiratory changes in patients with systemic sclerosis. The present study's findings provide evidence that this combination may help in the early diagnosis of respiratory changes in these patients.
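
Diagnostic accuracy in this entry is reported as AUC, the area under the ROC curve. One way to read that number is as the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen control, with ties counted as half. The sketch below computes AUC directly from that pairwise definition. The scores are hypothetical, and this is not necessarily how the authors computed their values.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    correct = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores (higher = more likely abnormal).
patient_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]

auc = auc_from_scores(patient_scores, control_scores)
print(f"AUC = {auc:.2f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a study of this size (82 individuals) but would normally be replaced by a rank-based computation on large datasets.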
