Predicting Bank Operational Efficiency Using Machine Learning Algorithm: Comparative Study of Decision Tree, Random Forest, and Neural Networks

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues with respect to the efficiency of banks and the safety of depositors' funds in the banking industry. As part of measures to improve the banking sector and restore customers' confidence, efficiency and performance analysis in the banking industry has become a pressing issue, because stakeholders need to detect the underlying causes of inefficiencies within the banking industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks' efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT) and its C5.0 algorithm provided the best predictive model: it achieved 100% accuracy on the 134-branch holdout sample (30% of the banks) with a P value of 0.00. The DT was followed closely by the random forest algorithm with a predictive accuracy of 98.5% and a P value of 0.00, and finally the neural network (86.6% accuracy) with a P value of 0.66. The study concluded that banks in Ghana can use the results of this study to predict their respective efficiencies. All experiments were performed within a simulation environment in RStudio using R code.
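The paper's experiments were run in R; purely as an illustration, the following Python sketch mirrors the general workflow with scikit-learn stand-ins (DecisionTreeClassifier in place of C5.0), assuming hypothetical branch-level features and DEA efficiency labels rather than the study's data.

```python
# Illustrative sketch only -- the paper used C5.0, random forest, and a neural
# network in R; scikit-learn models stand in for them here. Features and DEA
# efficiency labels are synthetic placeholders, not the study's data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(444, 6))        # 444 DMUs, hypothetical input/output measures
y = rng.integers(0, 2, size=444)     # efficient (1) vs inefficient (0) per DEA

# 70/30 split mirrors the paper's 134-branch holdout sample
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "decision tree": DecisionTreeClassifier(random_state=1),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=1),
    "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: holdout accuracy {accuracy_score(y_te, model.predict(X_te)):.3f}")
```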

2021 ◽  
Author(s):  
Catherine Ollagnier ◽  
Claudia Kasper ◽  
Anna Wallenbeck ◽  
Linda Keeling ◽  
Siavash A Bigdeli

Tail biting is a detrimental behaviour that impacts the welfare and health of pigs. Early detection of tail biting precursor signs allows preventive measures to be taken, thus avoiding the tail biting event altogether. This study aimed to build a machine-learning algorithm for real-time detection of upcoming tail biting outbreaks, using feeding behaviour data recorded by an electronic feeder. The prediction capacities of seven machine learning algorithms (e.g., random forest, neural networks) were evaluated on daily feeding data collected from 65 pens originating from 2 herds of grower-finisher pigs (25-100 kg), in which 27 tail biting events occurred. Data were divided into training and testing sets, either by randomly splitting the data into 75% (training set) and 25% (testing set), or by randomly selecting whole pens to constitute the testing set. The random forest algorithm was able to predict 70% of the upcoming events with an accuracy of 94% when predicting events in pens for which it had previous data. The detection of events for unknown pens was less sensitive: the neural network model was able to detect 14% of the upcoming events with an accuracy of 63%. A machine-learning algorithm based on ongoing data collection should be considered for implementation into automatic feeder systems for real-time prediction of tail biting events.
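A minimal sketch of the two data-splitting schemes described above, assuming a hypothetical table of daily pen-level feeding features; the grouped split withholds whole pens, mimicking prediction for pens the model has never seen.

```python
# Sketch of the two evaluation splits: a plain 75/25 random split versus
# withholding whole pens (groups). Features and labels are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split, GroupShuffleSplit
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))          # e.g. daily feed intake, visit count, duration
y = rng.integers(0, 2, size=n)       # 1 = tail biting event upcoming
pens = rng.integers(0, 65, size=n)   # 65 pens, as in the study

# Scheme 1: random 75/25 split (model may see each pen's history)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Scheme 2: withhold whole pens so test pens are unseen during training
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=1)
tr_idx, te_idx = next(gss.split(X, y, groups=pens))

for name, (a, b, c, d) in {
    "random split": (X_tr, y_tr, X_te, y_te),
    "pen-level split": (X[tr_idx], y[tr_idx], X[te_idx], y[te_idx]),
}.items():
    clf = RandomForestClassifier(n_estimators=300, random_state=1).fit(a, b)
    pred = clf.predict(c)
    print(name, "accuracy", accuracy_score(d, pred), "sensitivity", recall_score(d, pred))
```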


2021 ◽  
Vol 8 (3) ◽  
pp. 209-221
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Abstract Objective To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Based on Python, the indicators were classified and modeled using a random forest regression algorithm, and the performance of the prediction model was analyzed. Results We obtained 4806 analyzable records from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1, and 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used to train the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, the area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy for GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of the discarded indicators on GDM prediction, the F3 data set was established using the 3265 samples of F1 restricted to the 38 indicators of F2. After training, the overall predictive accuracy of the F3 model was 91.60%, the AUC was 0.58, and the predictive accuracy for positive cases was 15.85%. Conclusions In this study, a model for predicting GDM from several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in sample data sets when the random forest algorithm is applied to the early prediction of GDM.
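The abstract names Python and a random forest; a hedged sketch of the reported evaluation loop follows, with placeholder data standing in for the F1/F2/F3 indicator sets (the actual indicators are not reproduced here).

```python
# Hedged sketch of the reported evaluation: overall accuracy, AUC, and
# accuracy on GDM-positive cases for a random forest. Data are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(3265, 38))     # e.g. data set F3: 3265 samples, 38 indicators
y = rng.integers(0, 2, size=3265)   # 1 = GDM positive

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
clf = RandomForestClassifier(n_estimators=500, random_state=1).fit(X_tr, y_tr)

pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]
pos = y_te == 1
print("overall accuracy:", accuracy_score(y_te, pred))
print("AUC:", roc_auc_score(y_te, proba))
print("accuracy on GDM-positive cases:", accuracy_score(y_te[pos], pred[pos]))
```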


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Deciding which machine learning algorithm is optimal for a given task is not easy. To help future researchers, we describe in this paper the optimal among the best-performing algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.
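As a sketch of such a comparison, the following assumes five common supervised learners and a synthetic classification set; the authors' actual five algorithms and data generator are not specified in the abstract.

```python
# Sketch: compare five supervised learners on a synthetic data set via
# cross-validation. The specific algorithms chosen here are assumptions,
# since the abstract does not list them.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```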


2020 ◽  
Author(s):  
Juan Chen ◽  
Yong-ran Cheng ◽  
Zhan-hui Feng ◽  
Meng-Yun Zhou ◽  
Nan Wang ◽  
...  

Abstract Background: Accurate prediction of the number of patients with conjunctivitis plays an important role in providing adequate treatment at the hospital, but no such accurate predictive model currently exists. The current study sought to use machine learning (ML) prediction based on past patient counts for conjunctivitis and several air pollutants. The optimal machine learning prediction model was selected to predict the number of conjunctivitis-related patients. Methods: The average daily air pollutant concentrations (CO, O3, NO2, SO2, PM10, PM2.5) and weather data (highest and lowest temperature) were collected. Data were randomly divided into a training dataset and a test dataset, and the normalized mean square error (NMSE) was calculated by 10-fold cross validation, comparing the ability of seven ML methods to predict the number of patients presenting with conjunctivitis (Lasso penalized linear model, Decision tree, Boosting regression, Bagging regression, Random forest, Support vector, and Neural network). According to the accuracy of the prediction, the important air and weather factors that affect conjunctivitis were identified. Results: A total of 84,977 conjunctivitis cases were obtained from the ophthalmology center of the Affiliated Hospital of Hangzhou Normal University. For all patients together, the NMSE values of the different methods were as follows: Lasso penalized linear regression: 0.755, Decision tree: 0.710, Boosting regression: 0.616, Bagging regression: 0.615, Random forest: 0.392, Support vector: 0.688, and Neural network: 0.476. Further analyses, stratified by gender and age at diagnosis, supported Random forest as being superior to the other ML methods. The main factors affecting conjunctivitis were O3, NO2, SO2, and air temperature. Conclusion: Machine learning algorithms can predict the number of patients with conjunctivitis, and among them the Random forest algorithm had the highest accuracy. Machine learning algorithms could provide accurate information for hospitals dealing with conjunctivitis caused by air-related factors.
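One plausible reading of the NMSE criterion (mean squared error normalized by the variance of the observed counts) is sketched below with a custom scorer; the exact normalization used in the study is not stated in the abstract, and the data here are placeholders.

```python
# Sketch: 10-fold cross-validated NMSE, here taken as MSE divided by the
# variance of the observed counts (one common definition; an assumption).
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.metrics import make_scorer
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor

def nmse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

scorer = make_scorer(nmse, greater_is_better=False)

rng = np.random.default_rng(0)
X = rng.normal(size=(365, 8))    # daily CO, O3, NO2, SO2, PM10, PM2.5, Tmax, Tmin
y = rng.poisson(lam=200, size=365).astype(float)   # daily patient counts

cv = KFold(n_splits=10, shuffle=True, random_state=1)
for name, model in {"lasso": Lasso(alpha=0.1),
                    "random forest": RandomForestRegressor(random_state=1)}.items():
    scores = -cross_val_score(model, X, y, cv=cv, scoring=scorer)
    print(f"{name}: mean NMSE {scores.mean():.3f}")
```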


Author(s):  
A. Khanwalkar ◽  
R. Soni

Purpose: Diabetes is a chronic disease that accounts for a large proportion of the nation's healthcare expenses, since people with diabetes require continuous medical care. Several complications will occur if the polygenic disorder remains untreated and unrecognized, and the condition then demands visits to a diagnostic center and a doctor's attention. One of the essential real-world problems is therefore to detect the polygenic disorder at its first phase. This work is essentially a survey that analyzes several parameters in the diagnosis of the polygenic disorder. It shows that classification algorithms applied to collected data play an important role in the automation of polygenic disorder analysis, alongside other machine learning algorithms. Design/methodology/approach: This paper provides an extensive survey of the different approaches that have been used for the analysis of medical data, for the purpose of early detection of the polygenic disorder. The paper takes into consideration methods such as J48, CART, SVM, and KNN, conducts a formal survey of all the studies, and provides a conclusion at the end. Findings: The survey analyzed several parameters in the diagnosis of the polygenic disorder and found that classification algorithms applied to collected data play an important role in the automation of polygenic disorder analysis, alongside other machine learning algorithms. Practical implications: This paper will help future researchers in the field of healthcare, specifically in the domain of diabetes, to understand the differences between classification algorithms. Originality/value: This paper will help in comparing machine learning algorithms by going through the results and selecting the appropriate approach based on requirements.
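For readers comparing the surveyed classifiers in practice, a minimal sketch follows; note that scikit-learn's DecisionTreeClassifier implements CART and is used here as a rough stand-in for J48 (C4.5), which scikit-learn does not provide. The Pima-like data shape is a placeholder.

```python
# Minimal comparison sketch for the surveyed classifiers. DecisionTreeClassifier
# is CART; it also stands in (roughly) for J48/C4.5, which scikit-learn does
# not implement. Data are synthetic placeholders with a Pima-like shape.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=768, n_features=8, random_state=0)
classifiers = {
    "CART (J48 stand-in)": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
for name, clf in classifiers.items():
    print(name, "mean 10-fold accuracy:", cross_val_score(clf, X, y, cv=10).mean())
```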


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Faizan Ullah ◽  
Qaisar Javaid ◽  
Abdu Salam ◽  
Masood Ahmad ◽  
Nadeem Sarwar ◽  
...  

Ransomware (RW) is a distinctive variety of malware that encrypts files or locks the user's system by taking their files hostage, which leads to huge financial losses for users. In this article, we propose a new model that extracts novel features from an RW dataset and performs classification of RW and benign files. The proposed model can detect a large number of RW samples from various families at runtime and scans the network, registry activities, and file system throughout the execution. An API-call series is used to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it by applying online machine learning algorithms to predict RW. To validate the effectiveness and scalability, we tested 78,550 recent malign and benign samples and compared the results with random forest and AdaBoost, achieving a testing accuracy of 99.56%.
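A hedged sketch of the online-learning idea: an incrementally trained linear model consuming fourteen-dimensional behavior vectors as they arrive, compared against batch random forest and AdaBoost baselines. The feature semantics are placeholders; the paper's actual feature extraction is not reproduced here.

```python
# Sketch of online RW classification on streaming 14-feature behavior vectors.
# SGDClassifier.partial_fit gives incremental (online) learning; RandomForest
# and AdaBoost serve as batch baselines. All data here are placeholders.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 14))     # 14 runtime behavior features per sample
y = rng.integers(0, 2, size=10000)   # 1 = ransomware, 0 = benign
X_tr, y_tr, X_te, y_te = X[:8000], y[:8000], X[8000:], y[8000:]

# Online model: learn from mini-batches as samples stream in
online = SGDClassifier(loss="log_loss", random_state=1)
for i in range(0, len(X_tr), 500):
    online.partial_fit(X_tr[i:i+500], y_tr[i:i+500], classes=np.array([0, 1]))
print("online SGD:", accuracy_score(y_te, online.predict(X_te)))

# Batch baselines for comparison
for name, clf in {"random forest": RandomForestClassifier(random_state=1),
                  "AdaBoost": AdaBoostClassifier(random_state=1)}.items():
    clf.fit(X_tr, y_tr)
    print(name + ":", accuracy_score(y_te, clf.predict(X_te)))
```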


Author(s):  
M. Esfandiari ◽  
S. Jabari ◽  
H. McGrath ◽  
D. Coleman

Abstract. Flooding is one of the most damaging natural hazards in urban areas in many places around the world, including the city of Fredericton, New Brunswick, Canada. Recently, Fredericton was flooded in two consecutive years, 2018 and 2019. Due to the complicated behaviour of water when a river overflows its banks, estimating the flood extent is challenging. The issue gets even more challenging when several different factors affect the water flow, such as land texture or surface flatness, with varying degrees of intensity. Recently, machine learning algorithms and statistical methods have been used in many research studies to generate flood susceptibility maps from topographical, hydrological, and geological conditioning factors. One of the major issues researchers face is the complexity and the number of features required as input to a machine-learning algorithm to produce acceptable results. In this research, we used Random Forest to model the 2018 flood in Fredericton and analyzed the effect of several combinations of 12 different flood conditioning factors. The factors were tested against a Sentinel-2 optical satellite image available around the flood peak day. The highest accuracy was obtained using only 5 factors, namely altitude, slope, aspect, distance from the river, and land-use/cover, with 97.57% overall accuracy and a 95.14% kappa coefficient.
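A sketch of the final configuration described above: a random forest over the five retained conditioning factors, scored by overall accuracy and Cohen's kappa. Factor values are placeholders; deriving them from elevation models and land-cover rasters is outside the scope of this snippet.

```python
# Sketch: random forest flood/no-flood classification from the five retained
# conditioning factors, evaluated with overall accuracy and Cohen's kappa.
# All pixel samples here are synthetic placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.uniform(0, 120, n),    # altitude (m)
    rng.uniform(0, 30, n),     # slope (degrees)
    rng.uniform(0, 360, n),    # aspect (degrees)
    rng.uniform(0, 2000, n),   # distance from the river (m)
    rng.integers(0, 6, n),     # land-use/cover class
])
y = rng.integers(0, 2, size=n)  # 1 = flooded per the Sentinel-2 reference

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
clf = RandomForestClassifier(n_estimators=500, random_state=1).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("overall accuracy:", accuracy_score(y_te, pred))
print("kappa coefficient:", cohen_kappa_score(y_te, pred))
```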

