Predicting Loan Approval of Bank Direct Marketing Data Using Ensemble Machine Learning Algorithms

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2020.14.117 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Decision Makers ◽

Machine Learning Techniques ◽

Data Set ◽

Ensemble Machine Learning ◽

Marketing Data ◽

Loan Approval

The Bank Marketing data set on Kaggle is mostly used to predict whether bank clients will subscribe to a long-term deposit. We believe that this data set could provide more useful information, such as predicting whether a bank client could be approved for a loan. This is a critical choice that has to be made by decision makers at the bank. Building a prediction model for such a high-stakes decision requires not only high prediction accuracy but also a reasonable interpretation of the predictions. In this research, different ensemble machine learning techniques, such as Bagging and Boosting, have been deployed. Our results showed that the loan approval prediction model has an accuracy of 83.97%, which is approximately 25% better than most other state-of-the-art loan prediction models found in the literature. In addition, the model interpretation efforts in this research were able to explain a few critical cases that the bank decision makers may encounter; therefore, the high accuracy of the designed models was accompanied by trust in the predictions. We believe that the achieved model accuracy, together with the provided interpretation information, is vitally needed for decision makers to understand how to maintain a balance between the security and reliability of their financial lending system while providing fair credit opportunities to their clients.
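The abstract names Bagging as one of the ensemble techniques but does not describe the implementation. As a rough illustration only, the sketch below shows the general idea of bagging: train several weak learners on bootstrap resamples and take a majority vote. The decision-stump base learner, the function names, and the toy features are assumptions made for this example and are not taken from the paper.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """Fit a one-feature threshold rule (decision stump) by exhaustive search."""
    best = None  # (errors, feature index, threshold, label if <= t, label if > t)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if r[f] <= t else right) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda r: left if r[f] <= t else right

def bagging_fit(rows, labels, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Majority vote over the ensemble."""
    votes = Counter(m(row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [income, has_other_loan] -> 1 = approved, 0 = rejected
rows = [[20, 1], [30, 0], [40, 1], [60, 0], [70, 1], [80, 0]]
labels = [0, 0, 0, 1, 1, 1]
models = bagging_fit(rows, labels)
print(bagging_predict(models, [90, 1]))
```

Boosting differs in that the learners are trained sequentially, with each one weighted toward the examples its predecessors misclassified; it is not shown here.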

Prediction of Cardiac Arrest in the Emergency Department Based on Machine Learning and Sequential Characteristics: Model Development and Retrospective Clinical Validation Study (Preprint)

10.2196/preprints.15932 ◽

2019 ◽

Author(s):

Sungjun Hong ◽

Sungjoo Lee ◽

Jeonghoon Lee ◽

Won Chul Cha ◽

Kyunga Kim

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Cardiac Arrest ◽

Prediction Model ◽

Prediction Models ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Clinical Usefulness ◽

Class Imbalance Problem ◽

Data Set

BACKGROUND The development and application of clinical prediction models using machine learning in clinical decision support systems are attracting increasing attention. OBJECTIVE The aims of this study were to develop a prediction model for cardiac arrest in the emergency department (ED) using machine learning and sequential characteristics and to validate its clinical usefulness. METHODS This retrospective study was conducted with ED patients at a tertiary academic hospital who suffered cardiac arrest. To resolve the class imbalance problem, sampling was performed using propensity score matching. The data set was chronologically allocated to a development cohort (years 2013 to 2016) and a validation cohort (year 2017). We trained three machine learning algorithms with repeated 10-fold cross-validation. RESULTS The main performance parameters were the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The random forest algorithm (AUROC 0.97; AUPRC 0.86) outperformed the recurrent neural network (AUROC 0.95; AUPRC 0.82) and the logistic regression algorithm (AUROC 0.92; AUPRC 0.72). The performance of the model was maintained over time, with the AUROC remaining at least 80% across the monitored time points during the 24 hours before event occurrence. CONCLUSIONS We developed a prediction model of cardiac arrest in the ED using machine learning and sequential characteristics. The model was validated for clinical usefulness by chronological visualization focused on clinical usability.
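The chronological allocation described in the Methods can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary with a hypothetical `year` field; the study's actual data layout is not given in the abstract.

```python
def chronological_split(records, dev_years=range(2013, 2017), val_years=(2017,)):
    """Split records by calendar year instead of at random.

    Defaults mirror the cohorts named in the abstract: 2013 to 2016 for
    development and 2017 for validation. The "year" key is an assumption.
    """
    development = [r for r in records if r["year"] in dev_years]
    validation = [r for r in records if r["year"] in val_years]
    return development, validation

# Hypothetical records for illustration
records = [
    {"id": 1, "year": 2013},
    {"id": 2, "year": 2016},
    {"id": 3, "year": 2017},
]
development, validation = chronological_split(records)
print(len(development), len(validation))
```

Splitting by time keeps every validation case later than every development case, so the validation score reflects how a model trained on past patients would do on future ones.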

Prediction of Cardiac Arrest in the Emergency Department Based on Machine Learning and Sequential Characteristics: Model Development and Retrospective Clinical Validation Study

JMIR Medical Informatics ◽

10.2196/15932 ◽

2020 ◽

Vol 8 (8) ◽

pp. e15932

Author(s):

Sungjun Hong ◽

Sungjoo Lee ◽

Jeonghoon Lee ◽

Won Chul Cha ◽

Kyunga Kim

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Cardiac Arrest ◽

Prediction Model ◽

Prediction Models ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Clinical Usefulness ◽

Class Imbalance Problem ◽

Data Set

Background The development and application of clinical prediction models using machine learning in clinical decision support systems are attracting increasing attention. Objective The aims of this study were to develop a prediction model for cardiac arrest in the emergency department (ED) using machine learning and sequential characteristics and to validate its clinical usefulness. Methods This retrospective study was conducted with ED patients at a tertiary academic hospital who suffered cardiac arrest. To resolve the class imbalance problem, sampling was performed using propensity score matching. The data set was chronologically allocated to a development cohort (years 2013 to 2016) and a validation cohort (year 2017). We trained three machine learning algorithms with repeated 10-fold cross-validation. Results The main performance parameters were the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The random forest algorithm (AUROC 0.97; AUPRC 0.86) outperformed the recurrent neural network (AUROC 0.95; AUPRC 0.82) and the logistic regression algorithm (AUROC 0.92; AUPRC 0.72). The performance of the model was maintained over time, with the AUROC remaining at least 80% across the monitored time points during the 24 hours before event occurrence. Conclusions We developed a prediction model of cardiac arrest in the ED using machine learning and sequential characteristics. The model was validated for clinical usefulness by chronological visualization focused on clinical usability.
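For readers unfamiliar with the two performance parameters, the sketch below computes them from labels and predicted scores. It is a plain-Python illustration, not the study's evaluation code. AUROC is computed as the fraction of positive/negative pairs ranked correctly, and AUPRC is approximated by average precision, which assumes the scores are distinct.

```python
def auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Averages the precision observed at the rank of each positive example.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, area = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            area += hits / rank
    return area / total_pos

# Hypothetical labels (1 = cardiac arrest) and model scores
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores), average_precision(labels, scores))
```

Because AUPRC depends on the proportion of positives, it is usually the more informative of the two when events are rare, which is why studies of imbalanced outcomes often report both.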

Development of Heavy Rain Damage Prediction Model Using Machine Learning Based on Big Data

Advances in Meteorology ◽

10.1155/2018/5024930 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11 ◽

Cited By ~ 12

Author(s):

Changhyun Choi ◽

Jeonghwan Kim ◽

Jongsung Kim ◽

Donghyun Kim ◽

Younghye Bae ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

Prediction Model ◽

Prediction Models ◽

Meteorological Data ◽

Heavy Rain ◽

Machine Learning Techniques ◽

Damage Prediction ◽

Explanatory Variables ◽

The Republic

Prediction models of heavy rain damage using machine learning based on big data were developed for the Seoul Capital Area in the Republic of Korea. We used data on the occurrence of heavy rain damage from 1994 to 2015 as dependent variables and weather big data as explanatory variables. The models were developed by applying machine learning techniques such as decision trees, bagging, random forests, and boosting. When the prediction performance of each model was evaluated, the boosting model using meteorological data from the past 1 to 4 days had the highest AUC value, 95.87%, and was selected as the final model. By using the prediction model developed in this study to predict the occurrence of heavy rain damage for each administrative region, we can greatly reduce the damage through proactive disaster management.

Machine Learning-Based Prediction Model for Papillary Thyroid Carcinoma Recurrence

10.21203/rs.3.rs-113105/v1 ◽

2020 ◽

Author(s):

Young Min Park ◽

Byung-Joo Lee

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Tumor Size ◽

Large Scale ◽

Prediction Models ◽

Prognostic Significance ◽

Disease Recurrence ◽

Machine Learning Techniques ◽

Papillary Thyroid ◽

Recurrence Prediction

Background: This study analyzed the prognostic significance of nodal factors, including the number of metastatic LNs and LNR, in patients with PTC, and attempted to construct a disease recurrence prediction model using machine learning techniques. Methods: We retrospectively analyzed clinico-pathologic data from 1040 patients diagnosed with papillary thyroid cancer between 2003 and 2009. Results: We analyzed clinico-pathologic factors related to recurrence through logistic regression analysis. Among the factors that we included, only sex and tumor size were significantly correlated with disease recurrence. Parameters such as age, sex, tumor size, tumor multiplicity, ETE, ENE, pT, pN, ipsilateral central LN metastasis, contralateral central LN metastasis, number of metastatic LNs, and LNR were input for construction of a machine learning prediction model. The performance of five machine learning models related to recurrence prediction was compared based on accuracy. The Decision Tree model showed the best accuracy at 95%, and the LightGBM and stacking model together showed 93% accuracy. Conclusions: We confirmed that all machine learning prediction models showed an accuracy of 90% or more for predicting disease recurrence in PTC. Large-scale multicenter clinical studies should be performed to improve the performance of our prediction models and verify their clinical effectiveness.

Determination of Significant Features for Building an Efficient Heart Disease Prediction System

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3393.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 4499-4504

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Prediction Model ◽

Prediction Models ◽

Heart Diseases ◽

Medical Diagnostics ◽

Medical Data ◽

Machine Learning Algorithms ◽

Prediction System ◽

Early Stages

Heart diseases are responsible for the greatest number of deaths worldwide. These diseases are usually not detected in their early stages because the cost of medical diagnostics is not affordable for the majority of people. Research has shown that machine learning methods have a great capability to extract valuable information from medical data. This information is used to build prediction models that provide cost-effective technological aid for medical practitioners to detect heart disease in its early stages. However, the presence of irrelevant and redundant features in medical data degrades the performance of the prediction system. This research aimed to improve the accuracy of existing methods by removing such features. In this study, a brute-force feature selection algorithm was used to determine the relevant, significant features. After rigorous experiments with 7528 possible combinations of features and 5 machine learning algorithms, 8 important features were identified. A prediction model was developed using these significant features. The accuracy of this model was experimentally determined to be 86.4%, which is higher than the results of existing studies. The prediction model proposed in this study should help in predicting heart disease efficiently.
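Brute-force feature selection, in its general form, scores every candidate subset of features and keeps the best one. The sketch below illustrates that idea only. The `score` callback stands in for training and evaluating a model on a subset, and the feature names and weights are hypothetical; the abstract does not say how its 7528 combinations were generated or scored.

```python
from itertools import combinations

def brute_force_select(features, score, min_size=1):
    """Evaluate every feature subset of at least min_size features.

    score: callable taking a tuple of feature names and returning a number
    where higher is better (for example, cross-validated accuracy).
    """
    best_subset, best_score = None, float("-inf")
    for k in range(min_size, len(features) + 1):
        for subset in combinations(features, k):
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score

# Hypothetical stand-in for model accuracy: each feature adds or removes value
toy_gain = {"age": 0.3, "chol": 0.2, "noise": -0.1}
subset, value = brute_force_select(
    list(toy_gain), lambda fs: sum(toy_gain[f] for f in fs)
)
print(subset, value)
```

The number of subsets grows as 2^n - 1 for n features, so exhaustive search is practical only for small feature sets.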

Software Maintainability: Systematic Literature Review and Current Trends

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194016500431 ◽

2016 ◽

Vol 26 (08) ◽

pp. 1221-1253 ◽

Cited By ~ 16

Author(s):

Ruchika Malhotra ◽

Anuradha Chug

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Evaluation System ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Design Metrics ◽

Current Trends ◽

Software Maintainability ◽

Early Phases

Software maintenance is an expensive activity that consumes a major portion of the cost of the total project. Various activities carried out during maintenance include the addition of new features, deletion of obsolete code, correction of errors, etc. Software maintainability means the ease with which these operations can be carried out. If the maintainability can be measured in early phases of the software development, it helps in better planning and optimum resource utilization. Measurement of design properties such as coupling, cohesion, etc. in early phases of development often leads us to derive the corresponding maintainability with the help of prediction models. In this paper, we performed a systematic review of the existing studies related to software maintainability from January 1991 to October 2015. In total, 96 primary studies were identified, of which 47 were from journals, 36 from conference proceedings, and 13 from other sources. All studies were compiled in structured form and analyzed from numerous perspectives such as the use of design metrics, prediction models, tools, data sources, prediction accuracy, etc. According to the review results, we found that the use of machine learning algorithms in predicting maintainability has increased since 2005. The use of evolutionary algorithms has also begun in related sub-fields since 2010. We have observed that design metrics are still the most favored option for capturing the characteristics of any given software before they are used in a prediction model to determine the corresponding software maintainability. A significant increase in the use of public datasets for building prediction models has also been observed, and in this regard two public datasets, User Interface Management System (UIMS) and Quality Evaluation System (QUES), proposed by Li and Henry, are quite popular among researchers.
Although machine learning algorithms are still the most popular methods, we suggest that researchers working in the software maintainability area should experiment with the use of open source datasets and hybrid algorithms. In this regard, more empirical studies also need to be conducted on a large number of datasets so that a generalized theory can be formed. The current paper will be beneficial for practitioners, researchers, and developers, as they can use these models and metrics for creating benchmarks and standards. The findings of this extensive review would also be useful for novices in the field of software maintainability, as it not only provides explicit definitions but also lays a foundation for further research by providing a quick link to all important studies in the field. Finally, this study also compiles current trends and emerging sub-fields and identifies various opportunities for future research in the field of software maintainability.

Sedimentary environment prediction of grain-size data based on machine learning approach

Interpretation ◽

10.1190/int-2019-0153.1 ◽

2020 ◽

Vol 8 (3) ◽

pp. SL71-SL78

Author(s):

Qiao Su ◽

Yanhui Zhu ◽

Fang Hu ◽

Xingyong Xu

Keyword(s):

Machine Learning ◽

Grain Size ◽

Prediction Model ◽

Sedimentary Environment ◽

Machine Learning Algorithms ◽

Grain Size Analysis ◽

Size Analysis ◽

Data Set ◽

Sedimentary Environments ◽

Size Data

Grain size is one of the most important records of the sedimentary environment, and researchers have made remarkable progress in the interpretation of sedimentary environments by grain size analysis in the past few decades. However, these advances often depend on the personal experience of the scholars and on combination with other methods. Here, we constructed a prediction model using the K-nearest neighbors algorithm, one of the machine learning methods, which can predict the sedimentary environments of one core from a known core. Compared to the results of other studies based on a comprehensive data set of grain size and four other indicators, this model achieved a high precision value using only the grain size data. We have also compared our prediction model with other mainstream machine learning algorithms, and the experimental results for six evaluation metrics show that this prediction model achieves higher precision. The main errors of the model reflect the length of the conversation area of sedimentary environment, which is controlled by the sedimentary dynamics. This model can provide a quick method for comparing cores in a similar environment; thus, it may provide preliminary guidance for further study.
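The K-nearest neighbors idea named in the abstract can be illustrated in a few lines: label a new sample by majority vote among the k closest labeled samples. The two features and the environment names below are hypothetical placeholders; the paper's actual grain-size inputs and classes are not listed in the abstract.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query.

    Distance is Euclidean (math.dist, Python 3.8+).
    """
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical samples from a known core: [mean grain size, sorting]
known_x = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
           [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
known_y = ["env_A", "env_A", "env_A", "env_B", "env_B", "env_B"]
print(knn_predict(known_x, known_y, [1.1, 1.0]))
```

In practice the features would normally be scaled to comparable ranges first, since Euclidean distance is dominated by whichever feature has the largest numeric spread.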

Vehicle Price Prediction using SVM Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.g5915.069820 ◽

2020 ◽

Vol 9 (8) ◽

pp. 398-401

Keyword(s):

Machine Learning ◽

Research Area ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Data Set ◽

Network Support ◽

Java Application ◽

Learning Techniques ◽

The Individual

Vehicle price prediction has become a popular research area, and it requires considerable effort and expert knowledge of the field. A number of different attributes must be considered to make the prediction reliable and accurate. To estimate the price of used vehicles, a well-defined model was developed with the help of three machine learning techniques: Artificial Neural Network, Support Vector Machine, and Random Forest. These techniques were applied not to individual items but to the whole group of data items. The data were taken from a web portal and collected using a web scraper written in the PHP programming language. Machine learning algorithms of varying performance were compared to obtain the best result for the given data set. The final prediction model was integrated into a Java application.

Rotor Unbalance Kind and Severity Identification by Current Signature Analysis with Adaptative Update to Multiclass Machine Learning Algorithms

Studies in Engineering and Technology ◽

10.11114/set.v8i1.5213 ◽

2021 ◽

Vol 8 (1) ◽

pp. 28

Author(s):

S. L. Ávila ◽

H. M. Schaberle ◽

S. Youssef ◽

F. S. Pacheco ◽

C. A. Penz

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Signature Analysis ◽

Data Set ◽

Learning Techniques ◽

Environmental Variations ◽

Current Signature

The health of a rotating electric machine can be evaluated by monitoring electrical and mechanical parameters. The more information is available, the easier it becomes to diagnose the machine's operational condition. We built a laboratory test bench to study rotor unbalance issues according to ISO standards. Using harmonic analysis of the electric stator current, this paper presents a comparison study among Support-Vector Machines, Decision Tree classifiers, and the One-vs-One strategy to identify the kind and severity of rotor unbalance, a nonlinear multiclass task. Moreover, we propose a methodology to update the classifier so that it deals better with changes produced by environmental variations and natural machinery usage. The adaptative update means updating the training data set with an amount of recent data while keeping the entire original historical data. It is relevant for engineering maintenance. Our results show that current signature analysis is appropriate for identifying the type and severity of the rotor unbalance problem. Moreover, we show that machine learning techniques can be effective for an industrial application.
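Two ideas from this abstract lend themselves to a short sketch: One-vs-One voting over pairwise classifiers, and an update step that appends recent data while keeping the original history. The class names, the threshold rules, and the function names below are hypothetical and serve only to make the example runnable; they are not the authors' classifiers or features.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(classes, pairwise, sample):
    """Each pairwise classifier votes for one of its two classes.

    pairwise maps a class pair (a, b), in the order produced by
    combinations(classes, 2), to a function returning a or b for a sample.
    With n classes there are n * (n - 1) / 2 such classifiers.
    """
    votes = Counter(pairwise[pair](sample) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]

def adaptive_update(train_x, train_y, recent_x, recent_y):
    """Return an enlarged training set: all historical data plus recent data.

    The original lists are left unmodified; the classifier would then be
    retrained on the returned set.
    """
    return train_x + recent_x, train_y + recent_y

# Hypothetical unbalance classes and threshold rules on a single feature
classes = ["none", "mild", "severe"]
pairwise = {
    ("none", "mild"): lambda s: "none" if s < 1 else "mild",
    ("none", "severe"): lambda s: "none" if s < 2 else "severe",
    ("mild", "severe"): lambda s: "mild" if s < 3 else "severe",
}
print(one_vs_one_predict(classes, pairwise, 2.5))
```

Keeping the full history in the update step trades a growing training set for protection against forgetting operating conditions that have not recurred recently.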

Big Data Analytics using Swarm Intelligence based Framework for Prediction on Datasets

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d5298.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 7356-7360

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Information Source ◽

Research Work ◽

Big Data Analytics ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Data Set ◽

Raw Data

Data analytics is a scientific as well as an engineering tool used to investigate raw data and transform the information into knowledge. It is normally associated with obtaining knowledge from reliable information sources, rapid information processing, and prediction of future trends from the data analysis. Big Data analytics is evolving rapidly along the dimensions of volume, velocity, and variety. Most organizations now concentrate on analyzing information or raw data and are interested in deploying analytics to cope with forthcoming issues and challenges. A prediction model, or intelligent model, is proposed in this research that applies machine learning algorithms to the data set. The results are then interpreted and analyzed to obtain a better forecast value. The major objective of this research work is to find the optimum prediction from a medical data set using machine learning techniques.