Development of an Efficient Cement Production Monitoring System Based on the Improved Random Forest Algorithm

Author(s):  
Hanane Zermane ◽  
Abbes Drardja

Abstract Strengthening production plants and process control functions contributes to a global improvement of manufacturing systems because of their cross-functional characteristics in the industry. Companies have established various innovative and operational strategies to remain competitive and increase their value. Machine Learning (ML) techniques have become an enticing option for addressing industrial issues in the current manufacturing sector since the emergence of Industry 4.0 and the extensive integration of paradigms such as big data, cloud computing, high computational power, and enormous storage capacity. Implementing a system that can identify faults early, and so avoid critical situations on the production line and in its environment, is crucial. Random Forest (RF), a powerful ensemble learning algorithm, is therefore applied to fault diagnosis, classification of real-time SCADA data, and prediction of the state of the production line. Random Forest proved to be the better classifier with 95% accuracy; by comparison, the SVM model reached 94.18%, the K-NN model about 93.83%, the logistic regression model 80.25%, and the decision tree model about 83.73%. The excellent experimental results achieved with the Random Forest model demonstrate the merits of this implementation for production performance, ensuring predictive maintenance and avoiding wasted energy.
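As a rough illustration of the five-classifier comparison reported in this abstract, the sketch below fits Random Forest, SVM, K-NN, logistic regression, and decision tree models with scikit-learn and reports test accuracy. The plant's SCADA data are not public, so a synthetic stand-in dataset is used; the model settings are assumptions, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the SCADA process data (the plant data are not public).
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(),
    "K-NN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, model.predict(X_te)), 4))
```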

Author(s):  
M. Esfandiari ◽  
S. Jabari ◽  
H. McGrath ◽  
D. Coleman

Abstract. Floods are among the most damaging natural hazards in urban areas in many places around the world, including the city of Fredericton, New Brunswick, Canada. Recently, Fredericton was flooded in two consecutive years, 2018 and 2019. Due to the complicated behaviour of water when a river overflows its banks, estimating the flood extent is challenging. The issue becomes even more challenging when several different factors affect the water flow, like the land texture or the surface flatness, with varying degrees of intensity. Recently, machine learning algorithms and statistical methods have been used in many research studies to generate flood susceptibility maps from topographical, hydrological, and geological conditioning factors. One of the major issues researchers face is the complexity and the number of features required as input to a machine learning algorithm to produce acceptable results. In this research, we used Random Forest to model the 2018 flood in Fredericton and analyzed the effect of several combinations of 12 different flood conditioning factors. The factors were tested against a Sentinel-2 optical satellite image acquired around the flood peak day. The highest accuracy was obtained using only five factors, namely altitude, slope, aspect, distance from the river, and land-use/cover, with 97.57% overall accuracy and a 95.14% kappa coefficient.
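A minimal sketch of the factor-combination search described above, assuming a per-pixel table of conditioning factors and a binary flood label derived from the Sentinel-2 image; the column names and the placeholder data are illustrative, not the authors' dataset.

```python
from itertools import combinations
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for per-pixel conditioning factors and the
# flood label derived from the Sentinel-2 image.
rng = np.random.default_rng(1)
factors = ["altitude", "slope", "aspect", "dist_river", "landuse"]
df = pd.DataFrame(rng.normal(size=(500, 5)), columns=factors)
df["flooded"] = (rng.random(500) > 0.5).astype(int)

# Score every subset of factors with a Random Forest and keep the best one.
best_score, best_subset = 0.0, None
for k in range(1, len(factors) + 1):
    for subset in combinations(factors, k):
        rf = RandomForestClassifier(n_estimators=100, random_state=1)
        score = cross_val_score(rf, df[list(subset)], df["flooded"], cv=3).mean()
        if score > best_score:
            best_score, best_subset = score, subset
print("best factor subset:", best_subset, "CV accuracy:", round(best_score, 3))
```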


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues with respect to the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers’ confidence, efficiency and performance analysis in the banking industry has become a hot issue, because stakeholders need to detect the underlying causes of inefficiencies within the banking industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks’ efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT) and its C5.0 algorithm provided the best predictive model, with 100% accuracy in predicting the 134-branch holdout dataset (30% of the banks) and a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally the neural network (86.6% accuracy) with a P value of 0.66. The study concluded that banks in Ghana can use the results of this study to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.
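The study itself was conducted in R (C5.0, random forest, neural network); the following Python sketch only illustrates the second stage, using a scikit-learn decision tree as a stand-in for C5.0 to predict a DEA-derived efficiency class for each branch. The placeholder data and column names are assumptions, not the study's DMU data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Placeholder branch-level data; in the study each row would be a DMU with its
# DEA inputs/outputs and an efficiency class derived from the DEA scores.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(444, 6)),
                 columns=["staff", "opex", "deposits", "loans", "income", "assets"])
y = (rng.random(444) > 0.5).astype(int)          # 1 = efficient, 0 = inefficient

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("holdout accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```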


2021 ◽  
Author(s):  
Yiqi Jack Gao ◽  
Yu Sun

The start of 2020 marked the beginning of the deadly COVID-19 pandemic caused by the novel SARS-CoV-2 virus from Wuhan, China. As of the time of writing, the virus had infected over 150 million people worldwide and resulted in more than 3.5 million global deaths. Accurate future predictions made through machine learning algorithms can be very useful as a guide for hospitals and policy makers to make adequate preparations and enact effective policies to combat the pandemic. This paper takes a two-pronged approach to analyzing COVID-19. First, it uses the feature importance of a random forest regressor to select the eight most significant predictors (date, new tests, weekly hospital admissions, population density, total tests, total deaths, location, and total cases) for predicting daily increases in COVID-19 cases, highlighting potential target areas for efficient pandemic responses. It then applies machine learning algorithms such as linear regression, polynomial regression, and random forest regression to predict daily COVID-19 cases from this diverse range of predictors, and these models proved capable of generating predictions with reasonable accuracy.
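A hedged sketch of the two-step approach described above: rank predictors by random-forest feature importance, keep the top eight, and fit the forecasting models on that subset. The placeholder data and column names are illustrative, not the paper's preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Placeholder predictor table; real columns would include date, new tests,
# weekly hospital admissions, population density, total tests, total deaths,
# location, and total cases, as listed in the abstract.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 12)), columns=[f"f{i}" for i in range(12)])
y = rng.normal(size=300)                        # daily increase in cases

# Step 1: rank predictors by random-forest feature importance and keep the top 8.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
top8 = X.columns[np.argsort(rf.feature_importances_)[::-1][:8]]

# Step 2: fit the forecasting models on the selected predictors only.
lin = LinearRegression().fit(X[top8], y)
rf_final = RandomForestRegressor(random_state=0).fit(X[top8], y)
print("selected predictors:", list(top8))
```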


2021 ◽  
Author(s):  
Catherine Ollagnier ◽  
Claudia Kasper ◽  
Anna Wallenbeck ◽  
Linda Keeling ◽  
Siavash A Bigdeli

Tail biting is a detrimental behaviour that impacts the welfare and health of pigs. Early detection of tail biting precursor signs allows preventive measures to be taken, thus avoiding the occurrence of a tail biting event. This study aimed to build a machine learning algorithm for real-time detection of upcoming tail biting outbreaks, using feeding behaviour data recorded by an electronic feeder. The prediction capacities of seven machine learning algorithms (e.g., random forest, neural networks) were evaluated on daily feeding data collected from 65 pens originating from two herds of grower-finisher pigs (25-100 kg), in which 27 tail biting events occurred. Data were divided into training and testing sets either by randomly splitting the data into 75% (training set) and 25% (testing set), or by randomly selecting whole pens to constitute the testing set. The random forest algorithm was able to predict 70% of the upcoming events with an accuracy of 94% when predicting events in pens for which it had previous data. The detection of events for unknown pens was less sensitive: the neural network model was able to detect 14% of the upcoming events with an accuracy of 63%. A machine learning algorithm based on ongoing data collection should be considered for implementation in automatic feeder systems for real-time prediction of tail biting events.
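A minimal sketch of the pen-wise validation described above, where whole pens are held out for testing; scikit-learn's GroupShuffleSplit handles the grouping. The feature columns and placeholder data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit
from sklearn.ensemble import RandomForestClassifier

# Placeholder daily records: one row per pen per day, with engineered feeding
# features and a label marking days preceding a tail-biting event.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(650, 4)),
                  columns=["visits", "feed_time", "feed_intake", "visit_std"])
df["pen"] = rng.integers(0, 65, size=650)
df["outbreak"] = (rng.random(650) > 0.9).astype(int)

X, y, groups = df.drop(columns=["outbreak", "pen"]), df["outbreak"], df["pen"]
# Hold out whole pens so the model is tested on pens it has never seen.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X.iloc[train_idx], y.iloc[train_idx])
print("held-out-pen accuracy:", rf.score(X.iloc[test_idx], y.iloc[test_idx]))
```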


2021 ◽  
Author(s):  
Sangil Lee ◽  
Brianna Mueller ◽  
W. Nick Street ◽  
Ryan M. Carnahan

Abstract. Introduction: Delirium is a cerebral dysfunction seen commonly in the acute care setting. It is associated with increased mortality and morbidity and is frequently missed in the emergency department (ED) by clinical gestalt alone. Identifying those at risk of delirium may help prioritize screening and interventions. Objective: Our objective was to identify clinically valuable predictive models for prevalent delirium within the first 24 hours of hospitalization, based on the available data, by assessing the performance of logistic regression and a variety of machine learning models. Methods: This was a retrospective cohort study to develop and validate a predictive risk model for detecting delirium using patient data obtained around an ED encounter. Data from electronic health records for patients hospitalized from the ED between January 1, 2014, and December 31, 2019, were extracted. Eligible patients were aged 65 or older, admitted to an inpatient unit from the emergency department, and had at least one DOSS assessment or CAM-ICU recorded while hospitalized. The outcome measure of this study was delirium within one day of hospitalization, determined by a positive DOSS or CAM assessment. We developed the model with and without the Barthel index for activities of daily living, since this was measured after hospital admission. Results: The area under the ROC curves for delirium ranged from 0.69 to 0.77 without the Barthel index. Random forest and gradient-boosted machine showed the highest AUC of 0.77. At the 90% sensitivity threshold, gradient-boosted machine, random forest, and logistic regression achieved a specificity of 35%. After the Barthel index was included, the random forest, gradient-boosted machine, and logistic regression models demonstrated the best predictive ability, with respective AUCs of 0.85 to 0.86. Conclusion: This study demonstrated the use of machine learning algorithms to identify the combination of variables that is predictive of delirium within 24 hours of hospitalization from the ED.
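A small illustrative helper (not the authors' code) showing how specificity can be read off at a fixed 90% sensitivity threshold from predicted probabilities, as in the threshold analysis reported above; the example labels and probabilities are dummy values.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def specificity_at_sensitivity(y_true, y_prob, target_sens=0.90):
    """Return (specificity, threshold) at the first point reaching the target sensitivity."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    idx = np.argmax(tpr >= target_sens)   # first threshold reaching the target sensitivity
    return 1.0 - fpr[idx], thresholds[idx]

# Dummy example values standing in for model outputs on a validation set.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])
print("AUC:", roc_auc_score(y_true, y_prob))
print("specificity at 90% sensitivity:", specificity_at_sensitivity(y_true, y_prob))
```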


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, we describe in this paper the optimal algorithm among the best performers. We built a synthetic dataset and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.
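The abstract does not specify the data generation or the algorithm list, so the following sketch is only a generic analogue of the described comparison: five supervised learners scored by cross-validation on a synthetic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset standing in for the paper's (unpublished) synthetic data.
X, y = make_classification(n_samples=2000, n_features=15, random_state=42)
for model in [RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0),
              SVC(), KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]:
    print(type(model).__name__, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```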


2021 ◽  
Author(s):  
Aayushi Rathore ◽  
Anu Saini ◽  
Navjot Kaur ◽  
Aparna Singh ◽  
Ojasvi Dutta ◽  
...  

ABSTRACT Sepsis is a severe infectious disease with high mortality. It occurs when chemicals released into the bloodstream to fight an infection trigger inflammation throughout the body, which can cause a cascade of changes that damage multiple organ systems, leading them to fail and even resulting in death. To reduce the possibility of sepsis or infection, antiseptics are used, a process known as antisepsis. Antiseptic peptides (ASPs) show properties similar to anti-gram-negative peptides, anti-gram-positive peptides, and many more. Machine learning algorithms are useful in the screening and identification of therapeutic peptides and thus provide an initial filter, or build confidence, before time-consuming and laborious experimental approaches are used. In this study, various machine learning algorithms, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbour (KNN), and Logistic Regression (LR), were evaluated for the prediction of ASPs. Moreover, the characteristic physicochemical features of ASPs were also explored for use in machine learning. Both manual and automatic feature selection methodologies were employed to achieve the best performance of the machine learning algorithms. Five-fold cross-validation and independent dataset validation proved RF to be the best model for the prediction of ASPs. Our RF model showed an accuracy of 97% and a Matthews Correlation Coefficient (MCC) of 0.93, which are indicative of a robust and good model. To our knowledge, this is the first attempt to build a machine learning classifier for the prediction of ASPs.
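A minimal sketch of the evaluation protocol described above (5-fold cross-validation of an RF classifier scored with accuracy and Matthews correlation); the placeholder descriptors and labels stand in for the peptide features, which are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, matthews_corrcoef

# Placeholder physicochemical descriptors and binary ASP / non-ASP labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))
y = rng.integers(0, 2, size=400)

scores = cross_validate(
    RandomForestClassifier(n_estimators=500, random_state=0), X, y, cv=5,
    scoring={"acc": "accuracy", "mcc": make_scorer(matthews_corrcoef)})
print("accuracy:", scores["test_acc"].mean(), "MCC:", scores["test_mcc"].mean())
```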


Author(s):  
Michael Auer ◽  
Mark D. Griffiths

Abstract Player protection and harm minimization have become increasingly important in the gambling industry, along with the promotion of responsible gambling (RG). Among the most widespread RG tools that gaming operators provide are limit-setting tools that help players limit the amount of time and/or money they spend gambling. Research suggests that limit-setting significantly reduces the amount of money that players spend. If limit-setting is to be encouraged as a way of facilitating responsible gambling, it is important to know which variables are important in getting individuals to set and change limits in the first place. In the present study, 33 variables assessing player behavior among the Norsk Tipping clientele (N = 70,789) from January to March 2017 were computed. These 33 behavioral variables were then used to predict the likelihood of gamblers changing their monetary limit between April and June 2017. The 70,789 players were randomly split into a training dataset of 56,532 and an evaluation set of 14,157 players (corresponding to an 80/20 split). The results demonstrated that it is possible to predict future limit-setting based on player behavior. The random forest algorithm appeared to predict limit-changing behavior much better than the other algorithms; however, on the independent test data, the random forest algorithm’s accuracy dropped significantly. The best performance on the test data, along with only a small decrease in accuracy compared to the training data, was delivered by the gradient boosting machine learning algorithm. The most important variables for predicting future limit-setting using the gradient boosting machine algorithm were players receiving feedback that they had reached 80% of their personal monthly global loss limit, the personal monthly loss limit, the amount bet, the theoretical loss, and whether the players had increased their limits in the past. With the help of predictive analytics, players with a high likelihood of changing their limits can be proactively approached.
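A hedged sketch of the gradient-boosting setup the abstract reports as the best performer: an 80/20 split, a scikit-learn GradientBoostingClassifier, and a feature-importance ranking of the behavioural variables. The placeholder data and variable names are assumptions, not the Norsk Tipping dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder table standing in for the 33 behavioural variables per player and
# a label indicating whether the monetary limit was changed in the next quarter.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(2000, 33)), columns=[f"v{i}" for i in range(33)])
y = (rng.random(2000) > 0.8).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, random_state=0)
gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("test accuracy:", gbm.score(X_te, y_te))
# Rank behavioural variables by their contribution to the model.
print(pd.Series(gbm.feature_importances_, index=X.columns)
        .sort_values(ascending=False).head())
```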


2020 ◽  
Vol 12 (11) ◽  
pp. 4748
Author(s):  
Minrui Zheng ◽  
Wenwu Tang ◽  
Akinwumi Ogundiran ◽  
Jianxin Yang

Settlement models help us understand the social–ecological functioning of landscapes and associated land use and land cover change. One issue in settlement modeling is that models are typically used to explore the relationship between settlement locations and associated influential factors (e.g., slope and aspect), yet few studies in settlement modeling have adopted landscape visibility analysis. Landscape visibility provides useful information for understanding human decision-making associated with the establishment of settlements. In past years, machine learning algorithms have demonstrated their capability to improve the performance of settlement modeling and, in particular, to capture the nonlinear relationship between settlement locations and their drivers. However, simulation models using machine learning algorithms in settlement modeling are still not well studied. Moreover, overfitting and the optimization of model parameters are major challenges for most machine learning algorithms. Therefore, in this study, we pursued two research objectives. First, we aimed to evaluate the contribution of viewsheds and landscape visibility to the simulation modeling of settlement locations. The second objective was to examine the performance of machine learning algorithm-based simulation models for settlement location studies. Our study region is located in the metropolitan area of the Oyo Empire, Nigeria, West Africa, ca. AD 1570–1830, and its pre-Imperial antecedents, ca. AD 1360–1570. We developed an event-driven spatial simulation model enabled by the random forest algorithm to represent the dynamics of settlement systems in our study region. Experimental results demonstrate that viewsheds and landscape visibility may offer more insight into the underlying mechanisms that drive settlement locations. The random forest algorithm, as a machine learning algorithm, provides solid support for establishing the relationship between settlement occurrences and their drivers.
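An illustrative sketch of the comparison described above: a Random Forest relating settlement presence to terrain drivers with and without a visibility feature. The column names and placeholder raster-cell table are assumptions, not the study's data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder raster-cell table; a real run would use terrain drivers plus a
# viewshed/visibility measure for each candidate settlement location.
rng = np.random.default_rng(0)
cols = ["slope", "aspect", "elevation", "dist_water", "viewshed_area"]
df = pd.DataFrame(rng.normal(size=(800, 5)), columns=cols)
df["settlement"] = (rng.random(800) > 0.7).astype(int)

terrain_only = cols[:4]
with_visibility = cols
for feature_set in (terrain_only, with_visibility):
    rf = RandomForestClassifier(n_estimators=300, random_state=0)
    acc = cross_val_score(rf, df[feature_set], df["settlement"], cv=5).mean()
    print(feature_set, round(acc, 3))
```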


2021 ◽  
Vol 23 (Supplement_4) ◽  
pp. iv18-iv18
Author(s):  
Alistair Lawrence ◽  
Rohit Sinha ◽  
Stefan Mitrasinovic ◽  
Stephen Price

Abstract Aims: To generate an accurate prediction model for greater-than-median survival using Random Forest machine learning analysis, and to compare the model to a traditional logistic regression model on the same glioblastoma dataset. Method: In this single-centre retrospective cohort study, all patients with histologically diagnosed primary GB from October 2014 to April 2019 were included (n=466). The machine learning algorithms used were multiple logistic regression and a Gini index-based Random Forest decision tree model with 100,000 trees. Seventeen clinical, molecular, and treatment-specific binarised variables were used. The dataset was split 70:30 into training and validation sets. Results: The dataset contained 466 patients; 326 patients made up the training set and 140 the validation set. The Random Forest model’s accuracy for predicting 18-month survival was 86.4%, compared to the logistic regression model’s accuracy of 85.7%. The top five factors that the Random Forest model used to predict survival over 18 months were: mean MGMT status >10%, whether the patient underwent gross total resection, whether the patient had adjuvant temozolomide, whether the patient had a neurological deficit on presentation, and the sex of the patient. Conclusion: Machine learning can be applied in the context of GB prognostic modelling. The models show that, as well as the known factors that affect GB survival, the presenting symptom may also have an impact on prognostication.
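A minimal sketch of the reported comparison, assuming 17 binarised variables, a 70:30 split, a Gini-based Random Forest, and logistic regression; the placeholder cohort and smaller forest (1,000 trees rather than 100,000) are illustrative simplifications, not the study's data or settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Placeholder cohort: 17 binarised clinical/molecular/treatment variables and an
# 18-month survival label (the real data are the centre's GB cohort, n = 466).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 2, size=(466, 17)),
                 columns=[f"var{i}" for i in range(17)])
y = rng.integers(0, 2, size=466)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)
rf = RandomForestClassifier(n_estimators=1000, criterion="gini",
                            random_state=0).fit(X_tr, y_tr)
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("RF accuracy:", rf.score(X_te, y_te), "LR accuracy:", lr.score(X_te, y_te))
# Feature importances highlight the top predictive factors.
print(pd.Series(rf.feature_importances_, index=X.columns)
        .sort_values(ascending=False).head(5))
```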

