Random Forest Regression-Based Machine Learning Model for Accurate Estimation of Fluid Flow in Curved Pipes

Pipe bends are essential components of industrial piping systems, turbomachinery, heat exchangers and similar equipment. Computational fluid dynamics (CFD), which is frequently used to analyse the flow behaviour in such systems, provides extremely precise estimates but is computationally expensive. This paper therefore develops a computationally efficient alternative that uses machine learning as a surrogate for such expensive CFD problems. Random forest regression (RFR) is the machine learning algorithm used in this work. Four fluid flow characteristics (axial velocity, x-velocity, y-velocity and z-velocity) are studied. The accuracy of the RFR models is assessed with several statistical metrics, including mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), maximum error (Max.Error) and median error (Med.Error). The RFR models are observed to reduce computing cost considerably by acting as surrogates for the CFD model, with only a minor loss in estimation accuracy relative to CFD. While the magnitude of intricate flow characteristics such as the additional vortices is correctly predicted, some error in their location is observed.
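The error metrics named in the abstract can be computed directly from paired CFD and surrogate values. The sketch below is a minimal illustration, not the authors' code: the function name `regression_errors` and the sample velocities are hypothetical, and "median error" is assumed here to mean the median absolute error.

```python
import math
import statistics

def regression_errors(reference, predicted):
    """Return MAE, MSE, RMSE, maximum error and median absolute error."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute error for each (reference, prediction) pair.
    abs_err = [abs(r - p) for r, p in zip(reference, predicted)]
    mse = sum(e * e for e in abs_err) / len(abs_err)
    return {
        "MAE": sum(abs_err) / len(abs_err),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "Max.Error": max(abs_err),
        "Med.Error": statistics.median(abs_err),
    }

# Hypothetical axial velocities (m/s): CFD reference vs. surrogate estimate.
cfd = [1.0, 2.0, 3.0, 4.0]
rfr = [1.5, 2.0, 2.0, 4.5]
errors = regression_errors(cfd, rfr)
```

Reporting the maximum and median alongside the mean-based metrics helps separate a surrogate that is slightly off everywhere from one that is accurate in most places but badly wrong in a few, such as near a misplaced vortex.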

Predicting Future Products Rate using Machine Learning Algorithms

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2020.05.04 ◽

2020 ◽

Vol 12 (5) ◽

pp. 41-51

Author(s):

Shaimaa Mahmoud ◽

Mahmoud Hussein ◽

Arabi Keshk

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Regression ◽

Data Set ◽

Squared Error

Opinion mining in social network data is considered one of the most important research areas because a large number of users interact with different topics on these networks. This paper addresses the problem of predicting future product ratings from users’ comments. Researchers have approached this problem using machine learning algorithms (e.g. Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression and Decision Tree). However, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product ratings using LR, RFR, and SVR. Our data set consists of tweets and their ratings on a scale from 1 to 5. The main goal of our approach is to improve prediction accuracy over existing techniques. SVR predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986, and Random Forest Regression with an MSE of 0.4770. These results are better than the accuracy of existing approaches.

Comparison of different Artificial Intelligence techniques to predict Diabetic Kidney Disease (Preprint)

10.2196/preprints.22636 ◽

2020 ◽

Author(s):

Satish Kumar ◽

Mohamed Rafiullah ◽

Khalid Siddiqui

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

High Risk ◽

Random Forest ◽

Diabetic Kidney Disease ◽

Mean Squared Error ◽

Confusion Matrix ◽

Classification Techniques ◽

Squared Error ◽

Classification Technique

BACKGROUND: Diabetic kidney disease (DKD) is a progressive disease that leads to loss of kidney function. As early intervention improves patient outcomes, it is essential to identify the patients who are at high risk of developing DKD. Artificial Intelligence methods apply different machine learning classification techniques to identify high-risk patients by building a predictive model from a given dataset. OBJECTIVE: This study aims to find an accurate classification technique for predicting DKD by comparing different classification techniques applied to a DKD dataset using WEKA machine learning software. METHODS: We analyzed the performance of nine different classification techniques on a DKD dataset with 410 instances and 18 attributes. 66% of the dataset was used to build a model, and 33% of the data was used for evaluating the model. The performance of the classification techniques was assessed based on their execution time, accuracy, correctly and incorrectly classified instances, kappa statistic (K), mean absolute error, root mean squared error and the true values of the confusion matrix. RESULTS: The Random Forest classifier was found to be the best-performing technique, with an accuracy of 76.5854% and a higher K value (0.5306) than the other classifiers. It also showed the lowest root mean squared error (0.4007). The confusion matrix for the Random Forest technique contained 46 false-positive instances and 50 false-negative instances. CONCLUSIONS: This study identified the Random Forest classification technique as the best-performing classifier and an accurate prediction method for DKD. CLINICALTRIAL: NA
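The reported accuracy can be related to the confusion-matrix counts with a line of arithmetic. The sketch below uses only the figures quoted in the abstract; the one assumption, flagged in the code, is that the 46 false positives and 50 false negatives are counted over all 410 instances.

```python
# Figures quoted in the abstract.
total_instances = 410
false_positives = 46
false_negatives = 50

# Assumption: the confusion matrix covers all 410 instances.
misclassified = false_positives + false_negatives
accuracy = (total_instances - misclassified) / total_instances
accuracy_percent = round(100 * accuracy, 4)
```

Under that assumption, 314 of 410 instances are classified correctly, which reproduces the stated 76.5854%. The kappa statistic cannot be checked the same way because the abstract does not give the true-positive and true-negative counts separately.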

Predictors of Newborn’s Weight for Height: A Machine Learning Study Using Nationwide Multicenter Ultrasound Data

Diagnostics ◽

10.3390/diagnostics11071280 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1280

Author(s):

Ki Ahn ◽

Kwang-Sig Lee ◽

Se Lee ◽

Sung Kwon ◽

Sunghun Na ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Performance Measures ◽

Gestational Age ◽

Mean Squared Error ◽

Variable Importance ◽

Learning Study ◽

Delivery Time ◽

Fetal Biometry ◽

Squared Error

No previous machine learning study has used a rich collection of clinical and sonographic markers to compare performance measures across a variety of newborns’ weight-for-height indicators. This study compared the performance measures for a variety of newborns’ weight-for-height indicators based on machine learning, ultrasonographic data and maternal/delivery information. The source of data for this study was a multi-center retrospective study with 2949 mother–newborn pairs. The mean-squared-error-over-variance measures of five machine learning approaches were compared for newborn’s weight, newborn’s weight/height, newborn’s weight/height² and newborn’s weight/height³. Random forest variable importance, the influence of a variable over average node impurity, was used to identify major predictors of these newborns’ weight-for-height indicators among ultrasonographic data and maternal/delivery information. Regarding ultrasonographic fetal biometry, newborn’s weight, newborn’s weight/height and newborn’s weight/height² were better indicators, with smaller mean-squared-error-over-variance measures, than newborn’s weight/height³. Based on random forest variable importance, the top six predictors of newborn’s weight were the same as those of newborn’s weight/height and those of newborn’s weight/height²: gestational age at delivery time, the first estimated fetal weight and abdominal circumference in week 36 or later, maternal weight and body mass index at delivery time, and the first biparietal diameter in week 36 or later. These six predictors also ranked within the top seven for large-for-gestational-age and the top eight for small-for-gestational-age. In conclusion, newborn’s weight, newborn’s weight/height and newborn’s weight/height² are more suitable for ultrasonographic fetal biometry, with smaller mean-squared-error-over-variance measures, than newborn’s weight/height³. Machine learning with ultrasonographic data would be an effective noninvasive approach for predicting newborn’s weight, weight/height and weight/height².
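The abstract does not define its "mean-squared-error-over-variance" measure. A plausible reading, assumed in the sketch below, is the mean squared error of the predictions divided by the variance of the observed outcome, which puts indicators measured on different scales (weight, weight/height, and so on) on a common footing. The function name and the data are hypothetical.

```python
import statistics

def mse_over_variance(observed, predicted):
    """MSE of the predictions divided by the population variance of the observations."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return mse / statistics.pvariance(observed)

# Hypothetical newborn weights (kg) and model predictions.
weights = [2.0, 3.0, 4.0]
predicted = [2.5, 3.0, 3.5]
ratio = mse_over_variance(weights, predicted)
```

With this definition a value near 0 indicates accurate predictions, while a value of 1 means the model does no better than always predicting the observed mean.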

A machine learning-based system for decision support in beekeeping production management

REVISTA BRASILEIRA DE AGROTECNOLOGIA ◽

10.18378/rebagro.v11i1.8679 ◽

2021 ◽

Vol 11 (1) ◽

pp. 08-19

Author(s):

Weskley Damasceno Silva ◽

Silas Santiago Lopes Pereira ◽

Daniel Santiago Pereira ◽

Michell Olívio Xavier da Costa

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Support Vector Regression ◽

Multilayer Perceptron ◽

Mean Squared Error ◽

Support Vector ◽

Root Mean Squared Error ◽

Squared Error

The beekeeping sector has grown considerably in recent times in terms of the production and sale of products such as honey and its derivatives. Brazil, despite having followed this growth and having favourable conditions for the development of beekeeping, still suffers from limited use of technological tools, which directly affects production levels. This article proposes the development of a technological tool that assists the beekeeper in the efficient management of beekeeping production and in decision making, based on predictive Machine Learning (ML) models integrated into a web system. To this end, different ML algorithms were used to predict honey production, such as Multiple Linear Regression, Decision Tree, Random Forest, Multilayer Perceptron (MLP) and Support Vector Regression (SVR). The resulting models were evaluated using the coefficient of determination (R² or Score) and the prediction error measured by the Root Mean Squared Error (RMSE). The results of this research comprise a web system under development and the outcomes of the experiments performed, which show the best performance for the MLP technique, with a Score of 0.98 and an RMSE of 711196 pounds.

Prediction of stream water temperatures in Austrian catchments using machine learning methods

Österreichische Wasser- und Abfallwirtschaft ◽

10.1007/s00506-021-00771-3 ◽

2021 ◽

Author(s):

Moritz Feigl ◽

Katharina Lebiedzinski ◽

Mathew Herrnegger ◽

Karsten Schulz

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Random Forest ◽

Recurrent Neural Networks ◽

Mean Squared Error ◽

Feedforward Neural Networks ◽

Gradient Boosting ◽

Squared Error ◽

Extreme Gradient Boosting ◽

Linear Regression

Stream water temperature is an essential environmental factor with the potential to change both ecological and socio-economic conditions in the surroundings of a water body. Adequate modelling concepts are needed to compute stream water temperatures as a basis for effective adaptation strategies for future changes (e.g. due to climate change). To this end, the present study investigates 6 machine learning models: stepwise linear regression, random forest, eXtreme Gradient Boosting, feedforward neural networks and two types of recurrent neural networks. The models were tested on 10 Austrian catchments with different physiographic characteristics and input data combinations. The hyperparameters of the applied models were optimised using Bayesian hyperparameter optimisation. To make the results comparable with other studies, the predictions of the 6 machine learning models were compared with the results of linear regression and of the widely used and well-known water temperature model air2stream. Of the 6 models tested, feedforward neural networks and eXtreme Gradient Boosting gave the best predictions, each in 4 of 10 catchments. With an average RMSE (root mean squared error) of 0.55 °C, the tested models predicted stream water temperatures considerably better than linear regression (1.55 °C) and air2stream (0.98 °C). In general, the 6 models performed very similarly, with a mean deviation around the median of only 0.08 °C between the individual models. In the largest catchment studied, the Danube at Kienstock, recurrent neural networks showed the highest model performance, which indicates that they are best suited when processes with long-term dependencies are decisive in the catchment. The choice of hyperparameters strongly influenced the predictive ability of the models, which underlines the importance of hyperparameter optimisation. The results of this study summarise the importance of different input data, models and training characteristics for modelling mean daily stream water temperatures. At the same time, this study serves as a basis for developing future models for regional stream water temperature prediction. The tested models are available to all users in research and practice in the open source R package wateRtemp.

A novel framework for designing a multi-DoF prosthetic wrist control using machine learning

Scientific Reports ◽

10.1038/s41598-021-94449-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chinmay P. Swami ◽

Nicholas Lenhard ◽

Jiyeon Kang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Upper Limb ◽

Daily Living ◽

Machine Learning Algorithms ◽

Data Sets ◽

Random Forest Regression ◽

Prosthetic Devices ◽

Upper Limb Function ◽

The Neural Network

Prosthetic arms can significantly increase the upper limb function of individuals with upper limb loss; however, despite the development of various multi-DoF prosthetic arms, the rate of prosthesis abandonment is still high. One of the major challenges is to design a multi-DoF controller that has high precision, robustness, and intuitiveness for daily use. The present study demonstrates a novel framework for developing a controller that leverages machine learning algorithms and movement synergies to implement natural control of a 2-DoF prosthetic wrist for activities of daily living (ADL). The data were collected during ADL tasks from ten individuals wearing a wrist brace that emulated the absence of wrist function. Using these data, a neural network classifies the movement and then random forest regression computes the desired velocity of the prosthetic wrist. The models were trained and tested on ADLs, and their robustness was tested using cross-validation and holdout data sets. The proposed framework demonstrated high accuracy (F-1 score of 99% for the classifier and Pearson’s correlation of 0.98 for the regression). Additionally, the interpretable nature of random forest regression was used to verify the targeted movement synergies. The present work provides a novel and effective framework to develop an intuitive control for multi-DoF prosthetic devices.

Experimental Evaluation of Computer Vision and Machine Learning-Based UAV Detection and Ranging

Drones ◽

10.3390/drones5020037 ◽

2021 ◽

Vol 5 (2) ◽

pp. 37

Author(s):

Bingsheng Wei ◽

Martin Barczyk

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Tracking System ◽

Ground Truth ◽

White Background ◽

Cascade Classifier ◽

Detection Algorithms ◽

Squared Error ◽

Test Conditions ◽

Video Feed

We consider the problem of vision-based detection and ranging of a target UAV using the video feed from a monocular camera onboard a pursuer UAV. Our previously published work in this area employed a cascade classifier algorithm to locate the target UAV, which was found to perform poorly in complex background scenes. We thus study the replacement of the cascade classifier algorithm with newer machine learning-based object detection algorithms. Five candidate algorithms are implemented and quantitatively tested in terms of their efficiency (measured as frames per second processing rate), accuracy (measured as the root mean squared error between ground truth and detected location), and consistency (measured as mean average precision) in a variety of flight patterns, backgrounds, and test conditions. Assigning relative weights of 20%, 40% and 40% to these three criteria, we find that when flying over a white background, the top three performers are YOLO v2 (76.73 out of 100), Faster RCNN v2 (63.65 out of 100), and Tiny YOLO (59.50 out of 100), while over a realistic background, the top three performers are Faster RCNN v2 (54.35 out of 100), SSD MobileNet v1 (51.68 out of 100) and SSD Inception v2 (50.72 out of 100), leading us to recommend Faster RCNN v2 as the preferred solution. We then provide a roadmap for further work in integrating the object detector into our vision-based UAV tracking system.
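The 20%/40%/40% weighting described above amounts to a weighted sum of per-criterion scores. The sketch below illustrates that combination only: it assumes each criterion has already been normalised to a 0-100 score (the abstract does not say how), and the function name and sub-scores are hypothetical rather than taken from the paper.

```python
# Weights from the abstract: efficiency 20%, accuracy 40%, consistency 40%.
WEIGHTS = {"efficiency": 0.2, "accuracy": 0.4, "consistency": 0.4}

def overall_score(sub_scores):
    """Weighted sum of per-criterion scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)

# Hypothetical sub-scores for a made-up detector (not taken from the paper).
example = {"efficiency": 50.0, "accuracy": 80.0, "consistency": 70.0}
score = overall_score(example)
```

Because efficiency carries only a fifth of the weight, a fast but inaccurate detector cannot outrank a slower one that localises the target well.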

Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10

Geoscientific Model Development ◽

10.5194/gmd-12-1209-2019 ◽

2019 ◽

Vol 12 (3) ◽

pp. 1209-1225 ◽

Cited By ~ 15

Author(s):

Christoph A. Keller ◽

Mat J. Evans

Keyword(s):

Machine Learning ◽

Random Forest ◽

Gas Phase ◽

Atmospheric Chemistry ◽

Random Forest Regression ◽

Data Set ◽

Gas Phase Chemistry ◽

Chemical Conditions ◽

Phase Chemistry ◽

The Impact

Atmospheric chemistry models are a central tool to study the impact of chemical constituents on the environment, vegetation and human health. These models are numerically intensive, and previous attempts to reduce the numerical cost of chemistry solvers have not delivered transformative change. We show here the potential of a machine learning (in this case random forest regression) replacement for the gas-phase chemistry in atmospheric chemistry transport models. Our training data consist of 1 month (July 2013) of output of chemical conditions together with the model physical state, produced from the GEOS-Chem chemistry model v10. From this data set we train random forest regression models to predict the concentration of each transported species after the integrator, based on the physical and chemical conditions before the integrator. The choice of prediction type has a strong impact on the skill of the regression model. We find best results from predicting the change in concentration for long-lived species and the absolute concentration for short-lived species. We also find improvements from a simple implementation of chemical families (NOx = NO + NO2). We then implement the trained random forest predictors back into GEOS-Chem to replace the numerical integrator. The machine-learning-driven GEOS-Chem model compares well to the standard simulation. For ozone (O3), errors from using the random forests (compared to the reference simulation) grow slowly, and after 5 days the normalized mean bias (NMB), root mean square error (RMSE) and R² are 4.2 %, 35 % and 0.9, respectively; after 30 days the errors increase to 13 %, 67 % and 0.75, respectively. The biases become largest in remote areas such as the tropical Pacific, where errors in the chemistry can accumulate with little balancing influence from emissions or deposition. Over polluted regions the model error is less than 10 % and has significant fidelity in following the time series of the full model. Modelled NOx shows similar features, with the most significant errors occurring in remote locations far from recent emissions. For other species such as inorganic bromine species and short-lived nitrogen species, errors become large, with NMB, RMSE and R² reaching >2100 %, >400 % and <0.1, respectively. This proof-of-concept implementation takes 1.8 times more time than the direct integration of the differential equations, but optimization and software engineering should allow substantial increases in speed. We discuss potential improvements in the implementation, some of its advantages from both a software and hardware perspective, its limitations, and its applicability to operational air quality activities.
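The skill scores quoted above compare the random forest run with the reference simulation. The sketch below shows how a normalized mean bias and a percentage-style RMSE could be computed. It assumes the RMSE is normalised by the mean of the reference values, which the abstract does not state, and the function names and concentrations are hypothetical.

```python
import math

def normalized_mean_bias(reference, model):
    """NMB: summed model-minus-reference difference divided by the summed reference."""
    return sum(m - r for r, m in zip(reference, model)) / sum(reference)

def normalized_rmse(reference, model):
    """RMSE divided by the mean of the reference values (assumed normalisation)."""
    mse = sum((m - r) ** 2 for r, m in zip(reference, model)) / len(reference)
    return math.sqrt(mse) / (sum(reference) / len(reference))

# Hypothetical ozone mixing ratios (ppbv): full model vs. random forest run.
reference = [40.0, 50.0, 60.0]
emulated = [44.0, 50.0, 62.0]
nmb = normalized_mean_bias(reference, emulated)
nrmse = normalized_rmse(reference, emulated)
```

Positive and negative errors cancel in the NMB but not in the RMSE, which is one reason the two numbers reported for the same species can differ widely.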

A new generic method to improve machine learning applications in official statistics

Statistical Journal of the IAOS ◽

10.3233/sji-210885 ◽

2021 ◽

pp. 1-16

Author(s):

Kevin Kloos

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Statistical Properties ◽

Machine Learning Algorithms ◽

Official Statistics ◽

Academic Literature ◽

Misclassification Bias ◽

Squared Error ◽

Machine Learning Applications ◽

Applications Of Machine Learning

The use of machine learning algorithms at national statistical institutes has increased significantly over the past few years. Applications range from new imputation schemes to new statistical output based entirely on machine learning. The results are promising, but recent studies have shown that the use of machine learning in official statistics always introduces a bias, known as misclassification bias. Misclassification bias does not occur in traditional applications of machine learning and has therefore received little attention in the academic literature. In earlier work, we collected existing methods that are able to correct misclassification bias and compared their statistical properties, including bias, variance and mean squared error. In this paper, we present a new generic method to correct misclassification bias for time series and we derive its statistical properties. Moreover, we show numerically that it has a lower mean squared error than the existing alternatives in a wide variety of settings. We believe that our new method may improve machine learning applications in official statistics, and we hope that our work will stimulate further methodological research in this area.
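Misclassification bias arises when individual predictions are aggregated into a statistic, for example when the share of a category is estimated by counting predicted positives. The sketch below illustrates the effect for a binary classifier with given sensitivity and specificity. It is not the correction method proposed in the paper, and the function names and numbers are hypothetical.

```python
def expected_predicted_share(true_share, sensitivity, specificity):
    """Expected share of items labelled positive by an imperfect binary classifier."""
    return true_share * sensitivity + (1 - true_share) * (1 - specificity)

def misclassification_bias(true_share, sensitivity, specificity):
    """Bias of the plain 'count the predicted positives' estimator."""
    return expected_predicted_share(true_share, sensitivity, specificity) - true_share

# Hypothetical values: 10% true positives, 90% sensitivity and specificity.
bias = misclassification_bias(0.10, 0.90, 0.90)
```

In this example a classifier that is right 90% of the time in each class reports an expected share of 0.18 against a true share of 0.10, because false positives from the large negative class outnumber the missed positives.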
