Drill bit deterioration estimation with the Random Forest Regressor

2021 ◽  
Vol 942 (1) ◽  
pp. 012013
Author(s):  
Mateusz Góralczyk ◽  
Anna Michalak ◽  
Paweł Śliwiński

Abstract: Blasthole drilling performance is crucial to the performance of the whole excavation process, and correct drilling demands a ‘healthy’ drill bit and appropriate operator behavior. Given the large number of non-linear parameters describing the process, it appears reasonable to employ supervised learning methods to gain insight into drilling performance. A Random Forest Regressor model was trained on a dataset corresponding to correct blasthole drilling, and its hyperparameters were tuned to obtain the highest possible accuracy. It was then tested on three datasets, corresponding to good drilling performance and to two cases of non-optimal execution. Estimation errors are proposed as indicators of the bit's technical condition (or, more generally, of process performance). The Root Mean Squared Error was shown to differ significantly between estimates based on datasets for drilling with a ‘healthy’ drill bit and with a worn one; however, it was not sufficient to distinguish non-optimal drilling in which the operator applied additional feed pressure to compensate for the reduced drilling pace. That case could be distinguished when the mean of absolute estimation errors was used.
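The difference between the two error indicators named in this abstract can be illustrated with a minimal sketch. The residual values below are invented purely for illustration and are not taken from the paper; the function names are likewise hypothetical. The point is only that two sets of estimation errors can have nearly the same Root Mean Squared Error while their mean absolute errors differ clearly, because RMSE is dominated by a few large errors.

```python
import math


def rmse(errors):
    """Root mean squared error of a list of estimation errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mean_abs_error(errors):
    """Mean of the absolute estimation errors."""
    return sum(abs(e) for e in errors) / len(errors)


# SYNTHETIC residuals (assumption, not data from the paper):
# case A has mostly exact estimates plus two large misses,
# case B has a moderate error on every sample.
residuals_a = [0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0, -8.0, 0.0, 0.0]
residuals_b = [3.0, -3.0] * 5

for name, res in (("A", residuals_a), ("B", residuals_b)):
    print(name, round(rmse(res), 3), round(mean_abs_error(res), 3))
```

In this toy case the RMSE values are close (about 3.16 versus 3.0) while the mean absolute errors are 1.4 versus 3.0, which is the kind of situation in which the second indicator separates two cases that the first does not.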

2020 ◽  
Vol 11 (1) ◽  
pp. 44
Author(s):  
Rahmat Robi Waliyansyah ◽  
Nugroho Dwi Saputro

Higher education institutions regularly hold new student admissions, and the number of new students can rise or fall from year to year. This study looks at new student admissions at the University of PGRI Semarang (UPGRIS) from the 2014/2015 academic year to 2018/2019, which involve many admissions selection stages. To meet the minimum required ratio between the number of students and the development of human resources, facilities, and infrastructure, it is necessary to predict how much the number of students increases each year. Building such a forecasting system requires a good forecasting method and sufficiently precise calculations to predict the number of prospective students who register. This study uses the Random Forest method. The forecasting models are evaluated using Random Sampling and Cross-validation. The evaluation metrics are Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). The results identify the five study programs with the highest and the five with the lowest new student admissions. UPGRIS will therefore develop a new strategy for the five lowest study programs so that the desired number of new students is achieved.


Processes ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 2095
Author(s):  
Ganesh N. ◽  
Paras Jain ◽  
Amitava Choudhury ◽  
Prasun Dutta ◽  
Kanak Kalita ◽  
...  

In industrial piping systems, turbomachinery, heat exchangers, etc., pipe bends are essential components. Computational fluid dynamics (CFD), which is frequently used to analyse the flow behaviour in such systems, provides extremely precise estimates but is computationally expensive. As a result, a computationally efficient method is developed in this paper by leveraging machine learning for such computationally expensive CFD problems. Random forest regression (RFR) is used as the machine learning algorithm in this work. Four different fluid flow characteristics (i.e., axial velocity, x-velocity, y-velocity and z-velocity) are studied. The accuracy of the RFR models is assessed using a number of statistical metrics such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), maximum error (Max.Error) and median error (Med.Error). It is observed that the RFR models can produce considerable reductions in computing cost by surrogating the CFD model. A minor loss in estimation accuracy compared to the CFD models is observed. While the magnitude of intricate flow characteristics such as the additional vortices is correctly predicted, some error in their location is observed.


2020 ◽  
Vol 12 (5) ◽  
pp. 41-51
Author(s):  
Shaimaa Mahmoud ◽  
Mahmoud Hussein ◽  
Arabi Keshk

Opinion mining in social network data is considered one of the most important research areas because a large number of users interact with different topics on these networks. This paper addresses the problem of predicting future product ratings from users’ comments. Researchers have approached this problem using machine learning algorithms (e.g., Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression and Decision Tree). However, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product ratings using LR, RFR, and SVR. Our data set consists of tweets and their ratings on a scale of 1 to 5. The main goal of our approach is to improve prediction accuracy over existing techniques. SVR predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986, and Random Forest Regression with an MSE of 0.4770. This is better than the accuracy of existing approaches.


2020 ◽  
Author(s):  
Satish Kumar ◽  
Mohamed Rafiullah ◽  
Khalid Siddiqui

BACKGROUND: Diabetic kidney disease (DKD) is a progressive disease that leads to loss of kidney function. As early intervention improves patient outcomes, it is essential to identify the patients who are at high risk of developing DKD. Artificial Intelligence methods apply different machine learning classification techniques to identify high-risk patients by building a predictive model from a given dataset. OBJECTIVE: This study aims to find an accurate classification technique for predicting DKD by comparing different classification techniques applied to a DKD dataset using WEKA machine learning software. METHODS: We analyzed the performance of nine different classification techniques on a DKD dataset with 410 instances and 18 attributes. 66% of the dataset was used to build a model, and 33% of the data was used for evaluating the model. The performance of the classification techniques was assessed based on their execution time, accuracy, correctly and incorrectly classified instances, kappa statistic (K), mean absolute error, root mean squared error and the true values of the confusion matrix. RESULTS: The Random Forest classifier was found to be the best performing technique, with an accuracy of 76.5854% and a higher K value (0.5306) than the other classifiers. It also showed the lowest root mean squared error (0.4007). The confusion matrix showed 46 false-positive instances and 50 false-negative instances for the Random Forest technique. CONCLUSIONS: This study identified the Random Forest classification technique as the best performing classifier and an accurate prediction method for DKD. CLINICALTRIAL: NA
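As a rough illustration of how accuracy and the kappa statistic follow from a binary confusion matrix, the sketch below uses the standard definition of Cohen's kappa. Only the false-positive and false-negative counts (46 and 50) and the dataset size (410) come from the abstract. The true-positive and true-negative counts are not reported there, so the values used here are hypothetical placeholders chosen only so that the four cells sum to 410.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified instances."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement beyond what chance alone would give."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


FP, FN = 46, 50      # reported in the abstract
TP, TN = 147, 167    # HYPOTHETICAL split of the 314 correct instances

print(f"accuracy = {accuracy(TP, FP, FN, TN):.6f}")
print(f"kappa    = {cohen_kappa(TP, FP, FN, TN):.4f}")
```

Note that 96 misclassified instances out of 410 give an accuracy of 314/410, about 76.5854%, which agrees with the reported figure regardless of how the 314 correct instances are split. The kappa value, by contrast, depends on that split, so the number printed here (about 0.53 for these placeholder counts) should not be read as a reproduction of the study's result.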


2014 ◽  
Vol 543-547 ◽  
pp. 1655-1658
Author(s):  
Xiang Ran Du ◽  
Hai Tao Liu ◽  
Min Zhang

In this paper, we compare the estimation performance of 7 different kernels (i.e., Uniform, Triangular, Epanechnikov, Biweight, Triweight, Cosine and Gaussian) when they are used for probability density estimation with the Parzen window method. We first analyze the efficiencies of these 7 kernels and then compare their estimation errors as measured by mean squared error (MSE). The theoretical analysis and the experimental comparisons show that the most commonly used Gaussian kernel is not the best choice for probability density estimation: its efficiency is low and its estimation error is high. The derived conclusions give some guidelines for selecting a kernel in practical applications of probability density estimation.
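A minimal sketch of the kind of comparison described here, assuming the textbook forms of the seven kernels and the usual Parzen window estimator f(x) = (1 / (n h)) Σ K((x − x_i) / h). The sample size, bandwidth, evaluation grid and the choice of a standard normal as the true density are all arbitrary assumptions made for illustration; they are not the paper's experimental setup, and the output should not be read as confirming or refuting its conclusions.

```python
import math
import random

# Textbook kernel definitions; all but the Gaussian have support |u| <= 1.
KERNELS = {
    "uniform": lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "triangular": lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "biweight": lambda u: 15 / 16 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "triweight": lambda u: 35 / 32 * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,
    "cosine": lambda u: math.pi / 4 * math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0,
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
}


def parzen_estimate(x, sample, h, kernel):
    """Parzen window density estimate at point x with bandwidth h."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def mse_against_truth(sample, h, kernel, true_pdf, grid):
    """Mean squared error between the estimate and a known density on a grid."""
    sq = [(parzen_estimate(x, sample, h, kernel) - true_pdf(x)) ** 2 for x in grid]
    return sum(sq) / len(sq)


def standard_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


random.seed(0)                                        # reproducible toy sample
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
grid = [i / 10 for i in range(-40, 41)]               # evaluation points in [-4, 4]
h = 0.5                                               # arbitrary shared bandwidth

results = {name: mse_against_truth(sample, h, k, standard_normal_pdf, grid)
           for name, k in KERNELS.items()}
for name, err in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:13s} MSE = {err:.6f}")
```

Using one shared bandwidth for all kernels is a simplification: the kernels have different spreads, so a careful comparison would tune or rescale the bandwidth per kernel before comparing errors.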


Diagnostics ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1280
Author(s):  
Ki Ahn ◽  
Kwang-Sig Lee ◽  
Se Lee ◽  
Sung Kwon ◽  
Sunghun Na ◽  
...  

There has been no machine learning study with a rich collection of clinical and sonographic markers to compare the performance measures for a variety of newborns’ weight-for-height indicators. This study compared the performance measures for a variety of newborns’ weight-for-height indicators based on machine learning, ultrasonographic data and maternal/delivery information. The source of data for this study was a multi-center retrospective study with 2949 mother–newborn pairs. The mean-squared-error-over-variance measures of five machine learning approaches were compared for newborn’s weight, newborn’s weight/height, newborn’s weight/height² and newborn’s weight/height³. Random forest variable importance, the influence of a variable over average node impurity, was used to identify major predictors of these newborns’ weight-for-height indicators among ultrasonographic data and maternal/delivery information. Regarding ultrasonographic fetal biometry, newborn’s weight, newborn’s weight/height and newborn’s weight/height² were better indicators, with smaller mean-squared-error-over-variance measures, than newborn’s weight/height³. Based on random forest variable importance, the top six predictors of newborn’s weight were the same as those of newborn’s weight/height and those of newborn’s weight/height²: gestational age at delivery time, the first estimated fetal weight and abdominal circumference in week 36 or later, maternal weight and body mass index at delivery time, and the first biparietal diameter in week 36 or later. These six predictors also ranked within the top seven for large-for-gestational-age and the top eight for small-for-gestational-age. In conclusion, newborn’s weight, newborn’s weight/height and newborn’s weight/height² are more suitable for ultrasonographic fetal biometry, with smaller mean-squared-error-over-variance measures, than newborn’s weight/height³.
Machine learning with ultrasonographic data would be an effective noninvasive approach for predicting newborn’s weight, weight/height and weight/height².


2021 ◽  
Vol 11 (1) ◽  
pp. 08-19
Author(s):  
Weskley Damasceno Silva ◽  
Silas Santiago Lopes Pereira ◽  
Daniel Santiago Pereira ◽  
Michell Olívio Xavier da Costa

The beekeeping sector has grown considerably in recent times in terms of the production and marketing of products such as honey and its derivatives. Brazil, despite having kept pace with this growth and having good conditions for the development of beekeeping, still suffers from limited use of technological tools, which directly affects production levels. This article proposes the development of a technological tool that assists beekeepers in efficiently managing apicultural production and in decision making, based on predictive models built with Machine Learning (ML) and integrated into a web system. To this end, different ML algorithms were used to predict honey production, such as Multiple Linear Regression, Decision Tree, Random Forest, Multilayer Perceptron (MLP) and Support Vector Regression (SVR). The generated models were evaluated based on the coefficient of determination (R2 or Score) and on the prediction error computed with the Root Mean Squared Error (RMSE). The results of this research comprise a web system under development and the results of the experiments carried out, which show better performance for the MLP technique, with a Score of 0.98 and an RMSE of 711196 pounds.


2021 ◽  
Vol 9 ◽  
Author(s):  
Jayakumar Kaliappan ◽  
Kathiravan Srinivasan ◽  
Saeed Mian Qaisar ◽  
Karpagam Sundararajan ◽  
Chuan-Yu Chang ◽  
...  

This paper aims to evaluate the performance of multiple non-linear regression techniques, such as support-vector regression (SVR), k-nearest neighbor (KNN), Random Forest Regressor, Gradient Boosting, and XGBOOST, for COVID-19 reproduction rate prediction and to study the impact of feature selection algorithms and hyperparameter tuning on prediction. Sixteen features (for example, Total_cases_per_million and Total_deaths_per_million) related to significant factors, such as testing, death, positivity rate, active cases, stringency index, and population density, are considered for the COVID-19 reproduction rate prediction. These 16 features are ranked using Random Forest, Gradient Boosting, and XGBOOST feature selection algorithms. Seven features are selected from the 16 according to the ranks assigned by most of the above-mentioned feature selection algorithms. Predictions by historical statistical models are based solely on the predicted feature and the assumption that future instances resemble past occurrences. In contrast, techniques such as Random Forest, XGBOOST, Gradient Boosting, KNN, and SVR take into account the influence of other significant features when predicting the result. The performance of reproduction rate prediction is measured by mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), R-Squared, relative absolute error (RAE), and root relative squared error (RRSE) metrics. The performances of algorithms with and without feature selection are similar, but a remarkable difference is seen with hyperparameter tuning. The results suggest that the reproduction rate is highly dependent on many features, and the prediction should not be based solely upon past values. Without hyperparameter tuning, the minimum value of RAE is 0.117315935 with feature selection and 0.0968989 without feature selection.
The KNN attains a low MAE value of 0.0008 and performs well without feature selection and with hyperparameter tuning. The results show that predictions performed using all features and hyperparameter tuning are more accurate than predictions performed using selected features.
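Of the metrics listed in this abstract, relative absolute error (RAE) and root relative squared error (RRSE) are the least familiar. The sketch below uses the commonly used definitions, in which the model's errors are divided by the errors of a baseline that always predicts the mean of the observed values. The abstract does not spell out its formulas, so these definitions are an assumption, and the function name and toy numbers are hypothetical.

```python
import math


def relative_errors(y_true, y_pred):
    """RAE, RRSE and R-squared relative to a predict-the-mean baseline."""
    mean_y = sum(y_true) / len(y_true)
    abs_model = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    abs_base = sum(abs(t - mean_y) for t in y_true)
    sq_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sq_base = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "RAE": abs_model / abs_base,            # < 1 means better than the baseline
        "RRSE": math.sqrt(sq_model / sq_base),  # same idea with squared errors
        "R2": 1 - sq_model / sq_base,           # equals 1 - RRSE**2
    }


# Toy values (assumption, not data from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(relative_errors(y_true, y_pred))
```

Under these definitions a value of 1 for RAE or RRSE corresponds to a model no better than always predicting the mean, which makes the two metrics comparable across targets with different scales.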


2020 ◽  
Vol 6 (3) ◽  
pp. 49-54
Author(s):  
Niyalatul Muna ◽  
Faisal Lutfi Afriansyah ◽  
Ameng Bagus Suprayogy

The level of dehydration is not only felt directly but can also be observed physically by visual means. Visually, one symptom of dehydration can be seen in the color of urine. This symptom is usually given little attention and regarded as normal. Yet hypohydration or dehydration is an adverse effect of inadequate water intake, which affects the color of the urine produced. Because the resulting colors are so similar, the human senses have difficulty distinguishing symptoms of dehydration, and visual differences in urine color are often interpreted differently. Several studies show that camera technology combined with intelligent systems can help overcome the difficulties and limitations of the human senses. This study uses urine images taken from adult samples, grouped according to urine color categories from previous research. For feature extraction, the YCbCr color values of each urine image are taken. The color model obtained from each sample is identified using the Random Forest algorithm with cross-validation. The experiments show an accuracy of 90% on the 30 datasets tested, with a precision of 90.2%, recall of 90%, mean absolute error of 0.2473, and root mean squared error of 0.3208.
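The YCbCr feature extraction mentioned in this abstract can be sketched as follows. The abstract does not say which YCbCr variant or which per-image summary was used, so this example assumes the full-range conversion used by JPEG/JFIF (based on ITU-R BT.601 coefficients) and a simple per-image mean as the feature vector; both choices, and the function names, are assumptions made for illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF-style, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def mean_ycbcr(pixels):
    """Average Y, Cb and Cr over an iterable of (r, g, b) pixels.

    The mean is a hypothetical choice of feature; the paper may use another.
    """
    converted = [rgb_to_ycbcr(r, g, b) for r, g, b in pixels]
    n = len(converted)
    return tuple(sum(channel) / n for channel in zip(*converted))


# Toy "image" of three yellowish pixels (made-up values).
sample_pixels = [(230, 210, 90), (225, 205, 80), (235, 215, 100)]
print(mean_ycbcr(sample_pixels))
```

A three-value feature vector of this kind per image could then be passed to a Random Forest classifier; the classifier itself is omitted here to keep the sketch free of third-party dependencies.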


Author(s):  
Moritz Feigl ◽  
Katharina Lebiedzinski ◽  
Mathew Herrnegger ◽  
Karsten Schulz

Abstract: Stream water temperature is an essential environmental factor with the potential to change both ecological and socio-economic conditions in the surroundings of a water body. Adequate modelling concepts are needed to compute stream water temperatures as a basis for effective adaptation strategies for future changes (e.g., due to climate change). To this end, the present study investigates 6 machine learning models: stepwise linear regression, Random Forest, eXtreme Gradient Boosting, Feedforward Neural Networks and two types of Recurrent Neural Networks. The models were tested on 10 Austrian catchments with different physiographic characteristics and input data combinations. The hyperparameters of the applied models were optimized using Bayesian hyperparameter optimization. To make the results comparable with other studies, the predictions of the 6 machine learning models were compared with the results of linear regression and of the widely used and well-known water temperature model air2stream. Of the 6 tested models, Feedforward Neural Networks and eXtreme Gradient Boosting gave the best predictions, each in 4 of 10 catchments. With an average RMSE (root mean squared error) of 0.55 °C, the tested models predicted stream water temperatures considerably better than linear regression (1.55 °C) and air2stream (0.98 °C). In general, the results of the 6 models showed very comparable performance, with a mean deviation around the median of only 0.08 °C between the individual models.
In the largest catchment studied, the Danube at Kienstock, Recurrent Neural Networks showed the highest model performance, which indicates that they are best suited when processes with long-term dependencies are decisive in the catchment. The choice of hyperparameters strongly influenced the predictive ability of the models, which underlines the importance of hyperparameter optimization. The results of this study summarize the importance of different input data, models and training characteristics for modelling mean daily stream water temperatures. At the same time, this study serves as a basis for the development of future models for regional stream water temperature prediction. The tested models are available to all users in the research community and in practice in the open source R package wateRtemp.

