Prediction Model of Temperature of Cast Billet Based on Its Heating Retrospection Using Boosting “Random Forest” Structure

Petr I. Zhukov; Anton I. Glushchenko; Andrey V. Fomin

doi:10.25205/1818-7900-2020-18-4-11-27

Vestnik NSU Series Information Technologies ◽

2020 ◽

Vol 18 (4) ◽

pp. 11-27

Keyword(s):

Random Forest ◽

Continuous Furnace ◽

Regression Trees ◽

Transient Heat Conduction ◽

Mean Value ◽

Heating Process ◽

Cast Billet ◽

Temperature Prediction ◽

Billet Temperature ◽

Data Driven Approach

This research addresses the prediction of the surface temperature that a cast billet will have in the rolling mill after the heating process. The main difficulty is that the prediction is needed before the billet actually leaves the furnace. This problem is commonly solved with the boundary value problem of heat transfer, in particular the differential equations of transient heat conduction. This research instead proposes a data-driven approach based on a model of the dependence of the billet temperature on the retrospection of its heating in the continuous furnace. The model is developed by analysing data from the furnace control system. Data from a real furnace were collected, stored in a data warehouse, and subjected to exploratory analysis. All data were split into training, testing and validation subsets. As part of this research, the regression model previously developed by the authors was also validated. It appeared to be overfitted (the error on the test set was significantly higher than on the training set). To overcome this disadvantage, the authors propose an alternative method of developing the required data-based model, built on the machine learning algorithms of boosting and bagging. As a result of experiments with bagging and boosting, the model structure was chosen as a “Random Forest” with a special class of regression trees known as DART (Dropouts meet Multiple Additive Regression Trees). Based on a significant number of experiments with that model, two confidence intervals of the temperature prediction were found, at the 68 % and 95 % levels. The mean temperature prediction error was estimated at approximately 9 °C for both the test and validation sets.
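The abstract does not state how the 68 % and 95 % intervals were obtained. The following is a minimal sketch, assuming they are central empirical intervals of the prediction errors. The errors here are synthetic and normally distributed (a hypothetical stand-in for the furnace data, with a spread chosen so that the mean absolute error is close to 9 °C), and the function name `central_interval` is illustrative.

```python
import random
import statistics

def central_interval(errors, coverage):
    """Return (low, high) bounds that contain `coverage` of the errors,
    using empirical quantiles with linear interpolation."""
    ordered = sorted(errors)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0

    def quantile(q):
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

    return quantile(tail), quantile(1.0 - tail)

rng = random.Random(0)
# Synthetic prediction errors in deg C (assumption: not the paper's data).
errors = [rng.gauss(0.0, 11.0) for _ in range(5000)]

mean_abs_error = statistics.fmean(abs(e) for e in errors)
ci68 = central_interval(errors, 0.68)
ci95 = central_interval(errors, 0.95)

print(f"mean |error| = {mean_abs_error:.1f} deg C")
print(f"68 % interval: {ci68[0]:.1f} .. {ci68[1]:.1f} deg C")
print(f"95 % interval: {ci95[0]:.1f} .. {ci95[1]:.1f} deg C")
```

Empirical quantiles make no assumption about the shape of the error distribution, which is why they are used here in place of a mean plus or minus one or two standard deviations.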

Comparison of Finite-difference and Data-based Models of Temperature Transfer Process in Heating Furnaces for Cast Billet Temperature Prediction

10.1109/summa53307.2021.9632048 ◽

2021 ◽

Author(s):

Petr Zhukov ◽

Andrey Fomin ◽

Anton Glushchenko ◽

Evgeniy Podvalnyi

Keyword(s):

Finite Difference ◽

Transfer Process ◽

Cast Billet ◽

Temperature Prediction ◽

Billet Temperature ◽

Heating Furnaces

A machine learning based approach to clinopyroxene thermobarometry: model optimisation and distribution for use in Earth Sciences

10.31223/x5sg8d ◽

2021 ◽

Author(s):

Corin Jorgenson ◽

Oliver Higgins ◽

Maurizio Petrelli ◽

Florence Bégué ◽

Luca Caricchi

Keyword(s):

Earth Sciences ◽

Machine Learning ◽

Random Forest ◽

Model Performance ◽

Mean Value ◽

Performance Tuning ◽

Plumbing Systems ◽

Data Driven Approach ◽

Methodological Assessment ◽

Magma Plumbing

Thermobarometry is a fundamental tool to quantitatively interrogate magma plumbing systems and broaden our appreciation of volcanic processes. Developments in random forest-based machine learning lend themselves to a more data-driven approach to clinopyroxene thermobarometry. This can include allowing users to access and filter large experimental datasets that can be tailored to individual applications in Earth Sciences. Here we present a methodological assessment of random forest thermobarometry, using the R freeware package “extraTrees”, by investigating the model performance, tuning hyperparameters, and evaluating different methods for calculating uncertainties. We determine that deviating from the default hyperparameters used in the “extraTrees” package results in little difference in overall model performance (<0.2 kbar and <3 °C difference in mean SEE). However, accuracy is greatly affected by how the final pressure or temperature (PT) value is selected from the voting distribution of trees in the random forest (mean, median or mode). This choice has not previously been examined in machine learning thermobarometry. Using the mean value leads to a higher residual between experimental and predicted PT, whereas using median values produces smaller residuals. Additionally, this work provides two comprehensive R scripts for users to apply the random forest methodology to natural datasets. The first script permits modification and filtering of the model calibration dataset. The second script contains pre-made models into which users can rapidly input their data to recover pressure and temperature estimates. These scripts are open source and can be accessed at https://github.com/corinjorgenson/RandomForest-cpx-thermobarometer.
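The choice between mean, median and mode of the per-tree votes can be illustrated with a small sketch. It is written in Python, not taken from the authors' R scripts, and the vote values, the `aggregate` helper and the 10 °C bin width used for the mode are all hypothetical.

```python
from collections import Counter
from statistics import fmean, median

def aggregate(tree_votes, how="median", bin_width=10.0):
    """Collapse the per-tree predictions of a forest into one value."""
    if how == "mean":
        return fmean(tree_votes)
    if how == "median":
        return median(tree_votes)
    if how == "mode":
        # Continuous votes rarely repeat, so the mode is taken over bins
        # and reported as the centre of the most populated bin.
        bins = Counter(int(v // bin_width) for v in tree_votes)
        top_bin, _ = bins.most_common(1)[0]
        return (top_bin + 0.5) * bin_width
    raise ValueError(f"unknown aggregation: {how}")

# Hypothetical temperature votes (deg C) from nine trees; two trees
# disagree strongly with the rest and skew the distribution.
votes = [1180, 1185, 1190, 1190, 1195, 1200, 1205, 1350, 1400]

for how in ("mean", "median", "mode"):
    print(how, round(aggregate(votes, how), 1))
```

With these votes the mean is about 1232.8 °C, while the median and the binned mode are both 1195 °C. A skewed voting distribution pulls the mean toward the outlying trees, which is one plausible reason for the larger residuals reported for the mean.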

Random Forest Regressor-Based Approach for Detecting Fault Location and Duration in Power Systems

Sensors ◽

10.3390/s22020458 ◽

2022 ◽

Vol 22 (2) ◽

pp. 458

Author(s):

Zakaria El Mrabet ◽

Niroop Sugunaraj ◽

Prakash Ranganathan ◽

Shrirang Abhyankar

Keyword(s):

Neural Network ◽

Random Forest ◽

Power Systems ◽

Real Time ◽

Fault Location ◽

State Of The Art ◽

Economic Consequences ◽

Support Vector ◽

Detection Accuracy ◽

Data Driven Approach

Power system failures or outages due to short-circuits or “faults” can result in long service interruptions leading to significant socio-economic consequences. It is critical for electrical utilities to quickly ascertain fault characteristics, including location, type, and duration, to reduce the service time of an outage. Existing fault detection mechanisms (relays and digital fault recorders) are slow to communicate the fault characteristics upstream to the substations and control centers for action to be taken quickly. Fortunately, due to the availability of high-resolution phasor measurement units (PMUs), more event-driven solutions can be captured in real time. In this paper, we propose a data-driven approach for determining fault characteristics using samples of fault trajectories. A random forest regressor (RFR)-based model is used to detect real-time fault location and its duration simultaneously. This model is based on combining multiple uncorrelated trees with state-of-the-art boosting and aggregating techniques in order to obtain robust generalizations and greater accuracy without overfitting or underfitting. Four cases were studied to evaluate the performance of RFR: detecting fault location (case 1), predicting fault duration (case 2), handling missing data (case 3), and identifying fault location and length in a real-time streaming environment (case 4). A comparative analysis was conducted between the RFR algorithm and state-of-the-art models, including deep neural network, Hoeffding tree, neural network, support vector machine, decision tree, naive Bayes, and k-nearest neighbors. Experiments revealed that RFR consistently outperformed the other models in detection accuracy, prediction error, and processing time.

Classification and Regression Trees, Random Forest Algorithm

Machine Learning Approaches to Bioinformatics - Science, Engineering, and Biology Informatics ◽

10.1142/9789814287319_0009 ◽

2010 ◽

pp. 120-132

Keyword(s):

Random Forest ◽

Regression Trees ◽

Classification And Regression Trees ◽

Random Forest Algorithm ◽

Classification And Regression

Residuals in the modelling of pollution concentration depending on meteorological conditions and traffic flow, employing decision trees

ITM Web of Conferences ◽

10.1051/itmconf/20182300016 ◽

2018 ◽

Vol 23 ◽

pp. 00016 ◽

Cited By ~ 3

Author(s):

Joanna A. Kamińska

Keyword(s):

Random Forest ◽

Traffic Flow ◽

Goodness Of Fit ◽

Regression Trees ◽

Meteorological Conditions ◽

Boosted Regression Trees ◽

Significant Preference ◽

Explanatory Variables ◽

The City

Two data mining methods – a random forest and boosted regression trees – were used to model values of roadside air pollution depending on meteorological conditions and traffic flow, using the example of data obtained in the city of Wrocław in the years 2015–2016. Eight explanatory variables – five continuous and three categorical – were considered in the models. A comparison was made of the quality of the fit of the models to empirical data. Commonly used goodness-of-fit measures did not imply a significant preference for either of the methods. Residual analysis was also performed; this showed boosted regression trees to be a more effective method for predicting typical values in the modelling of NO2, NOx and PM2.5, while the random forest method leads to smaller errors when predicting peaks.

A Data-Driven Approach for Winter Precipitation Classification Using Weather Radar and NWP Data

Atmosphere ◽

10.3390/atmos11070701 ◽

2020 ◽

Vol 11 (7) ◽

pp. 701

Author(s):

Bong-Chul Seo

Keyword(s):

Random Forest ◽

Binary Classification ◽

Weather Prediction ◽

Model Development ◽

Winter Precipitation ◽

Ensemble Classification ◽

Supervised Machine Learning ◽

Data Driven ◽

Support Vector ◽

Data Driven Approach

This study describes a framework that provides qualitative weather information on winter precipitation types using a data-driven approach. The framework incorporates the data retrieved from weather radars and the numerical weather prediction (NWP) model to account for relevant precipitation microphysics. To enable multimodel-based ensemble classification, we selected six supervised machine learning models: k-nearest neighbors, logistic regression, support vector machine, decision tree, random forest, and multi-layer perceptron. Our model training and cross-validation results based on Monte Carlo Simulation (MCS) showed that all the models performed better than our baseline method, which applies two thresholds (surface temperature and atmospheric layer thickness) for binary classification (i.e., rain/snow). Among all six models, random forest presented the best classification results for the basic classes (rain, freezing rain, and snow) and the further refinement of the snow classes (light, moderate, and heavy). Our model evaluation, which uses an independent dataset not associated with model development and learning, led to classification performance consistent with that from the MCS analysis. Based on the visual inspection of the classification maps generated for an individual radar domain, we confirmed the improved classification capability of the developed models (e.g., random forest) compared to the baseline one in representing both spatial variability and continuity.

Improving Prediction Accuracy Based On Optimized Random Forest Model with Weighted Sampling for Regression Trees

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v21p105 ◽

2015 ◽

Vol 21 (1) ◽

pp. 23-28

Author(s):

S Bharathidason ◽

C. Jothi Venkataeswaran

Keyword(s):

Random Forest ◽

Prediction Accuracy ◽

Regression Trees ◽

Random Forest Model ◽

Forest Model

Repeated measurements of blood lactate concentration as a prognostic marker in horses with acute colitis evaluated with classification and regression trees (CART) and random forest analysis

The Veterinary Journal ◽

10.1016/j.tvjl.2016.03.012 ◽

2016 ◽

Vol 213 ◽

pp. 18-23 ◽

Cited By ~ 7

Author(s):

M.B. Petersen ◽

A. Tolver ◽

L. Husted ◽

T.H. Tølbøll ◽

T.H. Pihl

Keyword(s):

Random Forest ◽

Blood Lactate ◽

Prognostic Marker ◽

Lactate Concentration ◽

Blood Lactate Concentration ◽

Regression Trees ◽

Repeated Measurements ◽

Acute Colitis ◽

Random Forest Analysis ◽

Classification And Regression

Bus Travel Time Prediction: A Comparative Study of Linear and Non-Linear Machine Learning Models

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012053 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012053

Author(s):

B P Ashwini ◽

R Sumathi ◽

H S Sudhira

Keyword(s):

Random Forest ◽

Travel Time ◽

Linear Models ◽

Public Transit ◽

Regression Trees ◽

Support Vector ◽

Learning Models ◽

Random Forest Regression ◽

The Public ◽

Non Linear

Congested roads are a global problem, and increased usage of private vehicles is one of the main reasons for congestion. Public transit is a sustainable and eco-friendly alternative to private vehicle usage, but attracting commuters to public transit is a mammoth task. Commuters expect the public transit service to be reliable, and providing a reliable service requires fine-tuning the transit operations and giving commuters well-timed information. In this context, public transit travel time is predicted in Tumakuru, a tier-2 city of Karnataka, India. As this is one of the initial studies in the city, the performance of eight machine learning models is compared to identify a suitable model for travel time prediction: four linear models (Linear Regression, Ridge Regression, Least Absolute Shrinkage and Selection Operator Regression, and Support Vector Regression) and four non-linear models (k-Nearest Neighbors, Regression Trees, Random Forest Regression, and Gradient Boosting Regression Trees). The data logs of one month (November 2020) of the Tumakuru city service, provided by Tumakuru Smart City Limited, are used for the study. The time of day (trip start time), day of the week, and direction of travel are used for the prediction. Travel times for both upstream and downstream directions are predicted, and the results are evaluated based on the performance metrics. The results suggest that non-linear models outperform linear models in predicting travel times, and that Random Forest Regression performed better than the other models.

Seasonal forecasting of hydrological drought in the Limpopo basin: A comparison of statistical methods.

10.5194/hess-2016-4 ◽

2016 ◽

Cited By ~ 1

Author(s):

Mathias Seibert ◽

Bruno Merz ◽

Heiko Apel

Keyword(s):

Neural Networks ◽

Random Forest ◽

Early Warning ◽

Linear Models ◽

Regression Trees ◽

Forecast Skill ◽

Sea Surface ◽

Coefficient Of Determination ◽

Operating Characteristics ◽

Random Forest Regression

The Limpopo basin in southern Africa is prone to droughts, which affect the livelihoods of millions of people in South Africa, Botswana, Zimbabwe, and Mozambique. Seasonal drought early warning is thus vital for the whole region. In this study, the predictability of hydrological droughts during the main runoff period from December to May is assessed with statistical approaches. Three methods (Multiple Linear Models, Artificial Neural Networks, Random Forest Regression Trees) are compared in terms of their ability to forecast streamflow with up to 12 months lead time. The study yields four main findings. 1) There are stations in the basin at which standardised streamflow is predictable with lead times of up to 12 months. The results show large differences in forecast skill between stations but reach a coefficient of determination as high as 0.73 (cross validated). 2) A large range of potential predictors is considered in this study, comprising well established climate indices, customised teleconnection indices derived from sea surface temperatures, and antecedent streamflow as a proxy of catchment conditions. El Niño and customised indices, representing sea surface temperature in the Atlantic and Indian Ocean, prove to be important teleconnection predictors for the region. Antecedent streamflow is a strong predictor in small catchments (with a median of 42 % explained variance), whereas teleconnections exert a stronger influence in large catchments. 3) Multiple linear models show the best forecast skill in this study and the greatest robustness compared to artificial neural networks and Random Forest regression trees, despite the capability of the latter two to represent non-linear relationships. 4) Employed in early warning, the models can be used to forecast a specific drought level. Even if the coefficient of determination is low, the forecast models have a skill better than a climatological forecast, which is shown by analysis of receiver operating characteristics (ROC). Seasonal statistical forecasts in the Limpopo show promising results, and thus it is recommended to employ them as a complement to existing forecasts in order to strengthen preparedness for droughts.
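A forecast with a low coefficient of determination can still discriminate drought events, which a small sketch makes concrete. The streamflow values, the drought threshold of -0.8 and the helper names below are hypothetical and are not taken from the study. The area under the ROC curve is computed as the probability that a randomly chosen drought receives a higher drought score than a randomly chosen non-drought.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of events and
    non-events; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical standardised streamflow (assumption: not the study's data).
observed = [-1.4, -0.9, -0.2, 0.1, 0.4, 0.8, 1.2, -1.1, 0.3, 0.6]
forecast = [-0.3, -0.6, 0.2, -0.4, 0.5, 0.1, 0.4, -0.2, 0.6, -0.1]

drought = [o < -0.8 for o in observed]   # assumed drought threshold
score = [-f for f in forecast]           # lower forecast flow = higher score

r2 = r_squared(observed, forecast)
auc = roc_auc(score, drought)
print(f"R^2 = {r2:.2f}, ROC AUC = {auc:.2f}")
```

For these values the coefficient of determination is about 0.37, while the ROC area is about 0.90. An area of 0.5 corresponds to a forecast with no ability to separate droughts from non-droughts, which is the reference a climatological forecast provides in this kind of comparison.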
