Modelling daily water temperature from air temperature for the Missouri River

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4894 ◽  
Author(s):  
Senlin Zhu ◽  
Emmanuel Karlo Nyarko ◽  
Marijana Hadzima-Nyarko

The bio-chemical and physical characteristics of a river are directly affected by water temperature, which thereby affects the overall health of aquatic ecosystems. Accurately estimating water temperature is a complex problem. Modelling of river water temperature is usually based on a suitable mathematical model and field measurements of various atmospheric factors. In this article, the air–water temperature relationship of the Missouri River is investigated by developing three different machine learning models: an Artificial Neural Network (ANN), Gaussian Process Regression (GPR), and Bootstrap Aggregated Decision Trees (BA-DT). Standard models (linear regression, non-linear regression, and stochastic models) are also developed and compared to the machine learning models. Among the three standard models, the stochastic model clearly outperforms the linear and nonlinear models. All three machine learning models produce comparable results and outperform the stochastic model, with GPR performing slightly better for stations No. 2 and 3 and BA-DT slightly better for station No. 1. The machine learning models are effective tools for the prediction of daily river water temperature.
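The three model families named in the abstract can be sketched side by side on synthetic data; everything below (the data generator, hyperparameters, and scikit-learn as the toolkit) is an illustrative assumption, not the authors' actual setup:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
air = rng.uniform(-5, 35, size=(500, 1))               # daily mean air temp (degC)
water = 4 + 0.7 * air[:, 0] + rng.normal(0, 0.8, 500)  # synthetic water temp

X_tr, X_te, y_tr, y_te = train_test_split(air, water, random_state=0)

models = {
    "ANN": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                      random_state=0)),
    "GPR": GaussianProcessRegressor(alpha=0.1, normalize_y=True),
    "BA-DT": BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                              random_state=0),
}
rmse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    rmse[name] = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
```

On a near-linear synthetic relationship all three families land close together, mirroring the abstract's finding of comparable performance.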



2021 ◽  
Vol 25 (5) ◽  
pp. 2951-2977
Author(s):  
Moritz Feigl ◽  
Katharina Lebiedzinski ◽  
Mathew Herrnegger ◽  
Karsten Schulz

Abstract. Water temperature in rivers is a crucial environmental factor with the ability to alter hydro-ecological as well as socio-economic conditions within a catchment. The development of modelling concepts for predicting river water temperature is and will be essential for effective integrated water management and the development of adaptation strategies to future global changes (e.g. climate change). This study tests the performance of six different machine-learning models: step-wise linear regression, random forest, eXtreme Gradient Boosting (XGBoost), feed-forward neural networks (FNNs), and two types of recurrent neural networks (RNNs). All models are applied using different data inputs for daily water temperature prediction in 10 Austrian catchments ranging from 200 to 96,000 km² and exhibiting a wide range of physiographic characteristics. The evaluated input data sets include combinations of daily means of air temperature, runoff, precipitation and global radiation. Bayesian optimization is applied to optimize the hyperparameters of all applied machine-learning models. To make the results comparable to previous studies, two widely used benchmark models are applied additionally: linear regression and air2stream. With a mean root mean squared error (RMSE) of 0.55 °C, the tested models could significantly improve water temperature prediction compared to linear regression (1.55 °C) and air2stream (0.98 °C). In general, the results show a very similar performance of the tested machine-learning models, with a median RMSE difference of 0.08 °C between the models. From the six tested machine-learning models both FNNs and XGBoost performed best in 4 of the 10 catchments. RNNs are the best-performing models in the largest catchment, indicating that RNNs mainly perform well when processes with long-term dependencies are important. 
Furthermore, a wide range of performance was observed for different hyperparameter sets for the tested models, showing the importance of hyperparameter optimization. Especially the FNN model results showed an extremely large RMSE standard deviation of 1.60 °C due to the chosen hyperparameters. This study evaluates different sets of input variables, machine-learning models and training characteristics for daily stream water temperature prediction, acting as a basis for future development of regional multi-catchment water temperature prediction models. All preprocessing steps and models are implemented in the open-source R package wateRtemp to provide easy access to these modelling approaches and facilitate further research.
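As a hedged illustration of the hyperparameter-tuning step (not the study's wateRtemp implementation, which is in R, and using random search as a simpler stand-in for Bayesian optimization), one of the tested model families can be tuned on synthetic inputs like so:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(1)
n = 600
X = np.column_stack([
    rng.uniform(-5, 30, n),    # daily mean air temperature
    rng.uniform(1, 100, n),    # runoff
    rng.uniform(0, 20, n),     # precipitation
    rng.uniform(0, 300, n),    # global radiation
])
y = 3 + 0.6 * X[:, 0] + 0.005 * X[:, 3] + rng.normal(0, 0.5, n)  # water temp.

# Random search over a small hyperparameter space; a Bayesian optimizer would
# replace this sampler but use the same cross-validated RMSE objective.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": [100, 300], "max_depth": [3, 6, None]},
    n_iter=4, cv=3, scoring="neg_root_mean_squared_error", random_state=0,
)
search.fit(X, y)
best_rmse = -search.best_score_  # cross-validated RMSE of the best setting
```

Comparing `search.cv_results_` across settings shows the spread of RMSE over hyperparameter sets that the study emphasizes.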


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Prasanna Date ◽  
Davis Arthur ◽  
Lauren Pusey-Nazzaro

Abstract. Training machine learning models on classical computers is usually a time- and compute-intensive process. With Moore's law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post-Moore's-law era. To be solved on adiabatic quantum computers, problems must first be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models as QUBO problems: linear regression, support vector machine (SVM), and balanced k-means clustering. This makes them amenable to training on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to those of corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better (in the case of SVM and balanced k-means clustering) than or equivalent (in the case of linear regression) to those of their classical counterparts.
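The linear-regression case of this idea can be sketched in a few lines: encode each weight in fixed-point binary so that minimizing ||Xw - y||² becomes a QUBO over the bit vector, then brute-force the tiny QUBO in place of an adiabatic quantum computer. The 2-bit encoding and toy data below are illustrative assumptions, not the paper's formulation verbatim:

```python
import itertools
import numpy as np

# Toy least-squares problem: X w = y is solved exactly by w = (2, 3).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 3.0, 5.0])

# Precision matrix P: weight_j = 1 * bit_{j,0} + 2 * bit_{j,1}, bits in {0, 1}.
# (2 bits per weight is an illustrative choice, not the paper's encoding.)
P = np.kron(np.eye(2), np.array([[1.0, 2.0]]))  # maps 4 bits -> 2 weights

# Dropping the constant ||y||^2, minimizing ||X P b - y||^2 over bits b is the
# QUBO energy E(b) = b^T A b + c^T b with:
A = P.T @ X.T @ X @ P
c = -2.0 * P.T @ X.T @ y

# Brute-force the 16 bit patterns in place of a quantum annealer.
best_bits, best_energy = None, np.inf
for bits in itertools.product([0, 1], repeat=4):
    b = np.array(bits, dtype=float)
    energy = b @ A @ b + c @ b
    if energy < best_energy:
        best_bits, best_energy = b, energy

w = P @ best_bits  # decode the optimal bits back into regression weights
```

Because the exact solution is representable in the encoding, the QUBO minimum recovers w = (2, 3) with energy -||y||² = -38.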


Author(s):  
Mert Gülçür ◽  
Ben Whiteside

Abstract. This paper discusses micromanufacturing process quality proxies called "process fingerprints" in micro-injection moulding for establishing in-line quality assurance and machine learning models for Industry 4.0 applications. The process fingerprints presented in this study are purely physical proxies of product quality, and their selection requires a tangible rationale based on criteria such as sensitivity, cost-effectiveness, and robustness. The proposed methods and selection reasons for the process fingerprints are justified by analysing the temporally collected data with respect to microreplication efficiency. The extracted process fingerprints were also used in a multiple linear regression scenario, where they provide actionable insights for creating traceable and cost-effective supervised machine learning models in challenging micro-injection moulding environments. The multiple linear regression model demonstrated 84% accuracy in predicting process quality, which is significant given the extreme process conditions and product features involved.


Water ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 2516 ◽  
Author(s):  
Changhyun Choi ◽  
Jeonghwan Kim ◽  
Jungwook Kim ◽  
Hung Soo Kim

Adequate forecasting and preparation for heavy rain can minimize damage to life and property. Some studies have been conducted on the heavy rain damage prediction model (HDPM); however, most are limited to linear regression models that capture only the linear relation between rainfall data and damage. This study develops a combined heavy rain damage prediction model (CHDPM) in which a residual prediction model (RPM) is added to the HDPM. The predictive performance of the CHDPM is 4–14% higher than that of the HDPM. This confirms that predictive performance is improved by adding an RPM based on machine learning models to complement the linearity of the HDPM. The results of this study can serve as basic data for natural disaster management.
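The combined-model construction (HDPM plus a residual prediction model) can be sketched as follows; the rainfall-damage data and the random forest chosen as the RPM are hypothetical stand-ins, not the study's dataset or model:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
rain = rng.uniform(0, 200, (400, 1))   # rainfall (mm), hypothetical
damage = 0.5 * rain[:, 0] + 0.002 * rain[:, 0] ** 2 + rng.normal(0, 3, 400)

X_tr, X_te = rain[:300], rain[300:]
y_tr, y_te = damage[:300], damage[300:]

hdpm = LinearRegression().fit(X_tr, y_tr)        # linear damage model (HDPM)
residuals = y_tr - hdpm.predict(X_tr)            # what the linear model misses
rpm = RandomForestRegressor(random_state=0).fit(X_tr, residuals)  # RPM

pred_linear = hdpm.predict(X_te)
pred_combined = pred_linear + rpm.predict(X_te)  # CHDPM = HDPM + RPM

def rmse(pred):
    return float(np.sqrt(np.mean((y_te - pred) ** 2)))
```

On data with a nonlinear component, the residual corrector recovers the structure the linear HDPM leaves behind, which is the mechanism behind the reported 4–14% improvement.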


Author(s):  
Harinarayan Sharma ◽  
Sonam Kumari ◽  
Aniket K. Dutt ◽  
Pawan Kumar ◽  
Mamookho E. Makhatha

Aim: To develop machine learning models for the performance of a refrigeration and air-conditioning system. Background: The Coefficient of Performance (COP) of Refrigeration and Air-Conditioning (RAC) systems is a complex function of evaporative temperature and the concentration of nano-particles in the lubricant. In recent years, researchers have focused on experimental studies for the improvement of COP. Further, a few researchers have applied simulation techniques such as fuzzy systems, Artificial Neural Networks (ANN) and simulated annealing to the Vapour Compression Refrigeration (VCR) cycle. Modelling research on the performance of RAC systems remains scarce. Objective: The study aims to develop machine learning predictive models for the performance of a refrigeration and air-conditioning system using experimental data. Methods: Experiments were performed on a VCR system to determine the COP. Three lubricant concentrations were used (0.5, 1.0 and 1.5 g of nano-TiO2 particles added to 1 litre of Polyolester (POE) oil). The experimentally determined COP was used to train and test the machine learning models. Gaussian Process Regression (GPR) and Support Vector Regression (SVR) methods were applied to develop the models. Results: The experimental results reveal that the COP increases with increasing nano-particle concentration at a given temperature. The addition of 0.5 and 1.0 g of TiO2 to the POE oil shows a better rate of increase in the COP than the addition of 1.5 g of TiO2. Among the machine learning models, GPR and SVR with an RBF kernel function are the most appropriate for capturing the nonlinear relationship between the output parameter (COP) and the input parameters (evaporative temperature and TiO2 concentration). Conclusion: The present study investigated machine learning approaches for the performance of an RAC system using experimental data sets. 
The experimental results show that R134a with the TiO2-POE nanolubricant works efficiently, and that the coefficient of performance of the VCR system increases with nano-particle concentration. The performance of the developed models is compared using the coefficient of correlation and RMSE values. From this comparison, it is concluded that the RBF-based GPR model is the best-fit machine learning model for predicting the COP on this data set.
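A minimal sketch of an RBF-kernel GPR for COP as a function of evaporative temperature and TiO2 concentration, with invented training points rather than the paper's measurements:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(3)
temp = rng.uniform(-10, 10, 80)           # evaporative temperature (degC)
conc = rng.choice([0.5, 1.0, 1.5], 80)    # TiO2 in POE oil, the study's levels
X = np.column_stack([temp, conc])
# Invented COP surface: rises with both temperature and concentration.
cop = 3.0 + 0.05 * temp + 0.4 * conc + rng.normal(0, 0.05, 80)

gpr = GaussianProcessRegressor(
    kernel=ConstantKernel() * RBF(length_scale=[5.0, 1.0]),  # anisotropic RBF
    alpha=0.05 ** 2, normalize_y=True,
).fit(X, cop)

pred = gpr.predict(np.array([[0.0, 1.0]]))  # COP at 0 degC, 1.0 g TiO2
```

The anisotropic length scales let the kernel treat temperature (a wide range) and concentration (three discrete levels) on their own scales.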


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Thérence Nibareke ◽  
Jalal Laassiri

Abstract Introduction Nowadays, large data volumes are generated daily at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, are increasing, so the tools and models have to be optimized. In this paper we applied and compared machine learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes. Furthermore, we performed analytics on flight delays. The main contribution of this paper is to give an overview of Big Data tools and machine learning models. We highlight metrics that allow us to choose the more accurate model. We predict diabetes using three machine learning models and compare their performance. We also analyzed flight delays and produced a dashboard that can help managers of flight companies get a 360° view of their flights and take strategic decisions. Case description We applied three machine learning algorithms for predicting diabetes and compared their performance to see which model gives the best results. We performed analytics on flight datasets to support decision making and predict flight delays. Discussion and evaluation The experiment shows that Linear Regression, Naïve Bayes and Decision Tree give the same accuracy (0.766), but Decision Tree outperforms the two other models with the greatest score (1) and the smallest error (0). For the flight delay analytics, the model could show, for example, the airport that recorded the most flight delays. Conclusions Several tools and machine learning models for big data analytics have been discussed in this paper. We concluded that, for the same dataset, the model used for prediction must be chosen carefully. In future work, we will test different models in other fields (climate, banking, insurance).
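The model-comparison step can be sketched on a synthetic binary classification task; logistic regression stands in for the paper's linear model here, and none of the numbers below reproduce the reported 0.766 accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a diabetes dataset: 8 features, binary outcome.
X, y = make_classification(n_samples=800, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for name, clf in [("linear", LogisticRegression(max_iter=1000)),
                  ("naive_bayes", GaussianNB()),
                  ("decision_tree", DecisionTreeClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, clf.predict(X_te))

best = max(scores, key=scores.get)  # model with the highest test accuracy
```

Held-out accuracy is the comparison metric the abstract uses; additional metrics (score, error) can be added to the same loop.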


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7065 ◽  
Author(s):  
Senlin Zhu ◽  
Emmanuel Karlo Nyarko ◽  
Marijana Hadzima-Nyarko ◽  
Salim Heddam ◽  
Shiqiang Wu

In this study, different versions of feedforward neural network (FFNN), Gaussian process regression (GPR), and decision tree (DT) models were developed to estimate daily river water temperature using air temperature (Ta), flow discharge (Q), and the day of year (DOY) as predictors. The proposed models were assessed using observed data from eight river stations, and modelling results were compared with the air2stream model. Model performances were evaluated using four indicators in this study: the coefficient of correlation (R), the Willmott index of agreement (d), the root mean squared error (RMSE), and the mean absolute error (MAE). Results indicated that the three machine learning models had similar performance when only Ta was used as the predictor. When the day of year was included as model input, the performances of the three machine learning models dramatically improved. Including flow discharge instead of the day of year as an additional predictor provided a smaller gain in model accuracy, showing the relatively minor role of flow discharge in river water temperature prediction. However, an increase in the relative importance of flow discharge was noticed for stations with high-altitude catchments (Rhône, Dischmabach and Cedar) which are influenced by cold water releases from hydropower or snow melting, suggesting that the role of flow discharge depends on the hydrological characteristics of such rivers. The air2stream model outperformed the three machine learning models for most of the studied rivers, except for the cases where including flow discharge as a predictor provided the highest benefits. The DT model outperformed the FFNN and GPR models in the calibration phase; however, in the validation phase, its performance decreased slightly. In general, the FFNN model performed slightly better than the GPR model. In summary, the overall modelling results showed that the three machine learning models performed well for river water temperature modelling.
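The predictor-set comparison (air temperature alone vs. air temperature plus day of year) can be illustrated with a decision tree on synthetic seasonal data; the data generator below is an assumption, not the study's river stations:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
doy = rng.integers(1, 366, 800)                     # day of year (DOY)
season = np.sin(2 * np.pi * (doy - 100) / 365)      # seasonal cycle
ta = 12 + 10 * season + rng.normal(0, 3, 800)       # air temperature (degC)
tw = 8 + 6 * season + 0.3 * ta + rng.normal(0, 0.5, 800)  # water temperature

tr, te = slice(0, 600), slice(600, 800)

def rmse(features):
    """Fit a tree on the training slice and score it on the held-out slice."""
    X = np.column_stack(features)
    model = DecisionTreeRegressor(max_depth=8, random_state=0).fit(X[tr], tw[tr])
    return float(np.sqrt(np.mean((tw[te] - model.predict(X[te])) ** 2)))

rmse_ta = rmse([ta])            # Ta only
rmse_ta_doy = rmse([ta, doy])   # Ta + DOY
```

When water temperature carries a seasonal signal that air temperature tracks only noisily, adding DOY lowers the error, which is the effect the abstract reports.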


2021 ◽  
Author(s):  
Itai Guez ◽  
Gili Focht ◽  
Mary-Louise C. Greer ◽  
Ruth Cytter-Kuint ◽  
Li-tal Pratt ◽  
...  

Background and Aims: Endoscopic healing (EH) is a major treatment goal for Crohn's disease (CD). However, terminal ileum (TI) intubation failure is common, especially in children. We evaluated the added value of machine-learning models in imputing a TI Simple Endoscopic Score for CD (SES-CD) from magnetic resonance enterography (MRE) data of pediatric CD patients. Methods: This is a sub-study of the prospective ImageKids study. We developed machine-learning and baseline linear-regression models to predict the TI SES-CD score from the Magnetic Resonance Index of Activity (MaRIA) and the Pediatric Inflammatory Crohn's MRE Index (PICMI) variables. We assessed the accuracy of the TI SES-CD predictions for intubated patients with a stratified 2-fold validation experimental setup, repeated 50 times. We determined clinical impact by imputing the TI SES-CD in patients with ileal intubation failure during ileocolonoscopy. Results: A total of 223 children were included (mean age 14.1 ± 2.5 years), of whom 132 had all relevant variables (107 with TI intubation and 25 with TI intubation failure). The combination of a machine-learning model with the PICMI variables achieved the lowest SES-CD prediction error compared to a baseline MaRIA-based linear-regression model for the intubated patients (N=107, 11.7 (10.5–12.5) vs. 12.1 (11.4–12.9), p<0.05). The PICMI-based models suggested a higher rate of patients with TI disease among the non-intubated patients compared to a baseline MaRIA-based linear-regression model (N=25, up to 25/25 (100%) vs. 23/25 (92%)). Conclusions: Machine-learning models with clinically relevant variables as input are more accurate than linear-regression models in predicting TI SES-CD and EH when using the same MRE-based variables.
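The evaluation protocol described in the Methods (stratified 2-fold splitting repeated 50 times) can be sketched with scikit-learn's cross-validation utilities on placeholder data:

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

y = np.array([0, 1] * 20)         # placeholder labels (e.g. TI disease yes/no)
X = np.arange(40).reshape(-1, 1)  # placeholder MRE-derived feature

cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=50, random_state=0)
n_splits = sum(1 for _ in cv.split(X, y))  # 2 folds x 50 repeats = 100 splits
```

Each of the 100 train/test splits preserves the label proportions, so prediction-error estimates can be aggregated across repeats as in the study.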


Author(s):  
Anudeep P P ◽  
Suchitra Kumari ◽  
Aishvarya S Rajasimman ◽  
Saurav Nayak ◽  
Pooja Priyadarsini

Background LDL-C is a strong risk factor for cardiovascular disorders. The formulas used to calculate LDL-C have shown varying performance in different populations. Machine learning models can capture complex interactions between variables and can be used to predict outcomes more accurately. The current study evaluated the predictive performance of three machine learning models (random forests, XGBoost, and support vector regression (SVR)) in predicting LDL-C from total cholesterol, triglyceride, and HDL-C, in comparison with a linear regression model and some existing formulas for LDL-C calculation, in an eastern Indian population. Methods Of the lipid profiles performed in the clinical biochemistry laboratory of AIIMS Bhubaneswar during 2019–2021, a total of 13,391 samples were included in the study. Laboratory results were collected from the laboratory database. 70% of the data were used as the training set to develop the three machine learning models and the linear regression formula. These models were then validated on the remaining 30% of the data (test set). The performance of the models was evaluated in comparison with the six best existing LDL-C calculating formulas. Results LDL-C predicted by the XGBoost and random forests models showed a strong correlation with directly estimated LDL-C (r = 0.98). The two machine learning models performed superior to the six existing and commonly used LDL-C calculating formulas, such as the Friedewald formula, in the study population. When compared across different triglyceride strata as well, these two models outperformed the other methods. Conclusion Machine learning models like XGBoost and random forests can predict LDL-C more accurately than conventional linear regression LDL-C formulas.
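For context, the Friedewald formula computes LDL-C (in mg/dL) as total cholesterol minus HDL-C minus triglycerides/5. A minimal sketch comparing it with a fitted linear model on synthetic lipid profiles (not the AIIMS Bhubaneswar data) might look like:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def friedewald(tc, hdl, tg):
    """Friedewald estimate of LDL-C in mg/dL."""
    return tc - hdl - tg / 5.0

rng = np.random.default_rng(5)
tc = rng.uniform(120, 300, 500)    # total cholesterol
hdl = rng.uniform(30, 80, 500)     # HDL-C
tg = rng.uniform(50, 400, 500)     # triglycerides
# Assumed "direct assay" values; the TG/4.5 term is an arbitrary deviation
# from Friedewald's TG/5 so the fitted model and the formula can differ.
ldl_direct = tc - hdl - tg / 4.5 + rng.normal(0, 5, 500)

X = np.column_stack([tc, hdl, tg])
model = LinearRegression().fit(X[:350], ldl_direct[:350])  # ~70% train split

def rmse(pred):
    return float(np.sqrt(np.mean((ldl_direct[350:] - pred) ** 2)))

rmse_model = rmse(model.predict(X[350:]))
rmse_friedewald = rmse(friedewald(tc[350:], hdl[350:], tg[350:]))
```

A model fitted to the local assay outperforms a fixed-coefficient formula whenever the population deviates from the formula's assumptions, which is the motivation the abstract gives for population-specific models.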

