Performance Evaluation of Online Machine Learning Models Based on Cyclic Dynamic and Feature-Adaptive Time Series

Ahmed Salih AL-KHALEEFA; Rosilah HASSAN; Mohd Riduan AHMAD; Faizan QAMAR; Zheng WEN; Azana Hafizah MOHD AMAN; Keping YU

doi:10.1587/transinf.2020bdp0002

Selection of Input Factors and Comparison of Machine Learning Models for Prediction of Dissolved Oxygen in Gyeongan Stream

Journal of Korean Society of Environmental Engineers ◽

10.4491/ksee.2021.43.3.206 ◽

2021 ◽

Vol 43 (3) ◽

pp. 206-217

Author(s):

Min Ji Kim ◽

Seon Jeong Byeon ◽

Kyung Min Kim ◽

Johng-Hwa Ahn

Keyword(s):

Neural Network ◽

Machine Learning ◽

Electrical Conductivity ◽

Time Series ◽

Performance Evaluation ◽

Suspended Solids ◽

Time Series Data ◽

Series Data ◽

Learning Models ◽

Machine Learning Models

Objectives: In this study, we select input factors for machine learning models that predict dissolved oxygen (DO) in Gyeongan Stream and compare performance evaluation indicators to find the optimal model. Methods: Water quality data from specific points of Gyeongan Stream were collected between January 15, 1998 and December 30, 2019. The preprocessed data were divided into training and test sets at a ratio of 7:3. We used four machine learning models: random forest (RF), artificial neural network (ANN), convolutional neural network (CNN), and gated recurrent unit (GRU). RF and ANN were tested with both a random split and time series data, while CNN and GRU were tested with time series data only. The square of the correlation coefficient (R²), root mean square error (RMSE), and mean absolute error (MAE) were used as performance evaluation indicators to compare the models. Results and Discussion: Based on the RF variable importance results and references, water temperature, pH, electrical conductivity, PO4-P, NH4-N, total phosphorus, suspended solids, and NO3-N were used as input factors. Both RF and ANN performed better with time series data than with a random split. The models ranked by performance in the order RF > CNN > GRU > ANN. Conclusions: Eight input factors (water temperature, pH, electrical conductivity, PO4-P, NH4-N, total phosphorus, suspended solids, and NO3-N) were selected for machine learning models to predict DO in Gyeongan Stream. The best model for DO prediction was the RF model with time series data. Therefore, we suggest that RF with the eight input factors could be used to predict DO in streams.
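The evaluation described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (chronological_split, rmse, mae, r_squared) and the sample values are hypothetical, and it assumes that "time series data" means a chronological 7:3 split in which the test period follows the training period. R² is computed as the square of the Pearson correlation coefficient, as the abstract defines it.

```python
import math

def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows so that every test row comes after every training row."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Square of the Pearson correlation between observed and predicted values.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)

# Hypothetical observed and predicted DO values (mg/L), for illustration only.
observed = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 9.5, 9.5, 10.5]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

A random split would instead shuffle the rows before cutting, which lets the model see samples from the same period it is later tested on.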

A comparison of time series and machine learning models for inflation forecasting: empirical evidence from the USA

Neural Computing and Applications ◽

10.1007/s00521-016-2766-x ◽

2016 ◽

Vol 30 (5) ◽

pp. 1519-1527 ◽

Cited By ~ 8

Author(s):

Volkan Ülke ◽

Afsin Sahin ◽

Abdulhamit Subasi

Keyword(s):

Machine Learning ◽

Time Series ◽

Empirical Evidence ◽

Learning Models ◽

Inflation Forecasting ◽

The USA ◽

Machine Learning Models

Discriminating Postural Control Behaviors from Posturography with Statistical Tests and Machine Learning Models: Does Time Series Length Matter?

Lecture Notes in Computer Science - Computational Science – ICCS 2018 ◽

10.1007/978-3-319-93713-7_28 ◽

2018 ◽

pp. 350-357

Author(s):

Luiz H. F. Giovanini ◽

Elisangela F. Manffra ◽

Julio C. Nievola

Keyword(s):

Machine Learning ◽

Time Series ◽

Postural Control ◽

Statistical Tests ◽

Learning Models ◽

Series Length ◽

Machine Learning Models

Intra-domain and cross-domain transfer learning for time series

10.5194/egusphere-egu21-12142 ◽

2021 ◽

Author(s):

Erik Otović ◽

Marko Njirjak ◽

Dario Jozinović ◽

Goran Mauša ◽

Alberto Michelini ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Transfer Learning ◽

Time Series Data ◽

The Other ◽

Series Data ◽

Sound Recognition ◽

Transfer Of Knowledge ◽

Learning Models ◽

Machine Learning Models

In this study, we compared the performance on time series data of machine learning models trained with transfer learning and models trained from scratch. Four machine learning models were used in the experiment: two taken from the field of seismology and two general-purpose models for time series data. The accuracy of the selected models was systematically observed and analyzed when switching within the same application domain (seismology) and between different application domains (seismology, speech, medicine, finance). In seismology, we used two databases of local earthquakes (one in counts, and the other with the instrument response removed) and a database of global earthquakes for predicting earthquake magnitude; the other datasets targeted classifying spoken words (speech), predicting stock prices (finance), and classifying muscle movement from EMG signals (medicine). In practice, it is very demanding and sometimes impossible to collect labeled datasets large enough to train a machine learning model successfully. Therefore, in our experiment, we used reduced datasets of 1,500 and 9,000 data instances to mimic such conditions. Using the same scaled-down datasets, we trained two sets of machine learning models: those trained with transfer learning and those trained from scratch. We compared the performance of pairs of models to draw conclusions about the utility of transfer learning. To confirm the validity of the results, we repeated the experiments several times and applied statistical tests of significance. The study shows when, within the set experimental framework, the transfer of knowledge improved model accuracy and model convergence rate.
Our results show that it is possible to achieve better performance and faster convergence by transferring knowledge from the domain of global earthquakes to the domain of local earthquakes, and sometimes also vice versa. However, improvements in seismology can sometimes also be achieved by transferring knowledge from the medical and audio domains. The results show that the transfer of knowledge between other domains brought even more significant improvements than those within the field of seismology. For example, models in the field of sound recognition achieved much better performance than classical models, and the domain of sound recognition is very compatible with knowledge from other domains. We came to similar conclusions for the domains of medicine and finance. Ultimately, the paper offers suggestions on when transfer learning is useful, and the explanations offered can provide a good starting point for knowledge transfer using time series data.

An intelligent hybridization of ARIMA with machine learning models for time series forecasting

Knowledge-Based Systems ◽

10.1016/j.knosys.2019.03.011 ◽

2019 ◽

Vol 175 ◽

pp. 72-86 ◽

Cited By ~ 23

Author(s):

Domingos S. de O. Santos Júnior ◽

João F.L. de Oliveira ◽

Paulo S.G. de Mattos Neto

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Forecasting ◽

Learning Models ◽

Machine Learning Models

Multi-step Time Series Forecasting of Electric Load Using Machine Learning Models

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-91253-0_15 ◽

2018 ◽

pp. 148-159 ◽

Cited By ~ 7

Author(s):

Shamsul Masum ◽

Ying Liu ◽

John Chiverton

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Forecasting ◽

Electric Load ◽

Learning Models ◽

Machine Learning Models

Predicting Benzene Concentration Using Machine Learning and Time Series Algorithms

Mathematics ◽

10.3390/math8122205 ◽

2020 ◽

Vol 8 (12) ◽

pp. 2205

Author(s):

Luis Alfonso Menéndez García ◽

Fernando Sánchez Lasheras ◽

Paulino José García Nieto ◽

Laura Álvarez de Prado ◽

Antonio Bernardo Sánchez

Keyword(s):

Machine Learning ◽

Time Series ◽

Moving Average ◽

Environmental Pollutants ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Learning Models ◽

Vector Autoregressive ◽

Benzene Concentration ◽

Machine Learning Models

Benzene is a pollutant that is very harmful to human health, so models are needed to predict its concentration and its relationship with other air pollutants. Data collected by eight stations in Madrid (Spain) over nine years were analyzed using the following models: multivariate linear regression (MLR), multivariate adaptive regression splines (MARS), multilayer perceptron neural network (MLP), support vector machines (SVM), autoregressive integrated moving-average (ARIMA), and vector autoregressive moving-average (VARMA) models. Benzene concentration was predicted from the concentrations of four environmental pollutants: nitrogen dioxide (NO2), nitrogen oxides (NOx), particulate matter (PM10), and toluene (C7H8), and the performance measures of the proposed models were examined. In general, regression-based machine learning models are more effective at predicting than time series models.

An Empirical Comparison of Machine Learning Models for Time Series Forecasting

Econometric Reviews ◽

10.1080/07474938.2010.481556 ◽

2010 ◽

Vol 29 (5-6) ◽

pp. 594-621 ◽

Cited By ~ 194

Author(s):

Nesreen K. Ahmed ◽

Amir F. Atiya ◽

Neamat El Gayar ◽

Hisham El-Shishiny

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Forecasting ◽

Learning Models ◽

Empirical Comparison ◽

Machine Learning Models

Forecasting admissions in psychiatric hospitals before and during Covid-19

10.1101/2021.07.16.21260200 ◽

2021 ◽

Author(s):

Jan Wolff ◽

Ansgar Klimke ◽

Michael Marschollek ◽

Tim Kacprowski

Keyword(s):

Machine Learning ◽

Time Series ◽

Hospital Admissions ◽

Model Performance ◽

Psychiatric Hospitals ◽

Time Series Models ◽

Learning Models ◽

One Step ◽

Machine Learning Models ◽

Better Than

Introduction: The COVID-19 pandemic has strong effects on most health care systems and individual service providers. Forecasting admissions can help with the efficient organisation of hospital care. We aimed to forecast the number of admissions to psychiatric hospitals before and during the COVID-19 pandemic, and we compared the performance of machine learning models and time series models. This would eventually support timely resource allocation for optimal treatment of patients. Methods: We used admission data from 9 psychiatric hospitals in Germany between 2017 and 2020. We compared machine learning models with time series models in weekly, monthly, and yearly forecasting before and during the COVID-19 pandemic. Our models were trained and validated with data from the first two years and tested in prospectively sliding time windows in the last two years. Results: A total of 90,686 admissions were analysed. The models explained up to 90% of the variance in hospital admissions in 2019 and 75% in 2020, under the effects of the COVID-19 pandemic. The best models substantially outperformed a one-step seasonal naive forecast (seasonal mean absolute scaled error (sMASE) 2019: 0.59, 2020: 0.76). The best model in 2019 was a machine learning model (elastic net, mean absolute error (MAE): 7.25). The best model in 2020 was a time series model (exponential smoothing state space model with Box-Cox transformation, ARMA errors, and trend and seasonal components, MAE: 10.44), which adjusted more quickly to the shock effects of the COVID-19 pandemic. Models forecasting admissions one week in advance did not perform better than monthly and yearly models in 2019, but they did in 2020. The most important features for the machine learning models were calendrical variables. Conclusion: Model performance did not vary much between modelling approaches before the COVID-19 pandemic, and established forecasts were substantially better than one-step seasonal naive forecasts. However, weekly time series models adjusted more quickly to the COVID-19-related shock effects. In practice, different forecast horizons could be used simultaneously to allow both early planning and quick adjustments to external effects.
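The seasonal naive baseline and the scaled error reported above can be sketched in Python. This is an illustration under assumptions, not the authors' code: the helper names (seasonal_naive, seasonal_mase) and the toy numbers are hypothetical, and the scaling follows the common definition of the mean absolute scaled error with a seasonal naive in-sample benchmark, which may differ in detail from the sMASE used in the study.

```python
def seasonal_naive(history, horizon, season_length):
    """Forecast each future step with the value observed one season earlier."""
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

def seasonal_mase(train, actual, forecast, season_length):
    """Forecast MAE divided by the in-sample MAE of the seasonal naive method."""
    naive_errors = [
        abs(train[i] - train[i - season_length])
        for i in range(season_length, len(train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / scale

# Hypothetical admission counts with a season of length 4, for illustration only.
train = [10, 20, 30, 40, 12, 22, 32, 42]
actual = [13, 23, 33, 43]
baseline = seasonal_naive(train, horizon=4, season_length=4)
print(baseline, seasonal_mase(train, actual, baseline, season_length=4))
```

A value below 1 means the forecast has a smaller mean absolute error than the seasonal naive method had on the training data, which is how figures such as 0.59 and 0.76 are usually read.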

Time series analysis and forecasting of China’s energy production during Covid-19: statistical models vs machine learning models

10.21203/rs.3.rs-1074872/v3 ◽

2021 ◽

Author(s):

Zekai Lu ◽

Nian Liu ◽

Ying Xie ◽

Junhui Xu

Keyword(s):

Machine Learning ◽

Time Series ◽

Statistical Models ◽

Energy Production ◽

Production Data ◽

Learning Models ◽

Energy Research ◽

Public Health Emergencies ◽

Production Plans ◽

Machine Learning Models

COVID-19 is a huge catastrophe of global proportions, and it has had far-reaching effects on energy production worldwide. In this paper, we build traditional statistical models and machine learning models to forecast energy production series in the post-pandemic period, based on Chinese energy production data and Chinese COVID-19 epidemic data from 2018 to 2021. The experimental results showed that the optimal models in this study outperformed the baseline models on each series, with MAPE values less than 10. Further studies found that the LightGBM, NNAT and LSTM machine learning models worked better on unstable energy series, while the ARIMA statistical model still had an advantage on stable energy time series. Overall, the machine learning models outperformed the traditional models during COVID-19 in terms of prediction. Our findings provide an important reference for energy research in public health emergencies, as well as a theoretical basis for factories to adjust their production plans and governments to adjust their energy decisions during COVID-19.
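The MAPE criterion used to compare models against the baselines can be sketched as follows. This is a minimal Python illustration, assuming MAPE is expressed in percent; the function name mape and the sample values are hypothetical and do not come from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical monthly production values and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(mape(actual, forecast))
```

Because each error is divided by the actual value, MAPE allows series of different magnitudes to be compared on one scale, which suits a study that evaluates several energy production series.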