Comparative analysis of machine learning approaches to analyze and predict the COVID-19 outbreak

2021 ◽  
Vol 7 ◽  
pp. e746
Author(s):  
Muhammad Naeem ◽  
Jian Yu ◽  
Muhammad Aamir ◽  
Sajjad Ahmad Khan ◽  
Olayinka Adeleye ◽  
...  

Background Forecasting the timing of a forthcoming pandemic helps reduce the impact of disease by enabling precautionary steps such as public health messaging and raising awareness among doctors. With the continuous and rapid increase in the cumulative incidence of COVID-19, the research community is using statistical and outbreak prediction models, including various machine learning (ML) models, to track and predict the trend of the epidemic and to develop appropriate strategies to combat and manage its spread. Methods In this paper, we present a comparative analysis of several ML approaches, including Support Vector Machine, Random Forest, K-Nearest Neighbor and Artificial Neural Network, for predicting the COVID-19 outbreak in the epidemiological domain. We first apply the autoregressive distributed lag (ARDL) method to identify and model the short- and long-run relationships in the COVID-19 time-series datasets. That is, we determine the lags between a response variable and its explanatory time-series variables, which serve as independent variables. The significant variables, together with their lags, are then used in the regression model selected by the ARDL for predicting and forecasting the trend of the epidemic. Results Four statistical measures are used to assess model accuracy: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE). The MAPE values of the best-selected models for confirmed, recovered and death cases are 0.003, 0.006 and 0.115, respectively, which fall under the category of highly accurate forecasts. In addition, we computed 15-day-ahead forecasts for daily deaths, recovered and confirmed patients; the cases fluctuated over time in all three series. The results also reveal the advantages of ML algorithms for supporting decision-making on evolving short-term policies.
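The four accuracy measures named in the Results can be written down compactly. The following is a minimal sketch in plain Python, not the authors' code; the function names are illustrative, MAPE and SMAPE are returned as fractions rather than percentages, and the SMAPE variant shown (denominator |a| + |p|, numerator doubled) is only one of several definitions in use.

```python
import math

def rmse(actual, predicted):
    # Root Mean Square Error: square root of the mean squared difference.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    # Mean Absolute Error: mean of the absolute differences.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Mean Absolute Percentage Error as a fraction; undefined when an actual value is 0.
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def smape(actual, predicted):
    # Symmetric MAPE as a fraction (assumed variant); undefined when a and p are both 0.
    return sum(2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)) / len(actual)

if __name__ == "__main__":
    observed = [100.0, 200.0]
    forecast = [110.0, 180.0]
    print(rmse(observed, forecast), mae(observed, forecast))
    print(mape(observed, forecast), smape(observed, forecast))
```

Because MAPE divides by the observed value, series that contain zeros (for example, days with no reported deaths) need either a different measure or explicit handling of those days.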

2021 ◽  
pp. 1-13
Author(s):  
Muhammad Rafi ◽  
Mohammad Taha Wahab ◽  
Muhammad Bilal Khan ◽  
Hani Raza

Automatic Teller Machines (ATMs) are still widely used to dispense cash to customers. ATM cash replenishment is the process of refilling an ATM with a specific amount of cash. Because of fluctuating user demand and seasonal patterns, it is challenging for financial institutions to keep the optimal amount of cash in each ATM. In this paper, we present a time series model based on the Auto Regressive Integrated Moving Average (ARIMA) technique, called Time Series ARIMA Model for ATM (TASM4ATM). The study uses historical back-end ATM refilling data from 6 different financial organizations in Pakistan. Eighteen months of replenishment data from 2040 distinct ATMs are used to train the proposed model. The model is compared with state-of-the-art models such as a Recurrent Neural Network (RNN) and Amazon's DeepAR model. Two forecasting approaches are used: (i) single ATMs and (ii) clusters of ATMs, in which ATMs with similar cash demand are grouped together. The Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE) are used to evaluate the models. The proposed model forecasts considerably better than the models it is compared with, producing average MAPE/SMAPE values of 7.86/7.99 on individual ATMs and 6.57/6.64 on clusters of ATMs.
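To make the ARIMA idea concrete, here is a deliberately small sketch of the simplest integrated autoregressive case, ARIMA(1,1,0): difference the series once, fit an AR(1) coefficient to the differences by least squares (no constant term), and integrate the predicted differences back. This is not TASM4ATM; the model orders, the estimation method and the function names are assumptions chosen for illustration, and a real study would select orders and estimate parameters with a statistics library.

```python
def fit_ar1_on_differences(series):
    # First differences remove a linear trend (the "I" in ARIMA with d = 1).
    diffs = [b - a for a, b in zip(series, series[1:])]
    # Least-squares AR(1) coefficient without intercept: phi = sum(d_t * d_{t-1}) / sum(d_{t-1}^2).
    numerator = sum(d1 * d0 for d0, d1 in zip(diffs, diffs[1:]))
    denominator = sum(d0 * d0 for d0 in diffs[:-1])
    if denominator == 0:
        raise ValueError("series has no variation in its differences")
    return numerator / denominator, diffs[-1]

def forecast_arima_110(series, horizon=1):
    # Predict future differences recursively and add them back to the last level.
    phi, last_diff = fit_ar1_on_differences(series)
    level = series[-1]
    forecasts = []
    for _ in range(horizon):
        last_diff = phi * last_diff
        level += last_diff
        forecasts.append(level)
    return forecasts

if __name__ == "__main__":
    # Hypothetical replenishment amounts whose increments halve each period.
    print(forecast_arima_110([0.0, 8.0, 12.0, 14.0, 15.0], horizon=2))
```

For the cluster approach, the same routine could be applied to a series obtained by aggregating the ATMs in a cluster, although how the paper aggregates cluster demand is not stated in the abstract.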


2021 ◽  
Author(s):  
Sebastian Johannes Fritsch ◽  
Konstantin Sharafutdinov ◽  
Moein Einollahzadeh Samadi ◽  
Gernot Marx ◽  
Andreas Schuppert ◽  
...  

BACKGROUND During the course of the COVID-19 pandemic, a variety of machine learning models were developed to predict different aspects of the disease, such as long-term causes, organ dysfunction or ICU mortality. The number of training datasets used has increased significantly over time. However, these data now come from different waves of the pandemic, which did not always share the same therapeutic approaches and which differed in outcomes. The impact of these changes on model development has not yet been studied. OBJECTIVE The aim of the investigation was to examine the predictive performance of several models trained with data from one wave when applied to the second wave's data, and the impact of pooling these datasets. Finally, a method for comparing different datasets for heterogeneity is introduced. METHODS We used two datasets, from wave one and wave two, to develop several predictive models for patient mortality. Four classification algorithms were used: logistic regression (LR), support vector machine (SVM), random forest classifier (RF) and AdaBoost classifier (ADA). We also performed a mutual prediction, applying each model to the data of the wave that was not used for its training. Then, we compared the performance of the models when a pooled dataset from both waves was used. The populations from the different waves were checked for heterogeneity using a convex hull analysis. RESULTS 63 patients from wave one (03-06/2020) and 54 from wave two (08/2020-01/2021) were evaluated. For both waves separately, we found models reaching sufficient accuracy, with an AUROC of up to 0.79 (95%-CI 0.76-0.81) for SVM on the first wave and up to 0.88 (95%-CI 0.86-0.89) for RF on the second wave. After pooling the data, the AUROC decreased substantially. In the mutual prediction, models trained on the second wave's data and applied to the first wave's data predicted non-survivors well but classified survivors insufficiently. The opposite situation (training: first wave, test: second wave) revealed the inverse behaviour, with models correctly classifying survivors and incorrectly predicting non-survivors. The convex hull analysis for the first and second wave populations showed a more inhomogeneous distribution of the underlying data when compared with randomly selected sets of patients of the same size. CONCLUSIONS Our work demonstrates that a larger dataset is not a universal solution to all machine learning problems in clinical settings. Rather, it shows that inhomogeneous data used to develop models can lead to serious problems. With the convex hull analysis, we offer a solution for this problem. The outcome of such an analysis can raise concerns about whether pooling different datasets would cause inhomogeneous patterns that prevent better predictive performance.
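The abstract does not describe how the convex hull analysis is carried out, so the following is only an illustrative sketch of one way such a check could look: build the convex hull of one cohort in a two-dimensional feature space and measure what fraction of the other cohort falls inside it. The restriction to two features, the coverage statistic and all function names are assumptions, not the authors' procedure.

```python
def cross(o, a, b):
    # Z-component of the cross product (a - o) x (b - o); positive for a left turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices in counter-clockwise order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, point):
    # A point is inside (or on the boundary of) a counter-clockwise convex polygon
    # if it is never strictly to the right of any edge.
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))

def hull_coverage(reference_cohort, other_cohort):
    # Fraction of the other cohort lying inside the reference cohort's hull.
    hull = convex_hull(reference_cohort)
    return sum(inside_hull(hull, p) for p in other_cohort) / len(other_cohort)

if __name__ == "__main__":
    wave_one = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]   # hypothetical feature pairs
    wave_two = [(1, 1), (5, 5), (2, 3), (-1, 0)]
    print(hull_coverage(wave_one, wave_two))
```

A low coverage value would suggest that a model trained on the reference cohort is being asked to extrapolate for many patients of the other cohort. Comparing it against the coverage obtained for random splits of the same size, as the abstract mentions, would give a baseline for what counts as low.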


2020 ◽  
Vol 26 (4) ◽  
pp. 2362-2374
Author(s):  
Yumeng Zhang ◽  
Li Luo ◽  
Fengyi Zhang ◽  
Ruixiao Kong ◽  
Jianchao Yang ◽  
...  

Accurate forecasting of radiology emergency patient flow is of great importance for optimizing appointment scheduling decisions. This study used a multi-model approach to forecast daily radiology emergency patient flow, taking different patient sources into account. We constructed six linear and nonlinear models by considering the lag effects and corresponding time factors. The autoregressive integrated moving average and the least absolute shrinkage and selection operator (Lasso) were selected from the category of linear models, whereas linear-and-radial support vector regression models, random forests and adaptive boosting were chosen from the category of nonlinear models. The models were applied to four years of daily emergency visit data from the radiology department of West China Hospital in Chengdu, China. The mean absolute percentage error of the six models ranged from 8.56 to 9.36 percent for emergency department patients, and from 10.90 to 14.39 percent for ward patients. The best-performing model for total radiology visits was Lasso, which yielded a mean absolute percentage error of 7.06 percent. The arrival patterns of emergency department and total radiology emergency patient flows could be modeled by linear processes. By contrast, the nonlinear model performed best for ward patient flow. These findings will help hospital managers manage patient flow efficiently, thus improving service quality and increasing patient satisfaction.
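The phrase "lag effects and corresponding time factors" refers to turning a daily count series into a supervised-learning table. A minimal sketch of that step follows; the particular lags (1 and 7 days), the use of the weekday as the only time factor, and the function name are assumptions for illustration and are not taken from the study.

```python
from datetime import date, timedelta

def build_lag_features(series, start_date, lags=(1, 7)):
    # Each row holds the lagged counts followed by the weekday (0 = Monday) of the target day.
    rows, targets = [], []
    max_lag = max(lags)
    for t in range(max_lag, len(series)):
        day = start_date + timedelta(days=t)
        rows.append([series[t - k] for k in lags] + [day.weekday()])
        targets.append(series[t])
    return rows, targets

if __name__ == "__main__":
    visits = list(range(10, 20))  # hypothetical daily visit counts
    X, y = build_lag_features(visits, date(2021, 1, 4))
    print(X[0], y[0])
```

The resulting rows can be passed to any of the regressors the abstract lists; only the first seven days are lost, because they lack a complete set of lags.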


Author(s):  
Noer Chamid ◽  
Muhammad Ainul Yaqin ◽  
Nailul Izzah

The purposes of time series analysis include understanding and explaining a particular mechanism, forecasting future values and optimizing control systems. Decision-making based on such analysis usually relies on paid software such as Minitab, SPSS and SAS, so an information system is needed to support decisions in this kind of analysis. The information system is to be tested for reliability and implemented in decision-making to set targets for local own-source revenue (Pendapatan Asli Daerah) in local government, or for other data. Four methods are used for estimation: Moving Average, Exponential Smoothing, Linear Trend Line and Seasonal Adjustment. From these four methods, the best model is selected as the one with the smallest Mean Absolute Deviation (MAD) and Mean Absolute Percentage Error (MAPE). The information system has been tested for reliability and implemented in decision-making to set local own-source revenue targets in local government. This Decision Support System can serve as a tool for producing decision recommendations. Keywords: Time Series, Decision Support System, Local Own-Source Revenue
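The selection rule, choosing the method with the smallest MAD and MAPE, can be sketched for two of the four methods. This is an illustrative sketch rather than the system described in the abstract: the window length, the smoothing constant, the function names and the choice to rank by MAPE when the two criteria disagree are all assumptions.

```python
def moving_average_forecasts(series, window=3):
    # Forecast for period t is the mean of the previous `window` observations.
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]

def exp_smoothing_forecasts(series, alpha=0.5):
    # Simple exponential smoothing: F_1 = y_0, F_{t+1} = alpha * y_t + (1 - alpha) * F_t.
    forecasts = [series[0]]
    for y in series[1:-1]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts  # aligned with series[1:]

def mad(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape_percent(actual, predicted):
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def select_best_method(series, window=3, alpha=0.5):
    # Score both methods on the same periods so the comparison is fair.
    actual = series[window:]
    candidates = {
        "moving_average": moving_average_forecasts(series, window),
        "exponential_smoothing": exp_smoothing_forecasts(series, alpha)[window - 1:],
    }
    scores = {name: (mad(actual, f), mape_percent(actual, f)) for name, f in candidates.items()}
    best = min(scores, key=lambda name: scores[name][1])  # rank by MAPE (assumption)
    return best, scores

if __name__ == "__main__":
    revenue = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # hypothetical revenue figures
    print(select_best_method(revenue))
```

On a steadily rising series like the hypothetical one above, both methods lag behind the trend, which is why a trend-line method is usually included among the candidates.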


2019 ◽  
Vol 9 (3) ◽  
pp. 423 ◽  
Author(s):  
Shenghui Zhang ◽  
Yuewei Liu ◽  
Jianzhou Wang ◽  
Chen Wang

Wind power is an important part of a power system, and its use has been increasing rapidly compared with fossil energy. However, because wind speed is intermittent and random, system operators and researchers urgently need more reliable wind-speed prediction methods. Wind-speed time series have been found to have both linear and nonlinear characteristics. In addition, most methods consider only one criterion or rule (stability or accuracy), or one objective function, which can lead to poor forecasting results. Wind-speed forecasting therefore remains a difficult and challenging problem. Existing forecasting models based on combination-model theory can adapt to some time-series data and overcome the shortcomings of single models, namely poor accuracy and instability. In this paper, a combined forecasting model based on data preprocessing, a nondominated sorting genetic algorithm (NSGA-III) with three objective functions, and four models (two hybrid nonlinear models and two linear models) is proposed and successfully applied to wind-speed forecasting; it addresses both forecasting accuracy and forecasting stability. The experimental results show that the proposed combined model is more stable and more accurate than the single models, with improvements in the mean absolute percentage error (MAPE) ranging from 0.007% to 2.31% and in the standard deviation of the mean absolute percentage error (STDMAPE) ranging from 0.0044 to 0.3497.


Author(s):  
João Paulo Teixeira ◽  
Paula Odete Fernandes

In this chapter, four combinations of input features and the feedforward, cascade forward and recurrent architectures are compared for the task of forecasting tourism time series. The input features of the ANNs combine the previous 12 months of the series, a time index modeled by two nodes for the year and the month, and one input with the daily hours of sunshine (insolation duration). The time index features, combined with the previous twelve values of the time series, proved relevant to this forecasting task. The insolation variable improved results with some architectures, namely the cascade forward architecture. Finally, the ANN models/architectures tested produced a mean absolute percentage error between 4 and 6%, demonstrating the ability of ANN-based models to forecast this time series. In addition, the feedforward architecture performed better on the validation and test sets, with a 4.2% percentage error on the test set.
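The input layout described here (twelve previous monthly values, two time-index nodes, and an optional sunshine input) can be assembled as follows. This is a sketch under assumptions: the function name is hypothetical, the year and month are passed as raw integers without scaling, and the sunshine value is taken for the month being forecast, none of which the abstract specifies.

```python
def build_ann_inputs(values, start_year, start_month, sunshine=None):
    # Returns (features, target) pairs; the first 12 months are consumed as history.
    samples = []
    for t in range(12, len(values)):
        month_offset = start_month - 1 + t
        year = start_year + month_offset // 12
        month = month_offset % 12 + 1
        features = list(values[t - 12:t]) + [year, month]
        if sunshine is not None:
            features.append(sunshine[t])  # assumed: insolation of the target month
        samples.append((features, values[t]))
    return samples

if __name__ == "__main__":
    overnight_stays = list(range(1, 15))  # 14 hypothetical monthly values from Jan 2000
    for features, target in build_ann_inputs(overnight_stays, 2000, 1):
        print(features, target)
```

In practice the inputs would be normalized before training, since the year index and the series values are on very different scales.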


2013 ◽  
Vol 12 (2) ◽  
pp. 25
Author(s):  
S. STEVEN ◽  
S. NURDIATI ◽  
F. BUKHARI

Forecasting is the activity of predicting the value of a variable in the future. The aim of this study is to predict the number of new students at Institut Pertanian Bogor using the fuzzy time series method and Holt's double exponential smoothing method, and to compare the two methods by their forecasting accuracy as measured by the Mean Absolute Percentage Error (MAPE). The fuzzy time series method uses fuzzy sets in its forecasting process, whereas Holt's double exponential smoothing smooths the values of a sequence of data by reducing them exponentially. In forecasting the number of new students at Institut Pertanian Bogor, the fuzzy time series method achieved better forecasting accuracy, with a MAPE of 6.41%, compared with Holt's double exponential smoothing, with a MAPE of 7.75%. The case study indicates that Holt's double exponential smoothing would forecast more accurately if more data were used.
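Holt's double exponential smoothing maintains a smoothed level and a smoothed trend and extrapolates both. A minimal sketch follows; the initialization (level from the first observation, trend from the first difference), the default smoothing constants and the function name are assumptions, since the abstract does not report the settings used.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=1):
    # Initialize the level with the first value and the trend with the first difference.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        # Level: blend the new observation with the previous one-step-ahead projection.
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]

if __name__ == "__main__":
    enrolment = [10.0, 12.0, 14.0, 16.0]  # hypothetical yearly intake
    print(holt_forecast(enrolment, horizon=2))
```

On a perfectly linear series the method reproduces the line exactly, which makes such a series a convenient sanity check before applying it to real data.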


2019 ◽  
Vol 1 (2) ◽  
pp. 193
Author(s):  
Muhammad Abdy ◽  
Rahmat Syam ◽  
Elfira Haryanensi

Abstract. This research applies a fuzzy time series forecasting method, automatic clustering-fuzzy logical relationships, to forecast the population of Makassar City using secondary data from BPS Kota Makassar, with the aim of predicting the population for 2017-2021. The procedure starts with determining the interval lengths and the midpoint of each interval, followed by constructing the fuzzy logical relationships, fuzzification, defuzzification, and calculating the error of the forecasts using the Mean Absolute Percentage Error. The results show that the forecast population of Makassar City increases from 2016 to 2017, decreases from 2017 to 2019, and increases in 2019-2021, with very good accuracy. Keywords: Automatic Clustering-Fuzzy Logical Relationships, Fuzzy Time Series, Fuzzy Theory
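The steps listed in the abstract (intervals, midpoints, fuzzy logical relationships, fuzzification, defuzzification) can be illustrated with a basic first-order fuzzy time series. Note the substitution: the sketch below uses equal-width intervals instead of the automatic clustering step of the paper, and the function name and number of intervals are assumptions.

```python
def fuzzy_time_series_forecast(series, n_intervals=5):
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Equal-width intervals over the observed range (stand-in for automatic clustering).
    width = (hi - lo) / n_intervals
    midpoints = [lo + (i + 0.5) * width for i in range(n_intervals)]

    def fuzzify(x):
        # Index of the interval (fuzzy set) that contains x.
        return min(int((x - lo) / width), n_intervals - 1)

    states = [fuzzify(x) for x in series]
    # Fuzzy logical relationship groups: state -> set of states that followed it.
    relations = {}
    for current, following in zip(states, states[1:]):
        relations.setdefault(current, set()).add(following)

    def defuzzify(state):
        # Average the midpoints of the consequent sets; fall back to the state's own midpoint.
        consequents = relations.get(state)
        if not consequents:
            return midpoints[state]
        return sum(midpoints[j] for j in consequents) / len(consequents)

    fitted = [defuzzify(s) for s in states[:-1]]  # aligned with series[1:]
    return fitted, defuzzify(states[-1])

if __name__ == "__main__":
    population = [10.0, 20.0, 30.0, 20.0, 10.0, 20.0]  # hypothetical values
    print(fuzzy_time_series_forecast(population, n_intervals=2))
```

The fitted values can then be compared with the observations using MAPE, which is the error measure named in the abstract.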


2018 ◽  
Vol 7 (1) ◽  
pp. 84-95
Author(s):  
Gayuh Kresnawati ◽  
Budi Warsito ◽  
Abdul Hoyyi

The Smooth Transition Autoregressive (STAR) model is a time series model used for data with a nonlinear tendency. STAR is an extension of the Autoregressive (AR) model and can be used when the nonlinearity test supports a nonlinear specification. If the transition function G(s_t, γ, c) is logistic, the method used is Logistic Smooth Transition Autoregressive (LSTAR). Weekly IHSG data for the period from 3 January 2010 to 24 December 2017 show a nonlinear tendency and a logistic transition function, so they can be modeled with LSTAR. At a significance level of 5%, the resulting model is LSTAR(1,1). The forecast of IHSG data for the next 15 periods has a Mean Absolute Percentage Error (MAPE) of 2.932612%. Keywords: autoregressive, LSTAR, nonlinear, time series
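The logistic transition function and a one-step LSTAR prediction can be sketched as below. The sketch assumes that LSTAR(1,1) means an AR order of 1 with the transition variable s_t = y_{t-1}, and it uses the common two-regime form in which the prediction is a G-weighted blend of two AR(1) equations; the coefficient values and function names are hypothetical, not the fitted IHSG model.

```python
import math

def logistic_transition(s, gamma, c):
    # G(s; gamma, c) = 1 / (1 + exp(-gamma * (s - c))), computed in a numerically stable way.
    z = gamma * (s - c)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def lstar_one_step(y_prev, regime_low, regime_high, gamma, c):
    # Each regime is (intercept, AR(1) coefficient); s_t = y_{t-1} is assumed.
    g = logistic_transition(y_prev, gamma, c)
    low = regime_low[0] + regime_low[1] * y_prev
    high = regime_high[0] + regime_high[1] * y_prev
    # G near 0 selects the low regime, G near 1 the high regime.
    return (1.0 - g) * low + g * high

if __name__ == "__main__":
    print(logistic_transition(2.0, gamma=3.0, c=2.0))
    print(lstar_one_step(2.0, (0.0, 0.5), (1.0, 0.9), gamma=3.0, c=2.0))
```

The slope parameter γ controls how abruptly the model switches: a large γ approaches a threshold model, while γ close to 0 collapses the two regimes into a single linear AR model.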


Author(s):  
Muhammad Wahdeni Pramana ◽  
Ika Purnamasari ◽  
Surya Prangga

Exports are the trade or sale of goods from within a country to other countries. Non-oil-and-gas exports are one of the components of Gross Regional Domestic Product (GRDP), so forecasts of their future value are needed. Fuzzy Time Series (FTS) is a forecasting method based on fuzzy set theory, fuzzy logic and forecasts that can be expressed linguistically. Lee's Weighted Fuzzy Time Series (WFTS) method extends the FTS method by adding weights to each relationship pattern that is formed. The aim of this study is to obtain a forecast of the non-oil-and-gas exports of East Kalimantan Province for November 2020 and to measure forecasting accuracy using the Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). For the non-oil-and-gas export data of East Kalimantan Province from January 2019 to October 2020, all of the weighting constants tested gave MAPE values below 10%; the best weighting constant gave the minimum MAPE of 3.62% and the minimum RMSE of 50.67. With the best weighting constant, the forecast for November 2020 is 850.96 million USD.

