Imputation of missing data in time series by different computation methods in various data set applications

2020 ◽  
Vol 32 ◽  
pp. 03010
Author(s):  
Dhiraj Magare ◽  
Sushil Labde ◽  
Manoj Gofane ◽  
Vishwesh Vyawahare

In the modern era of technology, far larger volumes of data are generated across numerous operations than in earlier times. However, collecting data without a single missing value remains a great challenge. In practice, many solutions have been suggested for handling missing values in time series applications. The existing methods used for imputation and for prediction with time series vary with the application. The methods most commonly available for imputation are the least squares support vector machine (LSSVM), autoregressive integrated moving average (ARIMA) models, artificial neural networks (ANN), artificial intelligence (AI) techniques, state space models, Kalman filtering and fuzzy models. Extensive experimental application data are used to analyze these methods. In addition, a synthetic data set can be used to forecast missing values, which improves the performance of imputation methods in time series. In this paper, the predominantly used imputation methods are listed with their fundamental computational information, along with their verification on the data sets mentioned.
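
As a point of reference for what "imputation" means in a time series, the following minimal sketch fills gaps by linear interpolation. Linear interpolation is not one of the methods listed in the abstract; it is used here only as a simple baseline, and the function name `interpolate_missing` and the sample data are hypothetical.

```python
from typing import List, Optional


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours; gaps at either end take the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    # Leading and trailing gaps: carry the nearest observation outward.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    # Interior gaps: straight line between the bracketing observations.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# Illustrative data only (not from the paper).
observed = [1.0, None, 3.0, None, None, 6.0, None]
print(interpolate_missing(observed))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0]
```

Model-based methods such as ARIMA or Kalman filtering replace the straight line with a forecast from a fitted model, but the surrounding bookkeeping (locate the gaps, estimate, write back) is broadly similar.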

2011 ◽  
Vol 15 (6) ◽  
pp. 1835-1852 ◽  
Author(s):  
R. Samsudin ◽  
P. Saad ◽  
A. Shabri

Abstract. This paper proposes a novel hybrid forecasting model known as GLSSVM, which combines the group method of data handling (GMDH) and the least squares support vector machine (LSSVM). The GMDH is used to determine the useful input variables, which then serve as the inputs to the LSSVM model for time series forecasting. Monthly river flow data from two stations, the Selangor and Bernam rivers in Selangor state of Peninsular Malaysia, were taken into consideration in the development of this hybrid model. The performance of this model was compared with the conventional artificial neural network (ANN), autoregressive integrated moving average (ARIMA), GMDH and LSSVM models using the long-term observations of monthly river flow discharge. The root mean square error (RMSE) and coefficient of correlation (R) are used to evaluate the models' performances. In both cases, the new hybrid model has been found to provide more accurate flow forecasts compared to the other models. The results of the comparison indicate that the new hybrid model is a useful tool and a promising new method for river flow forecasting.
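
The two evaluation measures named in the abstract can be sketched as follows, assuming the standard definitions of RMSE and of the Pearson correlation coefficient for R. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def correlation(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient R between observed and predicted."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Illustrative flows only (not data from the Selangor or Bernam stations).
flows = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
print(rmse(flows, forecast), correlation(flows, forecast))
```

A lower RMSE and an R closer to 1 both indicate a better fit, which is the sense in which the abstract ranks the models.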


Stats ◽  
2019 ◽  
Vol 2 (4) ◽  
pp. 457-467 ◽  
Author(s):  
Hossein Hassani ◽  
Mahdi Kalantari ◽  
Zara Ghodsi

In all fields of quantitative research, analysing data with missing values is an excruciating challenge. It should be no surprise that given the fragmentary nature of fossil records, the presence of missing values in geographical databases is unavoidable. As in such studies ignoring missing values may result in biased estimations or invalid conclusions, adopting a reliable imputation method should be regarded as an essential consideration. In this study, the performance of singular spectrum analysis (SSA) based on the L1 norm was evaluated on the compiled δ¹³C data from East Africa soil carbonates, which is a world targeted historical geology data set. Results were compared with ten traditionally well-known imputation methods, showing that L1-SSA performs well in keeping the variability of the time series and providing estimations which are less affected by extreme values, suggesting the method introduced here deserves further consideration in practice.


This comprehensive review provides an extensive overview of existing time series forecasting techniques. The survey is not restricted to any single kind of time series analysis; it covers forecasting of time series in different areas such as marketing prediction, weather forecasting, technology prediction and financial forecasting. In this paper, we analyze forecasting in several areas, namely load forecasting, wind speed forecasting, prediction of energy consumption and short-term traffic flow prediction. Various models are available for prediction; among them, the autoregressive integrated moving average (ARIMA) model is seen as a universal mechanism, and the forecasting areas discussed here use different models combined with ARIMA. Hybrid models combine classical models with modern methods, for example ARIMA (a classical method) combined with an artificial neural network (ANN) or a support vector machine (SVM) (modern models). A hybrid model's performance depends on the variety of data taken for forecasting.


MAUSAM ◽  
2021 ◽  
Vol 68 (2) ◽  
pp. 349-356
Author(s):  
J. HAZARIKA ◽  
B. PATHAK ◽  
A. N. PATOWARY

Understanding the rainfall pattern is difficult, yet it is needed to address several regional environmental issues in water resources management, with implications for agriculture, climate change, and natural calamities such as floods and droughts. Statistical computing, modeling and forecasting of data are key instruments for studying these patterns. Time series analysis and forecasting has become a major tool in different applications in hydrology and environmental fields. Among the most effective approaches for analyzing time series data is the ARIMA (autoregressive integrated moving average) model introduced by Box and Jenkins. In this study, an attempt has been made to use the Box-Jenkins methodology to build an ARIMA model for monthly rainfall data taken from Dibrugarh for the period 1980–2014, with a total of 420 points. We investigated and found that the ARIMA(0, 0, 0)(0, 1, 1)₁₂ model is suitable for the given data set. As such, this model can be used to forecast the pattern of monthly rainfall for the upcoming years, which can help decision makers establish priorities in terms of agriculture, flood and water demand management, etc.
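
Written out, a seasonal ARIMA model of order (0, 0, 0)(0, 1, 1) with period 12 has the following form. This is a sketch in the Box-Jenkins sign convention and assumes no constant term; the symbols $B$ (backshift operator, $B y_t = y_{t-1}$), $\Theta_1$ (seasonal moving average coefficient) and $\varepsilon_t$ (white noise) are standard notation, not values reported in the abstract.

```latex
\[
(1 - B^{12})\, y_t = (1 - \Theta_1 B^{12})\, \varepsilon_t
\quad\Longleftrightarrow\quad
y_t = y_{t-12} + \varepsilon_t - \Theta_1\, \varepsilon_{t-12}
\]
```

In words, each month's rainfall is modelled as the value from the same month one year earlier plus a noise term corrected by the previous year's noise. Some software writes the moving average term with a plus sign, so the sign of a fitted $\Theta_1$ depends on the convention used.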


2021 ◽  
pp. 0734242X2110614
Author(s):  
AKM Mohsin ◽  
Lei Hongzhen ◽  
Mohammed Masum Iqbal ◽  
Zahir Rayhan Salim ◽  
Alamgir Hossain ◽  
...  

Forecasting the scale of e-waste recycling is the basis for the government to formulate the development plan of circular economy and relevant subsidy policies, and for enterprises to evaluate resource recovery and optimise production capacity. In this article, the CH-X12/STL-X framework for e-waste recycling scale prediction is proposed based on the idea of ‘decomposition-integration’, considering that the seasonal data characteristics of quarterly e-waste recycling scale data may lead to large forecasting errors and inconsistent forecasting results of a traditional single model. First, the seasonal data characteristics of the time series of e-waste recovery scale are identified with the Canova–Hansen (CH) test, and the seasonal components of the time series suitable for seasonal decomposition are then extracted with the X12 or seasonal-trend decomposition procedure based on loess (STL) model. Then, the Holt–Winters model is used to predict the seasonal component, and the support vector regression (SVR) model is used to predict the other components. Finally, the linear sum of the prediction results of each component is used to obtain the final prediction result. The empirical results show that the proposed CH-X12/STL-X forecasting framework can better meet the modelling requirements for time-series forecasting driven by different seasonal data characteristics and has better and more stable forecasting performance than traditional single models (Holt–Winters model, seasonal autoregressive integrated moving average model and SVR model).
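
The 'decomposition-integration' idea can be illustrated with a deliberately simplified sketch: split the series into a seasonal component and a remainder, forecast each separately, and add the forecasts. Everything below is an assumption made for illustration. Seasonal means stand in for X12/STL, a repeated seasonal pattern stands in for Holt–Winters, and a plain mean stands in for SVR; none of these is the method of the article, and the function names are hypothetical.

```python
from typing import List


def seasonal_component(series: List[float], period: int) -> List[float]:
    """Additive seasonal effect per season: season mean minus overall mean.
    (Stand-in for an X12 or STL decomposition.)"""
    overall = sum(series) / len(series)
    effects = []
    for s in range(period):
        season_values = series[s::period]
        effects.append(sum(season_values) / len(season_values) - overall)
    return effects


def decompose_and_forecast(series: List[float], period: int, horizon: int) -> List[float]:
    """Forecast each component separately, then take their linear sum."""
    seasonal = seasonal_component(series, period)
    remainder = [y - seasonal[t % period] for t, y in enumerate(series)]
    # Stand-in forecasters: repeat the seasonal pattern (instead of
    # Holt-Winters) and use the remainder mean (instead of SVR).
    remainder_forecast = sum(remainder) / len(remainder)
    n = len(series)
    return [seasonal[(n + h) % period] + remainder_forecast for h in range(horizon)]


# Illustrative quarterly data only: three identical years.
quarterly = [10.0, 20.0, 30.0, 40.0] * 3
print(decompose_and_forecast(quarterly, period=4, horizon=4))  # [10.0, 20.0, 30.0, 40.0]
```

The design point is the final line of `decompose_and_forecast`: the component forecasters can be swapped for stronger models without changing the way the results are summed.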


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Pooja Rani ◽  
Rajneesh Kumar ◽  
Anurag Jain

Purpose: Decision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently. Design/methodology/approach: The proposed HIOC is developed by using a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), and mean and mode imputation methods in an optimum way. The performance of HIOC has been compared to the MICE, KNN, and mean and mode methods. Four classifiers, support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT), have been used to evaluate the performance of the imputation methods. Findings: The results show that HIOC performed efficiently even with a high rate of missing values. It reduced the root mean square error (RMSE) by up to 17.32% in the heart disease dataset and 34.73% in the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases. It increased classification accuracy by up to 18.61% in the heart disease dataset and 6.20% in the breast cancer dataset. Originality/value: The proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.
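
For orientation, the mean and mode baselines named in the abstract can be sketched as below. This is not HIOC and does not reproduce MICE or KNN; it only shows the two simplest methods that HIOC is said to combine. The function name `impute_column` and the sample columns are hypothetical.

```python
from statistics import mean, mode
from typing import List, Optional, Union

Value = Union[float, str]


def impute_column(values: List[Optional[Value]], numeric: bool) -> List[Value]:
    """Replace None with the column mean (numeric) or mode (categorical)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]


# Illustrative columns only (not from the heart disease or breast cancer data).
blood_pressure = [120.0, None, 140.0]
chest_pain = ["typical", "atypical", None, "typical"]
print(impute_column(blood_pressure, numeric=True))   # [120.0, 130.0, 140.0]
print(impute_column(chest_pain, numeric=False))
```

Both baselines ignore relationships between columns, which is the gap that multivariate methods such as MICE and KNN are meant to address.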


Author(s):  
Baidyanath Biswas

This chapter discusses the concepts of time-series applications and forecasting in the context of information systems security. The primary objective in such formulation is the training of the models followed by efficient prediction. Although economic and financial forecasting problems extensively use time-series, predicting software vulnerabilities is a novel idea. The chapter also provides appropriate guidelines for the implementation and adaptation of univariate time-series for information security. To achieve this, the authors focus on the following techniques: autoregressive (AR), moving average (MA), autoregressive integrated moving average (ARIMA), and exponential smoothing. The analysis considers a unique data set consisting of the publicly exposed software vulnerabilities, available from the U.S. Dept. of Homeland Security. The problem is presented first, followed by a general framework to identify the problem, estimate the best-fit parameters of that model, and conclude with an illustrative example from the above dataset to familiarize readers with the business problem.


2012 ◽  
Author(s):  
Ruhaidah Samsudin ◽  
Puteh Saad ◽  
Ani Shabri

In this paper, time series prediction is treated as a missing value problem. A model for determining the missing time series value is presented. A hybrid model integrating the autoregressive integrated moving average (ARIMA) model and the artificial neural network (ANN) model is developed to solve this problem. The developed model attempts to combine the linear characteristics of an ARIMA model with the nonlinear patterns of an ANN. In this study, time series modeling of rice yield data from the Muda Irrigation area, Malaysia, from 1995 to 2003 is considered. Experimental results with the rice yield data sets indicate that the hybrid model improves forecasting performance over either of the models used separately. Key words: ARIMA; Box and Jenkins; neural networks; rice yields; hybrid ANN model
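
One common way to combine a linear and a nonlinear model is the additive formulation below. Whether this paper uses exactly this form is an assumption; the abstract only states that linear ARIMA characteristics and nonlinear ANN patterns are combined. Here $L_t$ is the linear part, $N_t$ the nonlinear part, $e_t$ the ARIMA residual, $f$ the function learned by the ANN, and $n$ a chosen number of lagged residuals.

```latex
\[
y_t = L_t + N_t, \qquad
e_t = y_t - \hat{L}_t, \qquad
\hat{N}_t = f(e_{t-1}, \dots, e_{t-n}), \qquad
\hat{y}_t = \hat{L}_t + \hat{N}_t
\]
```

Under this reading, ARIMA is fitted first, the ANN is trained on what ARIMA fails to explain, and the two forecasts are summed.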


2019 ◽  
Vol 9 (20) ◽  
pp. 4448 ◽  
Author(s):  
İş ◽  
Tuncer

This article considers methodological approaches to determine and prevent social media manipulation specific to Twitter. Behavioral analyses of Twitter users were performed by using their profile structures and interaction types, and Twitter users were classified according to their effect size values by determining their asset values. User profiles were classified into three different categories, namely popular-active, observer-passive, and spam-bot-malicious, by using k-nearest neighbor (K-NN), support vector machine (SVM), and artificial neural network (ANN) algorithms. For classification, the study used the basic characteristics of users, such as density, centralization, and diameter, as well as suggested time series such as the simple moving average and cumulative moving average. The highest accuracy was obtained by the K-NN algorithm. The F1-score values obtained with K-NN for all classes were higher than those obtained with the other algorithms. According to the results obtained, classification accuracy values were found to reach a maximum of 96.81% and a minimum of 92.33%. Our classification results showed that the proposed method was satisfactory for popular-active, observer-passive, and spam-bot-malicious account separation.
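
The two time series features mentioned in the abstract have standard definitions, sketched below. How the study applies them to user activity is not stated in the abstract, so the function names, the window size and the sample counts are hypothetical.

```python
from typing import List


def simple_moving_average(values: List[float], window: int) -> List[float]:
    """Mean of each run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def cumulative_moving_average(values: List[float]) -> List[float]:
    """Mean of all values seen so far, updated at every step."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


# Illustrative daily activity counts only (not data from the study).
activity = [2.0, 4.0, 6.0, 8.0]
print(simple_moving_average(activity, window=2))  # [3.0, 5.0, 7.0]
print(cumulative_moving_average(activity))        # [2.0, 3.0, 4.0, 5.0]
```

The simple moving average tracks recent behaviour, while the cumulative average summarises the whole history, so the two can diverge when an account's activity changes abruptly.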

