The Comparison of Imputation Methods in Space Time Series Data with Missing Values

2010 ◽  
Vol 17 (2) ◽  
pp. 263-273 ◽  
Author(s):  
Sung-Duck Lee ◽  
Duck-Ki Kim
Stats ◽  
2019 ◽  
Vol 2 (4) ◽  
pp. 457-467 ◽  
Author(s):  
Hossein Hassani ◽  
Mahdi Kalantari ◽  
Zara Ghodsi

In all fields of quantitative research, analysing data with missing values is an excruciating challenge. Given the fragmentary nature of fossil records, it should be no surprise that missing values in geographical databases are unavoidable. Because ignoring missing values in such studies may result in biased estimates or invalid conclusions, adopting a reliable imputation method should be regarded as essential. In this study, the performance of singular spectrum analysis (SSA) based on the L1 norm was evaluated on compiled δ13C data from East Africa soil carbonates, a historical geology data set of worldwide interest. The results were compared with those of ten well-known traditional imputation methods. They show that L1-SSA performs well in preserving the variability of the time series and provides estimates that are less affected by extreme values, suggesting that the method deserves further consideration in practice.


Hydrology ◽  
2018 ◽  
Vol 5 (4) ◽  
pp. 63 ◽  
Author(s):  
Benjamin Nelsen ◽  
D. Williams ◽  
Gustavious Williams ◽  
Candace Berrett

Complete and accurate data are necessary for analyzing and understanding trends in time-series datasets; however, many of the available time-series datasets have gaps that affect the analysis, especially in the earth sciences. As most available data have missing values, researchers use various interpolation methods or ad hoc approaches to data imputation. Since analysis based on inaccurate data can lead to inaccurate conclusions, more accurate data imputation methods can support more accurate analysis. We present a spatial-temporal data imputation method using Empirical Mode Decomposition (EMD) based on spatial correlations. We call this method EMD-spatial data imputation, or EMD-SDI. Though this method is applicable to other time-series data sets, here we demonstrate it using temperature data. The EMD algorithm decomposes data into periodic components called intrinsic mode functions (IMFs) and exactly reconstructs the original signal by summing these IMFs. EMD-SDI first decomposes the data from the target station and from other stations in the region into IMFs. It then evaluates each IMF from the target station in turn and selects the IMF from the other stations in the region whose periodic behavior is most correlated with the target IMF. EMD-SDI then replaces a section of missing data in the target station IMF with the corresponding section from the most closely correlated regional IMF. We found that EMD-SDI selects the IMFs used for reconstruction from different stations throughout the region, not necessarily from the geographically closest station. EMD-SDI accurately filled data gaps from 3 months to 5 years in length in our tests and compares favorably to a simple temporal method. EMD-SDI leverages regional correlation and the fact that different stations can be subject to different periodic behaviors. In addition to data imputation, the EMD-SDI method provides IMFs that can be used to better understand regional correlations and processes.
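
The donor-selection step described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes the IMFs have already been computed by some EMD routine, it handles one component at a time, and it copies donor values without any amplitude adjustment. The names `fill_from_best_donor`, `station_A`, and `station_B`, and all the numbers, are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def fill_from_best_donor(target, donors):
    """Fill None gaps in `target` from the donor component most correlated with it.

    Correlation is computed only over positions where the target is observed.
    Returns (filled_series, name_of_chosen_donor).
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    t_obs = [target[i] for i in observed]
    best_name = max(
        donors,
        key=lambda name: pearson(t_obs, [donors[name][i] for i in observed]),
    )
    best = donors[best_name]
    filled = [best[i] if v is None else v for i, v in enumerate(target)]
    return filled, best_name


# Hypothetical, precomputed components (assumption: same length, donors complete).
target_imf = [0.0, 1.0, 0.0, -1.0, None, None, 0.0, -1.0]
donor_imfs = {
    "station_A": [0.1, 0.9, 0.1, -0.9, 0.0, 1.0, 0.1, -1.1],
    "station_B": [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0],
}
filled, chosen = fill_from_best_donor(target_imf, donor_imfs)
print(chosen, filled)
```

In a full pipeline this selection would be repeated for every IMF of the target station, and the filled IMFs would be summed to rebuild the series; as the abstract notes, different IMFs may end up drawing on different stations.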


2019 ◽  
Vol 8 (4) ◽  
pp. 418-427
Author(s):  
Eko Siswanto ◽  
Hasbi Yasin ◽  
Sudarno Sudarno

In many applications, several time series are recorded simultaneously at a number of locations. Time series from nearby locations are often related in both space and time. Such data are called spatial time series data. The Generalized Space Time Autoregressive (GSTAR) model is one of the space-time models used for modeling and forecasting spatial time series data. This study applied the GSTAR model to rainfall volume at four locations: Jepara Regency, Kudus Regency, Pati Regency, and Grobogan Regency. Based on the smallest mean RMSE of the forecasts, the best model chosen by this study is GSTAR(11)-I(1)12 with inverse distance weights. Under this model, rainfall in Pati Regency is influenced by rainfall in the other regencies. Keywords: GSTAR, RMSE, Rainfall
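
As a rough illustration of the inverse distance weighting mentioned here, the sketch below builds a row-normalized spatial weight matrix from pairwise distances. The distances are made up for the example and are not the actual distances between the four regencies; the function name `inverse_distance_weights` is likewise hypothetical.

```python
def inverse_distance_weights(dist):
    """Row-normalized inverse-distance weights; the diagonal is set to zero."""
    n = len(dist)
    weights = []
    for i in range(n):
        # A location gets no weight on itself; nearer neighbours get more weight.
        inv = [0.0 if i == j else 1.0 / dist[i][j] for j in range(n)]
        total = sum(inv)
        weights.append([v / total for v in inv])
    return weights


# Hypothetical symmetric distances (km) between four locations.
distances = [
    [0, 20, 40, 50],
    [20, 0, 25, 45],
    [40, 25, 0, 60],
    [50, 45, 60, 0],
]
W = inverse_distance_weights(distances)
for row in W:
    print([round(w, 3) for w in row])
```

Each row sums to one, so the spatially lagged term for a location is a weighted average of the other locations' values, with closer locations contributing more.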


2018 ◽  
Vol 17 (02) ◽  
pp. 1850017 ◽  
Author(s):  
Mahdi Kalantari ◽  
Masoud Yarmohammadi ◽  
Hossein Hassani ◽  
Emmanuel Sirimal Silva

Missing values in time series data are a well-known and important problem that many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is the application of the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method has been compared with that of many other established methods by applying them to various real and simulated time series. The results confirm that the SSA-based methods, especially L1-SSA, can provide better imputation than the other methods.


2018 ◽  
Vol 2 (2) ◽  
pp. 49-57
Author(s):  
Dwi Yulianti ◽  
I Made Sumertajaya ◽  
Itasia Dina Sulvianti

The generalized space time autoregressive integrated moving average (GSTARIMA) model is a time series model for multiple variables with spatial and temporal linkages (space time). The GSTARIMA model extends the space time autoregressive integrated moving average (STARIMA) model by assuming that each location has its own model parameters, which makes it more flexible than the STARIMA model. The purposes of this research are to determine the best model and to predict rice prices in all provincial capitals of Sumatra island using the GSTARIMA model. This research used weekly rice price data for all provincial capitals of Sumatra island from January 2010 to December 2017. The spatial weights used in this research are inverse distance and queen contiguity. The modeling results show that the best model is GSTARIMA(1,1,0) with the queen contiguity weight matrix, which has the smallest MAPE value of 1.17817%.
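
For readers unfamiliar with the selection criterion, MAPE (mean absolute percentage error) can be computed as sketched below. The prices are invented for illustration and are not taken from the study; the function name `mape` is an assumption.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical prices and forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
score = mape(actual, forecast)
print(round(score, 2))  # roughly 5 percent for these made-up numbers
```

Because the error is expressed relative to the actual value, MAPE lets models be compared across locations with different price levels, which is presumably why it suits a multi-location model comparison.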


2021 ◽  
Vol 6 (1) ◽  
pp. 1-4
Author(s):  
Bo Yuan Chang ◽  
Mohamed A. Naiel ◽  
Steven Wardell ◽  
Stan Kleinikkink ◽  
John S. Zelek

Over the past years, researchers have proposed various methods to discover causal relationships among time-series data, as well as algorithms to fill in missing entries in time-series data. Little to no work has been done on combining the two strategies to learn causal relationships from unevenly sampled multivariate time-series data. In this paper, we examine how the causal parameters learnt from unevenly sampled data (with missing entries) deviate from the parameters learnt from evenly sampled data (without missing entries). Obtaining the causal relationship from a given time series requires evenly sampled data, which suggests filling in the missing values before estimating the causal parameters. The proposed method therefore applies a Gaussian Process Regression (GPR) model for missing data recovery, followed by several pairwise Granger causality equations in Vector Autoregressive form that are fitted to the recovered data to obtain the causal parameters. Experimental results show that the causal parameters obtained with GPR data filling have a much lower RMSE than those obtained with the dummy model (fill with last seen entry) at all missing-value percentages. This suggests that GPR data filling preserves the causal relationships better than dummy data filling and should be considered for causality learning on unevenly sampled time series.
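
The "dummy model" baseline in this abstract (fill with the last seen entry) is simple enough to sketch directly. Note the assumption in the example: the paper reports RMSE on the learnt causal parameters, whereas the snippet computes RMSE on the filled values themselves, purely to show how the baseline behaves. The data and function names are hypothetical.

```python
from math import sqrt


def forward_fill(series):
    """Replace each None with the most recent observed value.

    Leading None entries stay None because nothing has been seen yet.
    """
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def rmse(a, b):
    """Root mean squared error between two equal-length numeric sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical ground truth and a version with missing entries.
truth = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [1.0, None, 3.0, None, 5.0]
filled = forward_fill(observed)
print(filled, rmse(truth, filled))
```

On a trending series like this one, the last-seen value systematically lags the truth, which gives some intuition for why a model-based fill such as GPR could distort downstream estimates less.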


Author(s):  
Yonghong Luo ◽  
Ying Zhang ◽  
Xiangrui Cai ◽  
Xiaojie Yuan

Missing values, which appear in most multivariate time series, prevent advanced analysis of multivariate time series data. Existing imputation approaches try to deal with missing values by deletion, statistical imputation, machine learning based imputation, and generative imputation. However, these methods are either incapable of dealing with temporal information or are multi-stage. This paper proposes an end-to-end generative model, E²GAN, to impute missing values in multivariate time series. With the help of the discriminative loss and the squared error loss, E²GAN can impute an incomplete time series with the nearest generated complete time series in a single stage. Experiments on multiple real-world datasets show that our model outperforms the baselines in imputation accuracy and achieves state-of-the-art classification and regression results on downstream applications. Additionally, our method is more time-efficient than multi-stage methods when training the neural networks.

