The Comparison of Imputation Methods in Space Time Series Data with Missing Values

In every field of quantitative research, analysing data with missing values is a serious challenge. Given the fragmentary nature of the fossil record, it is no surprise that missing values are unavoidable in geographical databases. Because ignoring missing values in such studies may lead to biased estimates or invalid conclusions, choosing a reliable imputation method is essential. In this study, the performance of singular spectrum analysis (SSA) based on the L1 norm was evaluated on the compiled δ¹³C data from East African soil carbonates, a historical geology data set of worldwide interest. The results were compared with those of ten well-known traditional imputation methods. They show that L1-SSA performs well in preserving the variability of the time series and provides estimates that are less affected by extreme values, which suggests the method deserves further consideration in practice.
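The study scores L1-SSA against traditional imputation methods on how closely imputed values match the truth and how well the series keeps its variability. The Python sketch below illustrates that kind of comparison under stated assumptions: a synthetic series stands in for the δ¹³C record, points are hidden at random, and only three simple baselines (mean substitution, last observation carried forward, linear interpolation) are shown. These are illustrative choices, not necessarily among the ten methods the study used, and the two scores (RMSE on the hidden points, and the ratio of the imputed series' variance to the true variance) are likewise assumptions.

```python
import math
import random
import statistics

def mean_impute(x):
    """Replace each None with the mean of the observed values."""
    m = statistics.fmean(v for v in x if v is not None)
    return [m if v is None else v for v in x]

def locf_impute(x):
    """Last observation carried forward; a leading gap takes the first observed value."""
    last = next(v for v in x if v is not None)
    out = []
    for v in x:
        last = v if v is not None else last
        out.append(last)
    return out

def linear_impute(x):
    """Linear interpolation between neighbouring observed points; ends are held constant."""
    obs = [i for i, v in enumerate(x) if v is not None]
    out = list(x)
    for i, v in enumerate(x):
        if v is not None:
            continue
        left = max((j for j in obs if j < i), default=None)
        right = min((j for j in obs if j > i), default=None)
        if left is None:
            out[i] = x[right]
        elif right is None:
            out[i] = x[left]
        else:
            w = (i - left) / (right - left)
            out[i] = (1 - w) * x[left] + w * x[right]
    return out

def evaluate(truth, observed, imputer):
    """Return (RMSE on the hidden positions, variance of imputed series / variance of truth)."""
    filled = imputer(observed)
    miss = [i for i, v in enumerate(observed) if v is None]
    rmse = math.sqrt(statistics.fmean((filled[i] - truth[i]) ** 2 for i in miss))
    var_ratio = statistics.pvariance(filled) / statistics.pvariance(truth)
    return rmse, var_ratio

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical test signal: a smooth oscillation plus small noise.
    truth = [math.sin(0.2 * t) + random.gauss(0, 0.05) for t in range(200)]
    # Hide roughly 15% of the interior points at random.
    observed = [None if 0 < t < 199 and random.random() < 0.15 else v
                for t, v in enumerate(truth)]
    for name, f in [("mean", mean_impute), ("LOCF", locf_impute), ("linear", linear_impute)]:
        rmse, vr = evaluate(truth, observed, f)
        print(f"{name:>6}: RMSE={rmse:.3f}  variance ratio={vr:.3f}")
```

Mean substitution pulls every gap toward the centre, so its variance ratio tends to fall below one; a method that "keeps the variability" of the series should score closer to one.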

The Comparison of Imputation Methods in Time Series Data with Missing Values

Communications for Statistical Applications and Methods, Vol 16 (4), pp. 723-730, 2009
DOI: 10.5351/ckss.2009.16.4.723

Author(s): Sung-Duck Lee, Jae-Hyuk Choi, Duck-Ki Kim

Keyword(s): Time Series; Missing Values; Time Series Data; Series Data; Imputation Methods

An Empirical Mode-Spatial Model for Environmental Data Imputation

Hydrology, Vol 5 (4), pp. 63, 2018
DOI: 10.3390/hydrology5040063
Cited By ~ 1

Author(s): Benjamin Nelsen, D. Williams, Gustavious Williams, Candace Berrett

Keyword(s): Time Series; Spatial Data; Missing Values; Time Series Data; Environmental Data; Series Data; Data Imputation; Accurate Data; Target Station; Periodic Components

Complete and accurate data are necessary for analyzing and understanding trends in time-series datasets; however, many available time-series datasets, especially in the earth sciences, have gaps that affect the analysis. As most available data have missing values, researchers use various interpolation methods or ad hoc approaches to data imputation. Since analysis based on inaccurate data can lead to inaccurate conclusions, more accurate imputation methods lead to more accurate analysis. We present a spatial-temporal data imputation method using Empirical Mode Decomposition (EMD) based on spatial correlations, which we call EMD-spatial data imputation (EMD-SDI). Though this method is applicable to other time-series data sets, here we demonstrate it using temperature data. The EMD algorithm decomposes data into periodic components called intrinsic mode functions (IMFs) and exactly reconstructs the original signal by summing these IMFs. EMD-SDI first decomposes the data from the target station and from other stations in the region into IMFs. It then evaluates each IMF from the target station in turn and selects the IMF from the other stations in the region whose periodic behavior is most correlated with the target IMF. Finally, EMD-SDI replaces a section of missing data in the target station IMF with the section from the most closely correlated regional IMF. We found that EMD-SDI selects the IMFs used for reconstruction from different stations throughout the region, not necessarily the station closest in the geographic sense. In our tests, EMD-SDI accurately filled data gaps from 3 months to 5 years in length and compared favorably with a simple temporal method. EMD-SDI leverages regional correlation and the fact that different stations can be subject to different periodic behaviors. In addition to data imputation, the EMD-SDI method provides IMFs that can be used to better understand regional correlations and processes.
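The selection-and-replacement step of EMD-SDI can be sketched as follows. This Python sketch assumes the IMFs have already been produced by an EMD routine (not shown), that each target IMF is stored on the full time axis with `None` in the gap, and that the final EMD residual is treated as one more IMF so the components sum to the signal. The function names, and the use of Pearson correlation over the observed part of the record as the similarity measure, are hypothetical choices rather than details taken from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def fill_target(target_imfs, regional_imfs):
    """
    target_imfs:   list of target-station IMFs, each a list with None inside the gap.
    regional_imfs: dict mapping station name -> list of that station's complete IMFs.
    Returns (reconstructed target series, chosen (station, imf_index) per target IMF).
    """
    filled_imfs, choices = [], []
    for imf in target_imfs:
        obs = [i for i, v in enumerate(imf) if v is not None]
        best, best_r = None, -math.inf
        # Search every IMF of every regional station for the best match on observed times.
        for station, imfs in regional_imfs.items():
            for j, cand in enumerate(imfs):
                r = pearson([imf[i] for i in obs], [cand[i] for i in obs])
                if r > best_r:
                    best, best_r = (station, j), r
        donor = regional_imfs[best[0]][best[1]]
        # Copy only the missing section from the best-correlated donor IMF.
        filled_imfs.append([donor[i] if v is None else v for i, v in enumerate(imf)])
        choices.append(best)
    # EMD is additive, so the gap-filled series is the sum of its filled IMFs.
    series = [sum(vals) for vals in zip(*filled_imfs)]
    return series, choices
```

Because the donor is chosen separately for each IMF, different components can come from different stations, which mirrors the finding that the selected IMFs need not come from the geographically closest station. A real implementation might also rescale or align donor segments; the abstract does not say whether EMD-SDI does so.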

Seasonal Generalized Space Time Autoregressive (GSTAR) Modeling of Rainfall Data in Four Regencies of Central Java Province

Jurnal Gaussian, Vol 8 (4), pp. 418-427, 2019
DOI: 10.14710/j.gauss.v8i4.26722

Author(s): Eko Siswanto, Hasbi Yasin, Sudarno Sudarno

Keyword(s): Time Series; Time Series Data; Space Time; Series Data; Inverse Distance Weighted; Modeling And Forecasting; Distance Weighted; The Relationship; Inverse Distance

In many applications, several time series are recorded simultaneously at a number of locations. Time series from nearby locations are often related in both space and time; such data are called spatial time series data. The Generalized Space Time Autoregressive (GSTAR) model is one of the space time models used for modeling and forecasting spatial time series data. This study applied the GSTAR model to rainfall volume at four locations: Jepara Regency, Kudus Regency, Pati Regency, and Grobogan Regency. Based on the smallest mean RMSE of the forecasts, the best model chosen in this study is GSTAR(1₁)-I(1)¹² with inverse distance weights. Under this model, the relationships between locations show that rainfall in Pati Regency is influenced by rainfall in the other regencies. Keywords: GSTAR, RMSE, Rainfall
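A GSTAR(1₁) model is commonly written as Z(t) = Φ₀Z(t-1) + Φ₁WZ(t-1) + e(t), where Φ₀ and Φ₁ are diagonal matrices of location-specific parameters and W is a row-normalised spatial weight matrix. The Python sketch below is a minimal illustration, assuming ordinary least squares estimated location by location, inverse-distance weights w_ij ∝ 1/d_ij with a zero diagonal, and a lag-12 seasonal difference standing in for the I(1)¹² part of the chosen model on monthly data. The helper names, coordinates, and parameter values are hypothetical.

```python
import numpy as np

def inverse_distance_weights(coords):
    """Row-normalised inverse-distance weight matrix with a zero diagonal."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d, 0.0)
    return w / w.sum(axis=1, keepdims=True)

def seasonal_difference(z, s=12):
    """Z(t) - Z(t-s) along the time axis (rows = time, columns = locations)."""
    z = np.asarray(z, dtype=float)
    return z[s:] - z[:-s]

def fit_gstar11(z, w):
    """OLS estimates of (phi0_i, phi1_i) for each location i in
    Z_i(t) = phi0_i * Z_i(t-1) + phi1_i * sum_j w_ij Z_j(t-1) + e_i(t)."""
    z = np.asarray(z, dtype=float)
    lag = z[:-1]                # Z(t-1), shape (T-1, N)
    spatial_lag = lag @ w.T     # row t holds W Z(t-1)
    phi = np.zeros((z.shape[1], 2))
    for i in range(z.shape[1]):
        X = np.column_stack([lag[:, i], spatial_lag[:, i]])
        phi[i] = np.linalg.lstsq(X, z[1:, i], rcond=None)[0]
    return phi

def forecast_one_step(z_last, phi, w):
    """One-step forecast Phi0 Z(t) + Phi1 W Z(t) with diagonal Phi0 and Phi1."""
    return phi[:, 0] * z_last + phi[:, 1] * (w @ z_last)

def rmse(actual, predicted):
    """Root mean squared error, the criterion used to choose between models."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical station coordinates (km) and true parameters for a simulation.
    w = inverse_distance_weights([(0, 0), (30, 0), (0, 60), (40, 50)])
    true_phi = np.array([[0.5, 0.3], [0.2, 0.4], [0.6, 0.1], [0.3, 0.3]])
    z = np.zeros((600, 4))
    for t in range(1, 600):
        z[t] = forecast_one_step(z[t - 1], true_phi, w) + rng.normal(size=4)
    train, test = z[:500], z[500:]
    phi = fit_gstar11(train, w)
    preds = [forecast_one_step(test[t - 1], phi, w) for t in range(1, len(test))]
    print(np.round(phi, 2))
    print("one-step RMSE:", round(rmse(test[1:], preds), 3))
```

For seasonal monthly rainfall, `seasonal_difference` would be applied before `fit_gstar11`, and forecasts would be added back to the values twelve months earlier.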

Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

Fluctuation and Noise Letters, Vol 17 (02), pp. 1850017, 2018
DOI: 10.1142/s0219477518500177
Cited By ~ 3

Author(s): Mahdi Kalantari, Masoud Yarmohammadi, Hossein Hassani, Emmanuel Sirimal Silva

Keyword(s): Time Series; Spectrum Analysis; Missing Values; Time Series Data; Singular Spectrum Analysis; Series Data; L1 Norm; Nonparametric Approach; Singular Spectrum; Simulated Time

Missing values in time series data are a well-known and important problem that many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method was compared with that of many other established methods by applying them to various real and simulated time series. The results confirm that SSA-based methods, especially L1-SSA, can provide better imputation than the other methods.
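The abstract does not spell out how the L1 decomposition inside L1-SSA is computed. The Python sketch below shows one way to build an iterative SSA imputer around an L1 low-rank step, assuming: gaps start at the series mean; each rank-one L1 term is fitted by alternating weighted medians (a known heuristic for minimising Σ|X_ij - u_i v_j|) initialised from the ordinary SVD; higher ranks come from greedy deflation; and the window length `L`, `rank`, and iteration counts are hypothetical parameters. It illustrates the general idea and is not the authors' algorithm.

```python
import numpy as np

def weighted_median(values, weights):
    """Minimiser of sum_k weights[k] * |values[k] - c| (lower weighted median)."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, 0.5 * cum[-1])]

def l1_rank1(X, n_inner=10, tol=1e-12):
    """Approximate argmin over u, v of sum_ij |X_ij - u_i v_j| by alternating weighted medians."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0] * s[0], Vt[0].copy()          # least-squares starting point
    for _ in range(n_inner):
        nz = np.abs(v) > tol
        if not nz.any():
            break
        # For fixed v, the best u_i is the weighted median of X_ij / v_j with weights |v_j|.
        u = np.array([weighted_median(row[nz] / v[nz], np.abs(v[nz])) for row in X])
        nz = np.abs(u) > tol
        if not nz.any():
            break
        v = np.array([weighted_median(col[nz] / u[nz], np.abs(u[nz])) for col in X.T])
    return np.outer(u, v)

def trajectory(x, L):
    """L x K Hankel (trajectory) matrix with entry [i, j] = x[i + j]."""
    K = len(x) - L + 1
    return np.array([x[i:i + K] for i in range(L)])

def hankelize(M):
    """Diagonal averaging: map an L x K matrix back to a series of length L + K - 1."""
    L, K = M.shape
    out, counts = np.zeros(L + K - 1), np.zeros(L + K - 1)
    for i in range(L):
        out[i:i + K] += M[i]
        counts[i:i + K] += 1
    return out / counts

def l1_ssa_impute(x, L, rank=1, n_iter=20):
    """Iteratively replace NaNs with an L1 low-rank SSA reconstruction; observed values are kept."""
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    filled = np.where(miss, np.nanmean(x), x)
    for _ in range(n_iter):
        resid = trajectory(filled, L)
        approx = np.zeros_like(resid)
        for _ in range(rank):                    # greedy deflation, one L1 term at a time
            comp = l1_rank1(resid)
            approx += comp
            resid = resid - comp
        filled = np.where(miss, hankelize(approx), x)
    return filled

if __name__ == "__main__":
    t = np.arange(60)
    truth = 1.02 ** t            # x[i + j] = 1.02**i * 1.02**j, so the trajectory matrix is rank one
    x = truth.copy()
    x[[20, 35, 50]] = np.nan     # hypothetical gaps
    est = l1_ssa_impute(x, L=10, rank=1)
    print(np.round(est[[20, 35, 50]], 3), np.round(truth[[20, 35, 50]], 3))
```

The weighted median appears because, for fixed v, Σ_j |X_ij - u_i v_j| equals Σ_j |v_j| · |X_ij / v_j - u_i|, whose minimiser is the weighted median of the ratios; a few extreme entries therefore cannot drag the fit the way they would under a squared-error SVD.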

Filling Missing Values on Wearable-Sensory Time Series Data

Proceedings of the 2020 SIAM International Conference on Data Mining, pp. 46-54, 2020
DOI: 10.1137/1.9781611976236.6

Author(s): Suwen Lin, Xian Wu, Gonzalo Martinez, Nitesh V. Chawla

Keyword(s): Time Series; Missing Values; Time Series Data; Series Data

A missing values imputation method for time series data: an efficient method to investigate the health effects of sulphur dioxide levels

Environmetrics, 2009
DOI: 10.1002/env.990
Cited By ~ 1

Author(s): Swarna Weerasinghe

Keyword(s): Time Series; Health Effects; Efficient Method; Sulphur Dioxide; Missing Values; Time Series Data; Imputation Method; Series Data

Modeling Rice Prices on Sumatra Island Using the Generalized Space Time ARIMA Model

Xplore Journal of Statistics, Vol 2 (2), pp. 49-57, 2018
DOI: 10.29244/xplore.v2i2.105

Author(s): Dwi Yulianti, I Made Sumertajaya, Itasia Dina Sulvianti

Keyword(s): Time Series; Time Series Data; Moving Average; Space Time; Series Data; Model Parameters; Autoregressive Integrated Moving Average; Sumatra Island; Rice Price; Multiple Variables

The generalized space time autoregressive integrated moving average (GSTARIMA) model is a multivariate time series model with spatial and temporal (space time) linkages. It extends the space time autoregressive integrated moving average (STARIMA) model by assuming that each location has its own model parameters, which makes GSTARIMA more flexible than STARIMA. The purposes of this research are to determine the best model and to forecast rice prices in all provincial capitals of Sumatra Island using the GSTARIMA model. The research used weekly rice price data for all provincial capitals of Sumatra Island from January 2010 to December 2017. The spatial weights used were inverse distance and queen contiguity. The modeling results show that the best model is GSTARIMA(1,1,0) with a queen contiguity weight matrix, which has the smallest MAPE value, 1.17817%.
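The queen contiguity weights, the GSTARIMA(1,1,0) forecast, and the MAPE criterion mentioned above can be stated compactly in code. The Python sketch below assumes regions laid out as cells of a regular grid, a simplification made only to illustrate the queen rule (real provincial boundaries would need polygon adjacency). It row-normalises the weights and writes the GSTARIMA(1,1,0) one-step forecast as the last level plus a GSTAR(1) prediction of the next first difference. Function names and parameter values are hypothetical.

```python
def queen_contiguity_grid(n_rows, n_cols):
    """Row-normalised queen-contiguity weights for the cells of an n_rows x n_cols grid.
    Cells are numbered row by row; two cells are neighbours if they share an edge or a corner."""
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs = [(r + dr) * n_cols + (c + dc)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)
                    and 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
            for j in nbrs:
                w[r * n_cols + c][j] = 1.0 / len(nbrs)
    return w

def forecast_gstarima110(z_prev, z_last, phi0, phi1, w):
    """One-step GSTARIMA(1,1,0) forecast: Z_i(t) + dZ_hat_i, where
    dZ_hat_i = phi0_i * dZ_i(t) + phi1_i * sum_j w_ij * dZ_j(t) and dZ(t) = Z(t) - Z(t-1)."""
    dz = [a - b for a, b in zip(z_last, z_prev)]
    n = len(dz)
    return [z_last[i] + phi0[i] * dz[i] + phi1[i] * sum(w[i][j] * dz[j] for j in range(n))
            for i in range(n)]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

if __name__ == "__main__":
    w = queen_contiguity_grid(2, 3)        # hypothetical 2 x 3 layout of six regions
    print([round(v, 3) for v in w[0]])     # cell 0 touches cells 1, 3 and 4
```

Inverse distance weights would simply replace the 0/1 neighbour indicator with 1/d_ij before row normalisation; the forecasting and MAPE code is unchanged.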

Time-Series Causality with Missing Data

Journal of Computational Vision and Imaging Systems, Vol 6 (1), pp. 1-4, 2021
DOI: 10.15353/jcvis.v6i1.3552

Author(s): Bo Yuan Chang, Mohamed A. Naiel, Steven Wardell, Stan Kleinikkink, John S. Zelek

Keyword(s): Time Series; Missing Data; Missing Values; Time Series Data; Multivariate Time Series; Gaussian Process Regression; Series Data; Causal Relationships; Sampled Data; The Past

Over the past years, researchers have proposed various methods to discover causal relationships among time series as well as algorithms to fill in missing entries in time-series data. Little to no work has combined the two strategies for learning causal relationships from unevenly sampled multivariate time-series data. In this paper, we examine how the causal parameters learnt from unevenly sampled data (with missing entries) deviate from those learnt from evenly sampled data (without missing entries). Obtaining the causal relationship from a given time series requires evenly sampled data, so the missing values must be filled before the causal parameters are estimated. The proposed method therefore applies a Gaussian Process Regression (GPR) model to recover the missing data, followed by several pairwise Granger causality equations in Vector Autoregressive form that are fitted to the recovered data to obtain the causal parameters. Experimental results show that the causal parameters obtained with GPR data filling have much lower RMSE than those obtained with the dummy model (filling with the last seen entry) at every percentage of missing values tested. This suggests that GPR data filling preserves the causal relationships better than dummy filling and should be considered for causality learning on unevenly sampled time series.
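The pipeline in this abstract (fill gaps, then fit pairwise Granger regressions in VAR form) can be sketched with NumPy. The sketch assumes a zero-mean Gaussian process on the centred series with an RBF kernel and fixed, hand-picked hyperparameters (`length`, `var`, `noise`), whereas a real application would usually tune them by maximising the marginal likelihood. `last_seen_fill` plays the role of the dummy model, and the causal parameter is taken to be the coefficient b in y_t = c + a*y_{t-1} + b*x_{t-1}. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length, var):
    """Squared-exponential (RBF) covariance between time points a and b."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gpr_fill(t, y, length=2.0, var=1.0, noise=1e-2):
    """Fill NaNs in y with the GP posterior mean given the observed points."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    obs = ~np.isnan(y)
    mu = y[obs].mean()                                      # centre the data
    K = rbf_kernel(t[obs], t[obs], length, var) + noise * np.eye(obs.sum())
    alpha = np.linalg.solve(K, y[obs] - mu)
    out = y.copy()
    out[~obs] = mu + rbf_kernel(t[~obs], t[obs], length, var) @ alpha
    return out

def last_seen_fill(y):
    """Dummy model: carry the last observed value forward (a leading NaN stays NaN)."""
    out = np.asarray(y, float).copy()
    for i in range(1, len(out)):
        if np.isnan(out[i]):
            out[i] = out[i - 1]
    return out

def granger_pair(x, y):
    """Least-squares fit of y_t = c + a*y_{t-1} + b*x_{t-1}; b is the x -> y causal parameter."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[:-1]])
    return np.linalg.lstsq(X, y[1:], rcond=None)[0]        # [c, a, b]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(300)
    x = np.sin(0.15 * t)                                    # hypothetical smooth driver
    y = np.zeros_like(x)
    for k in range(1, len(t)):
        y[k] = 0.5 * y[k - 1] + 0.8 * x[k - 1] + 0.05 * rng.normal()
    mask = rng.random(len(t)) < 0.3                         # hide about 30% of each series
    mask[0] = False
    xm, ym = np.where(mask, np.nan, x), np.where(mask, np.nan, y)
    b_full = granger_pair(x, y)[2]
    for name, fill in [("GPR", lambda s: gpr_fill(t, s)), ("last seen", last_seen_fill)]:
        b = granger_pair(fill(xm), fill(ym))[2]
        print(f"{name:>9}: b = {b:.3f} (complete data: {b_full:.3f})")
```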

E²GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
DOI: 10.24963/ijcai.2019/429
Cited By ~ 2

Author(s): Yonghong Luo, Ying Zhang, Xiangrui Cai, Xiaojie Yuan

Keyword(s): Time Series; Missing Values; Time Series Data; Multivariate Time Series; Imputation Accuracy; Series Data; Generative Adversarial Network; Multi Stage; Advanced Analysis; End To End

Missing values, which appear in most multivariate time series, prevent advanced analysis of multivariate time series data. Existing imputation approaches handle missing values by deletion, statistical imputation, machine learning based imputation, or generative imputation. However, these methods either cannot exploit temporal information or require multiple stages. This paper proposes an end-to-end generative model, E²GAN, to impute missing values in multivariate time series. Guided by a discriminative loss and a squared error loss, E²GAN imputes an incomplete time series with the nearest generated complete time series in a single stage. Experiments on multiple real-world datasets show that our model outperforms the baselines in imputation accuracy and achieves state-of-the-art classification/regression results on downstream applications. Additionally, our method trains its neural networks more time-efficiently than multi-stage methods.
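The abstract describes training with a discriminative loss plus a squared error loss, then using the generated series to fill the gaps. The NumPy sketch below shows only those loss and fill-in computations, with discriminator outputs and generated values passed in as plain arrays. The non-saturating adversarial term -mean(log D(x_hat)), the weight `lam`, and all function names are assumptions for illustration; E²GAN's actual networks and loss details may differ.

```python
import numpy as np

def masked_squared_error(x, x_hat, mask):
    """Mean squared error over observed entries only (mask == 1 where x is observed)."""
    diff = (x_hat - np.nan_to_num(x)) * mask
    return float((diff ** 2).sum() / mask.sum())

def generator_loss(d_fake, x, x_hat, mask, lam=1.0, eps=1e-8):
    """Hypothetical combined objective: an adversarial term (non-saturating GAN form,
    an assumption) plus lam times the squared error on the observed entries."""
    adversarial = -float(np.mean(np.log(d_fake + eps)))
    return adversarial + lam * masked_squared_error(x, x_hat, mask)

def combine(x, x_hat, mask):
    """Keep observed values; take generated values only where data are missing."""
    return np.where(mask.astype(bool), np.nan_to_num(x), x_hat)

if __name__ == "__main__":
    x = np.array([[1.0, np.nan, 3.0], [2.0, 2.5, np.nan]])   # toy series, NaN = missing
    mask = (~np.isnan(x)).astype(float)
    x_hat = np.array([[1.1, 2.0, 2.9], [2.1, 2.4, 2.8]])    # stand-in generator output
    d_fake = np.array([0.4, 0.6])                            # stand-in discriminator scores
    print(generator_loss(d_fake, x, x_hat, mask, lam=10.0))
    print(combine(x, x_hat, mask))
```

The squared-error term ties the generated series to the values that were actually observed, while the adversarial term pushes it toward realistic complete series; only the missing positions are then taken from the generator.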
