Missing Data Methods: Time-Series Methods and Applications

2020 ◽

Vol 20 (4) ◽

pp. 916-930

Author(s):

Andrew Q. Philips

Keyword(s):

Time Series ◽

Missing Data ◽

Time Series Data ◽

Series Data ◽

Duration Dependence ◽

Cross Sectional ◽

Common Solution ◽

The Common

In cross-sectional time-series data with a dichotomous dependent variable, failing to account for duration dependence when it exists can lead to faulty inferences. A common solution is to include duration dummies, polynomials, or splines to proxy for duration dependence. Because creating these is not easy for the common practitioner, I introduce a new command, mkduration, that is a straightforward way to generate a duration variable for binary cross-sectional time-series data in Stata. mkduration can handle various forms of missing data and allows the duration variable to easily be turned into common parametric and nonparametric approximations.
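The duration variable described in this abstract can be illustrated with a short sketch. This is not the mkduration command or its Stata syntax; it is a Python illustration under stated assumptions. The function names (`spell_duration`, `panel_duration`, `cubic_terms`) are hypothetical, each unit is assumed to be observed in consecutive periods, and a missing outcome (`None`) is treated as "no event", which is only one of several possible conventions.

```python
from collections import defaultdict


def spell_duration(events):
    """Periods elapsed since the last event (or since the series start).

    `events` is one unit's outcomes in time order: 1, 0, or None (missing).
    Assumption: a missing outcome is treated as "no event", so the clock
    keeps running through it.
    """
    durations = []
    clock = 0
    for y in events:
        durations.append(clock)           # duration known at the start of the period
        clock = 0 if y == 1 else clock + 1
    return durations


def panel_duration(rows):
    """rows: iterable of (unit, time, y). Returns {(unit, time): duration}."""
    by_unit = defaultdict(list)
    for unit, time, y in rows:
        by_unit[unit].append((time, y))
    out = {}
    for unit, obs in by_unit.items():
        obs.sort(key=lambda pair: pair[0])        # order each unit by time
        durations = spell_duration([y for _, y in obs])
        for (time, _), d in zip(obs, durations):
            out[(unit, time)] = d
    return out


def cubic_terms(d):
    """Polynomial approximation of duration dependence: t, t^2, t^3."""
    return d, d ** 2, d ** 3
```

The three values returned by `cubic_terms` would enter a binary-outcome model as regressors; duration dummies or splines could be built from the same counter.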

A Method Based on MTLS and ILSP for GNSS Coordinate Time Series Analysis with Missing Data

Advances in Space Research ◽

10.1016/j.asr.2021.06.037 ◽

2021 ◽

Author(s):

Yingying Ren ◽

Hu Wang ◽

Lizhen Lian ◽

Jiexian Wang ◽

Yingyan Cheng ◽

...

Keyword(s):

Time Series ◽

Missing Data ◽

Time Series Analysis ◽

Coordinate Time ◽

Coordinate Time Series ◽

Series Analysis

A comparison of methods for smoothing and gap filling time series of remote sensing observations – application to MODIS LAI products

Biogeosciences ◽

10.5194/bg-10-4055-2013 ◽

2013 ◽

Vol 10 (6) ◽

pp. 4055-4071 ◽

Cited By ~ 84

Author(s):

S. Kandasamy ◽

F. Baret ◽

A. Verger ◽

P. Neveux ◽

M. Weiss

Keyword(s):

Time Series ◽

Missing Data ◽

Time Course ◽

Singular Spectrum Analysis ◽

Gaussian Function ◽

Gap Filling ◽

Area Index ◽

Filling Time ◽

Moderate Resolution ◽

Temporal Profiles

Abstract. Moderate resolution satellite sensors including MODIS (Moderate Resolution Imaging Spectroradiometer) already provide more than 10 yr of observations well suited to describe and understand the dynamics of earth's surface. However, these time series are associated with significant uncertainties and are incomplete because of cloud cover. This study compares eight methods designed to improve the continuity by filling gaps and consistency by smoothing the time course. It includes methods exploiting the time series as a whole (iterative caterpillar singular spectrum analysis (ICSSA), empirical mode decomposition (EMD), low pass filtering (LPF) and Whittaker smoother (Whit)) as well as methods working on limited temporal windows of a few weeks to a few months (adaptive Savitzky–Golay filter (SGF), temporal smoothing and gap filling (TSGF), and asymmetric Gaussian function (AGF)), in addition to the simple climatological LAI yearly profile (Clim). Methods were applied to the MODIS leaf area index product for the period 2000–2008 and over 25 sites showing a large range of seasonal patterns. Performances were discussed with emphasis on the balance achieved by each method between accuracy and roughness depending on the fraction of missing observations and the length of the gaps. Results demonstrate that the EMD, LPF and AGF methods failed when the fraction of gaps was significant (more than 20%), while ICSSA, Whit and SGF always provided estimates for dates with missing data. TSGF (Clim) was able to fill more than 50% of the gaps for sites with more than 60% (80%) fraction of gaps. However, investigation of the accuracy of the reconstructed values shows that it degrades rapidly for sites with more than 20% missing data, particularly for ICSSA, Whit and SGF. In these conditions, TSGF provides the best performance, significantly better than the simple Clim for gaps shorter than about 100 days.
The roughness of the reconstructed temporal profiles shows large differences between the various methods, with a decrease of the roughness with the fraction of missing data, except for ICSSA. TSGF provides the smoothest temporal profiles for sites with a % gap > 30%. Conversely, ICSSA, LPF, Whit, AGF and Clim provide smoother profiles than TSGF for sites with a % gap < 30%. The impact of the accuracy and smoothness of the reconstructed time series on the timing of phenological stages was evaluated. The dates of start, maximum and end of the season are estimated with an accuracy of about 10 days for the sites with a % gap < 10%, and the error increases rapidly with the % gap. TSGF provides more accurate estimates of phenological timing up to a % gap of 60%.
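Of the eight methods compared, the climatological yearly profile is the simplest to illustrate. The sketch below is an assumption-laden simplification, not the procedure used in the study: it assumes every year is sampled at the same positions (for example, the same compositing dates), marks gaps with `None`, and fills each gap with the mean of the values observed at that position in other years. The function name `climatology_fill` is hypothetical.

```python
def climatology_fill(years):
    """Fill gaps with a mean yearly profile.

    `years` is a list of equally long lists, one per year, with None for gaps.
    Returns (filled_years, profile). A position that is missing in every year
    cannot be filled and stays None (assumption of this sketch).
    """
    n_positions = len(years[0])
    profile = []
    for k in range(n_positions):
        observed = [year[k] for year in years if year[k] is not None]
        profile.append(sum(observed) / len(observed) if observed else None)
    filled = [
        [profile[k] if value is None else value for k, value in enumerate(year)]
        for year in years
    ]
    return filled, profile


# Three short "years" with one gap each.
filled, profile = climatology_fill([
    [1.0, None, 3.0],
    [3.0, 2.0, None],
    [None, 4.0, 5.0],
])
```

Because the profile ignores year-to-year variation, a filler of this kind can fill any gap for which another year has an observation at that position, but it reproduces only the average seasonal cycle.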

AN EVOLUTIONARY APPROACH FOR IMPUTING MISSING DATA IN TIME SERIES

Journal of Circuits System and Computers ◽

10.1142/s0218126610006050 ◽

2010 ◽

Vol 19 (01) ◽

pp. 107-121 ◽

Cited By ~ 7

Author(s):

JUAN CARLOS FIGUEROA GARCÍA ◽

DUSKO KALENATIC ◽

CESAR AMILCAR LÓPEZ BELLO

Keyword(s):

Genetic Algorithm ◽

Time Series ◽

Missing Data ◽

Genetic Structure ◽

Evolutionary Algorithm ◽

Fitness Function ◽

Error Function ◽

Missing Observations ◽

Mean And Variance ◽

Methodological Aspects

This paper presents a proposal based on an evolutionary algorithm for imputing missing observations in time series. A genetic algorithm based on the minimization of an error function derived from their autocorrelation function, mean, and variance is presented. All methodological aspects of the genetic structure are presented. An extended description of the design of the fitness function is provided. Four application examples are provided and solved by using the proposed method.
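An error function of the general kind described here can be sketched as follows. The abstract does not give the exact form, so everything below is an assumption: the squared-difference form, the equal weighting of terms, the way target statistics are supplied, and the names `autocorr` and `imputation_error` are all hypothetical. A genetic algorithm would search over `candidate` vectors to minimize this value; the search itself is omitted.

```python
from statistics import fmean, pvariance


def autocorr(x, lag):
    """Sample autocorrelation of a complete series at the given lag."""
    m = fmean(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    return num / denom


def imputation_error(series, candidate, target_mean, target_var, target_acf):
    """Squared mismatch between the filled series and target statistics.

    series      : list with None at missing positions
    candidate   : proposed values for the gaps, in order of appearance
    target_acf  : dict {lag: target autocorrelation} (hypothetical input)
    """
    gap_values = iter(candidate)
    filled = [next(gap_values) if v is None else v for v in series]
    error = (fmean(filled) - target_mean) ** 2
    error += (pvariance(filled) - target_var) ** 2
    for lag, rho in target_acf.items():
        error += (autocorr(filled, lag) - rho) ** 2
    return error
```

A candidate that reproduces the target mean, variance, and autocorrelations exactly scores zero; any other candidate scores higher.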

Predicting in multivariate incomplete time series. Application of the expectation-maximisation algorithm supplemented by the Newton-Raphson method

Przegląd Statystyczny ◽

10.5604/01.3001.0015.0376 ◽

2021 ◽

Vol 68 (1) ◽

pp. 17-46

Author(s):

Adam Korczyński

Keyword(s):

Time Series ◽

Missing Data ◽

Em Algorithm ◽

Measurement Errors ◽

Parameters Estimation ◽

Maximum Likelihood Estimates ◽

Original Algorithm ◽

Expectation Maximisation ◽

Newton Raphson ◽

Raphson Method

Statistical practice requires various imperfections resulting from the nature of data to be addressed. Data containing different types of measurement errors and irregularities, such as missing observations, have to be modelled. The study presented in the paper concerns the application of the expectation-maximisation (EM) algorithm to calculate maximum likelihood estimates, using an autoregressive model as an example. The model allows describing a process observed only through measurements with a certain level of precision and through more than one data series. The studied series are affected by a measurement error and interrupted in some time periods, which makes the information available for parameter estimation, and later for prediction, less precise. The presented technique aims to compensate for missing data in time series. The missing data appear in the form of breaks in the source of the signal. The EM algorithm has been adjusted to a hybrid version, supplemented by the Newton-Raphson method. This technique allows the estimation of more complex models. The formulation of the substantive model of an autoregressive process affected by noise is outlined, as well as the adjustment introduced to overcome the issue of missing data. The extended version of the algorithm has been verified using sampled data from a model serving as an example for the examined process. The verification demonstrated that the joint EM and Newton-Raphson algorithms converged in a relatively small number of iterations and restored the information lost due to missing data, providing more accurate predictions than the original algorithm. The study also features an example of the application of the supplemented algorithm to some empirical data (forecasting the demand for newspapers).

Imputing missing data in non-renewable empower time series from night-time lights observations

Ecological Indicators ◽

10.1016/j.ecolind.2017.08.040 ◽

2018 ◽

Vol 84 ◽

pp. 106-118 ◽

Cited By ~ 2

Author(s):

Laura Neri ◽

Luca Coscieme ◽

Biagio F. Giannetti ◽

Federico M. Pulselli

Keyword(s):

Time Series ◽

Missing Data ◽

Night Time

Relationship Between Missing Data Likelihoods and Complete Data Restricted Likelihoods for Regression Time Series Models: An Application to Total Ozone Data

Journal of the Royal Statistical Society Series C (Applied Statistics) ◽

10.2307/2986223 ◽

1996 ◽

Vol 45 (1) ◽

pp. 63 ◽

Cited By ~ 5

Author(s):

Sabyasachi Basu ◽

Gregory C. Reinsel

Keyword(s):

Time Series ◽

Missing Data ◽

Total Ozone ◽

Complete Data ◽

Time Series Models ◽

Ozone Data

Estimating Nonstationary Time Series with Missing Data

European Consortium for Mathematics in Industry - Proceedings of the Sixth European Conference on Mathematics in Industry August 27–31, 1991 Limerick ◽

10.1007/978-3-663-09834-8_16 ◽

1992 ◽

pp. 103-106

Author(s):

D. Čepar ◽

Z. Radalj ◽

B. Vovk

Keyword(s):

Time Series ◽

Missing Data ◽

Nonstationary Time Series

Environmental Time Series Prediction with Missing Data by Machine Learning and Dynamics Recostruction

Pattern Recognition. ICPR International Workshops and Challenges - Lecture Notes in Computer Science ◽

10.1007/978-3-030-68780-9_3 ◽

2021 ◽

pp. 26-33

Author(s):

Francesco Camastra ◽

Vincenzo Capone ◽

Angelo Ciaramella ◽

Tony Christian Landi ◽

Angelo Riccio ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Missing Data ◽

Time Series Prediction ◽

Environmental Time Series

Extracting Common Mode Errors of Regional GNSS Position Time Series in the Presence of Missing Data by Variational Bayesian Principal Component Analysis

Sensors ◽

10.3390/s20082298 ◽

2020 ◽

Vol 20 (8) ◽

pp. 2298 ◽

Cited By ~ 2

Author(s):

Wudong Li ◽

Weiping Jiang ◽

Zhao Li ◽

Hua Chen ◽

Qusen Chen ◽

...

Keyword(s):

Principal Component Analysis ◽

Time Series ◽

Missing Data ◽

Principal Component ◽

Component Analysis ◽

Common Mode ◽

Variational Bayesian ◽

North East ◽

Position Time Series ◽

Gnss Position Time Series

Removal of the common mode error (CME) is very important for the investigation of global navigation satellite systems’ (GNSS) error and the estimation of an accurate GNSS velocity field for geodynamic applications. The commonly used spatiotemporal filtering methods normally process evenly spaced time series without missing data. In this article, we present the variational Bayesian principal component analysis (VBPCA) to estimate and extract CME from the incomplete GNSS position time series. The VBPCA method can naturally handle missing data in the Bayesian framework and utilizes the variational expectation-maximization iterative algorithm to search each principal subspace. Moreover, it can automatically select the optimal number of principal components for data reconstruction and avoid the overfitting problem. To evaluate the performance of the VBPCA algorithm for extracting CME, 44 continuous GNSS stations located in Southern California were selected. Compared to previous approaches, VBPCA achieves better performance, with lower CME relative errors, when more data are missing. Since the first principal component (PC) extracted by VBPCA is remarkably larger than the other components, and its corresponding spatial response presents nearly uniform distribution, we only use the first PC and its eigenvector to reconstruct the CME for each station. After filtering out CME, the interstation correlation coefficients are significantly reduced from 0.43, 0.46, and 0.38 to 0.11, 0.10, and 0.08, for the north, east, and up (NEU) components, respectively. The root mean square (RMS) values of the residual time series and the colored noise amplitudes for the NEU components are also greatly suppressed, with average reductions of 27.11%, 28.15%, and 23.28% for the former, and 49.90%, 54.56%, and 49.75% for the latter.
Moreover, the velocity estimates are more reliable and precise after removing CME, with average uncertainty reductions of 51.95%, 57.31%, and 49.92% for the NEU components, respectively. All these results indicate that the VBPCA method is an alternative and efficient way to extract CME from regional GNSS position time series in the presence of missing data. Further work is still required to consider the effect of formal errors on the CME extraction during the VBPCA implementation.
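The reconstruction step mentioned in this abstract, using only the first PC and its eigenvector to rebuild the CME for each station, can be sketched with ordinary PCA. This is not VBPCA: the sketch below requires complete data, has no Bayesian treatment of gaps, and finds the leading eigenvector by plain power iteration. The function name `first_pc_cme` and the input layout (rows are epochs, columns are stations, one coordinate component at a time) are assumptions made for illustration.

```python
import math


def first_pc_cme(X, iterations=200):
    """Reconstruct a common-mode signal from the first principal component.

    X : list of rows (epochs), each a list of station residuals (complete data).
    Returns (cme, v): cme[t][j] is the common-mode value at epoch t for
    station j (relative to the station mean), and v is the unit-norm spatial
    eigenvector. Power iteration starts from a uniform vector, which assumes
    the leading eigenvector is not orthogonal to it.
    """
    T, N = len(X), len(X[0])
    means = [sum(row[j] for row in X) / T for j in range(N)]
    Xc = [[row[j] - means[j] for j in range(N)] for row in X]   # center each station
    # N x N scatter matrix between stations
    C = [[sum(Xc[t][i] * Xc[t][j] for t in range(T)) for j in range(N)]
         for i in range(N)]
    v = [1.0 / math.sqrt(N)] * N
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(N)) for i in range(N)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break                                               # no variance at all
        v = [x / norm for x in w]
    pc = [sum(Xc[t][j] * v[j] for j in range(N)) for t in range(T)]  # PC time series
    cme = [[pc[t] * v[j] for j in range(N)] for t in range(T)]
    return cme, v
```

Subtracting `cme` from the centered residuals gives the filtered series. Handling missing epochs, which is the point of the VBPCA approach, would need a different estimator than this one.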

An easy way to create duration variables in binary cross-sectional time-series data