Time-series Imputation Algorithm

Author(s):  
David Howe

Statistical imputation is a field of study that attempts to fill missing data. It is commonly applied to population statistics whose data have no correlation with running time. For a time series, data is typically analyzed using the autocorrelation function (ACF), the Fourier transform to estimate power spectral densities (PSD), the Allan deviation (ADEV), trend extensions, and basically any analysis that depends on uniform time indexes. We explain the rationale for an imputation algorithm that fills gaps in a time series by applying a backward, inverted replica of adjacent live data. To illustrate, four intentional massive gaps that exceed 100% of the original time series are recovered. The L(f) PSD with imputation applied to the gaps is nearly indistinguishable from the original. Also, the confidence of ADEV with imputation falls within 90% of the original ADEV with mixtures of power-law noises. The algorithm in Python is included for those wishing to try it.
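The idea of filling a gap with a backward, inverted replica of adjacent live data can be sketched as below. This is a minimal illustration and not the author's published code: the function name `fill_gap_reflected`, the choice of the last live sample as the pivot, and the requirement that the preceding live data be at least as long as the gap are all assumptions made for this example.

```python
import numpy as np

def fill_gap_reflected(x, start, stop):
    """Fill x[start:stop] with a time-reversed, inverted copy of the
    samples immediately before the gap (hypothetical helper).

    The last live sample x[start - 1] is used as the pivot, so the
    filled values continue the series without a jump at the gap edge.
    """
    x = np.asarray(x, dtype=float).copy()
    n = stop - start
    if n <= 0:
        return x
    if start - 1 - n < 0:
        raise ValueError("not enough live data before the gap")
    pivot = x[start - 1]
    # Samples before the pivot, read backward: x[start-2], x[start-3], ...
    mirrored = x[start - 1 - n:start - 1][::-1]
    # Invert about the pivot value.
    x[start:stop] = 2.0 * pivot - mirrored
    return x

series = np.array([0.0, 1.0, 2.0, 3.0, np.nan, np.nan, np.nan])
print(fill_gap_reflected(series, 4, 7))
```

For a linear ramp the reflected fill simply continues the ramp, which is why this kind of odd-symmetric extension tends to preserve local trend and noise character at the gap boundary.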

2021 ◽  
Vol 13 (2) ◽  
pp. 542
Author(s):  
Tarate Suryakant Bajirao ◽  
Pravendra Kumar ◽  
Manish Kumar ◽  
Ahmed Elbeltagi ◽  
Alban Kuriqi

Estimating sediment flow rate from a drainage area plays an essential role in better watershed planning and management. In this study, the validity of simple and wavelet-coupled Artificial Intelligence (AI) models was analyzed for daily Suspended Sediment Concentration (SSC) estimation in the highly dynamic Koyna River basin of India. Simple AI models such as the Artificial Neural Network (ANN) and Adaptive Neuro-Fuzzy Inference System (ANFIS) were developed by supplying the original time series data as input without pre-processing through a Wavelet (W) transform. The hybrid wavelet-coupled W-ANN and W-ANFIS models were developed by supplying the decomposed time series sub-signals obtained with the Discrete Wavelet Transform (DWT). In total, three mother wavelets, namely Haar, Daubechies, and Coiflets, were employed to decompose the original time series data into different multi-frequency sub-signals at an appropriate decomposition level. Quantitative and qualitative performance evaluation criteria were used to select the best model for daily SSC estimation. The reliability of the developed models was also assessed using uncertainty analysis. The results show that data pre-processing using the wavelet transform significantly improves the models' predictive efficiency and reliability. The Coiflet wavelet-coupled ANFIS model performed better than the other models and can be applied for daily SSC estimation in highly dynamic rivers. According to the sensitivity analysis, the previous day's SSC (St-1) is the most crucial input variable for daily SSC estimation in the Koyna River basin.
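To make the decomposition step concrete, here is a minimal sketch of a single-level Haar DWT written directly in NumPy. It is only an illustration of how a series is split into a low-frequency (approximation) and a high-frequency (detail) sub-signal; the function names are hypothetical, the sign convention of the detail coefficients is an assumption that differs between libraries, and the study's actual decomposition level and software are not stated in the abstract.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (hypothetical helper)."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        raise ValueError("length must be even")
    pairs = x.reshape(-1, 2)
    # Scaled pairwise sums give the smooth sub-signal.
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    # Scaled pairwise differences give the detail sub-signal.
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Invert haar_dwt and return the original samples."""
    even = (approx + detail) / np.sqrt(2.0)
    odd = (approx - detail) / np.sqrt(2.0)
    return np.column_stack([even, odd]).ravel()

a, d = haar_dwt([4.0, 6.0, 10.0, 12.0])
print(a, d, haar_idwt(a, d))
```

In a wavelet-coupled model of the kind described, sub-signals such as `a` and `d` (possibly from several levels) would be supplied to the learner in place of the raw series.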


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Jingpei Dan ◽  
Weiren Shi ◽  
Fangyan Dong ◽  
Kaoru Hirota

A time series representation, piecewise trend approximation (PTA), is proposed to improve the efficiency of time series data mining in high-dimensional large databases. PTA represents a time series in concise form while retaining the main trends of the original series; the dimensionality of the original data is therefore reduced while the key features are maintained. Unlike representations that are based on the original data space, PTA transforms the original data space into a feature space of ratios between any two consecutive data points in the original time series, whose sign and magnitude indicate the direction and degree of the local trend, respectively. In this ratio-based feature space, segmentation is performed such that any two adjacent segments have different trends, and each segment is then approximated by the ratio between its first and last points. To validate the proposed PTA, it is compared with the classical time series representations PAA and APCA on two classical datasets using the commonly used K-NN classification algorithm. On the ControlChart dataset, PTA achieves 3.55% and 2.33% higher classification accuracy than PAA and APCA, respectively, and on the Mixed-BagShapes dataset 8.94% and 7.07% higher. These results indicate that the proposed PTA is effective for high-dimensional time series data mining.
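The segmentation idea can be sketched roughly as follows. This is not the authors' implementation: the abstract does not give the exact ratio formula, so the relative change `(x[i+1] - x[i]) / |x[i]|` used here is an assumption, as is the function name `trend_segments`. The sketch also assumes the series contains no zeros.

```python
import numpy as np

def trend_segments(x):
    """Split a series where the local trend changes sign and describe each
    segment by the relative change between its end points (hypothetical)."""
    x = np.asarray(x, dtype=float)
    # Assumed local trend measure: relative change between neighbours.
    ratios = np.diff(x) / np.abs(x[:-1])
    signs = np.sign(ratios)
    # A new segment starts at each point where the trend sign flips.
    breaks = np.flatnonzero(signs[1:] != signs[:-1]) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [len(x) - 1]))
    return [(int(s), int(e), (x[e] - x[s]) / abs(x[s]))
            for s, e in zip(starts, ends)]

for seg in trend_segments([1, 2, 4, 3, 2, 4]):
    print(seg)
```

Each tuple holds the first index, last index, and end-to-end ratio of a segment, so six samples are reduced here to three trend descriptors.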


1999 ◽  
Vol 87 (2) ◽  
pp. 530-537 ◽  
Author(s):  
Lynn J. Groome ◽  
Donna M. Mooney ◽  
Scherri B. Holland ◽  
Lisa A. Smith ◽  
Jana L. Atterbury ◽  
...  

Approximate entropy (ApEn) is a statistic that quantifies regularity in time series data, and this parameter has several features that make it attractive for analyzing physiological systems. In this study, ApEn was used to detect nonlinearities in the heart rate (HR) patterns of 12 low-risk human fetuses between 38 and 40 wk of gestation. The fetal cardiac electrical signal was sampled at a rate of 1,024 Hz by using Ag-AgCl electrodes positioned across the mother’s abdomen, and fetal R waves were extracted by using adaptive signal processing techniques. To test for nonlinearity, ApEn for the original HR time series was compared with ApEn for three dynamic models: temporally uncorrelated noise, linearly correlated noise, and linearly correlated noise with nonlinear distortion. Each model had the same mean and SD in HR as the original time series, and one model also preserved the Fourier power spectrum. We estimated that noise accounted for 17.2–44.5% of the total between-fetus variance in ApEn. Nevertheless, ApEn for the original time series data still differed significantly from ApEn for the three dynamic models for both group comparisons and individual fetuses. We concluded that the HR time series, in low-risk human fetuses, could not be modeled as temporally uncorrelated noise, linearly correlated noise, or static filtering of linearly correlated noise.
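For readers unfamiliar with the statistic, a compact sketch of the standard ApEn definition follows. The defaults `m=2` and `r = 0.2 * SD` are common choices in the literature and are assumptions here; the abstract does not state which parameters the study used.

```python
import numpy as np

def approximate_entropy(x, m=2, r=None):
    """Approximate entropy of a 1-D series (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if r is None:
        r = 0.2 * x.std()  # assumed tolerance, a common default

    def phi(dim):
        # All overlapping templates of length `dim`.
        templates = np.array([x[i:i + dim] for i in range(n - dim + 1)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]),
                      axis=2)
        # Fraction of templates within r of each one (self-match included).
        counts = np.mean(dist <= r, axis=1)
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
regular = np.tile([0.0, 1.0], 25)
noise = rng.standard_normal(200)
print(approximate_entropy(regular), approximate_entropy(noise))
```

A strictly alternating series scores close to zero, while uncorrelated noise scores noticeably higher, which is the sense in which ApEn quantifies regularity.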


2021 ◽  
Author(s):  
Matthew H. Graham ◽  
Shikhar Singh

Crises and disasters give voters an opportunity to observe the incumbent's response and reward or punish them for successes and failures. Yet even when voters agree on the facts, they tend to attribute responsibility in a group-serving manner, disproportionately crediting their party for positive developments and blaming opponents for negative developments. Using original time series data, we show that partisan disagreement over U.S. President Donald Trump's responsibility for the COVID-19 pandemic quickly emerged alongside the pandemic's onset in March 2020. Three original survey experiments show that the valence of information about the country's performance against the virus contributes causally to such gaps. A Bayesian model of information processing anticipates our findings more closely than do theories of partisan-motivated reasoning. These findings shed new light on the foundations of partisan loyalty, especially among citizens who do not think of themselves as partisans.


Author(s):  
Youseop Shin

Chapter Two defines important concepts and explains the structure of time series data. Then, it explains the univariate time series modeling procedure, such as how to visually inspect a time series; how to transform an original time series when its variance is not constant; how to estimate seasonal patterns and trends; how to obtain residuals; how to estimate the systematic pattern of residuals; and how to test the randomness of residuals.


2015 ◽  
Vol 25 (12) ◽  
pp. 1550168 ◽  
Author(s):  
Yoshito Hirata ◽  
Motomasa Komuro ◽  
Shunsuke Horai ◽  
Kazuyuki Aihara

It is practically known that a recurrence plot, a two-dimensional visualization of time series data, can contain almost all information related to the underlying dynamics except for its spatial scale because we can recover a rough shape for the original time series from the recurrence plot even if the original time series is multivariate. We here provide a mathematical proof that the metric defined by a recurrence plot [Hirata et al., 2008] is equivalent to the Euclidean metric under mild conditions.
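As a reminder of the object under discussion, a recurrence plot can be computed with a few lines of NumPy. This is a minimal sketch using a fixed threshold `eps` and the Euclidean norm; the function name and the strict `<` comparison are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def recurrence_plot(series, eps):
    """Binary matrix R with R[i, j] = 1 when points i and j of the series
    are closer than eps (illustrative sketch)."""
    pts = np.asarray(series, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]  # treat a scalar series as 1-D points
    # Pairwise Euclidean distances between all time points.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    return (dist < eps).astype(int)

print(recurrence_plot([0.0, 1.0, 0.0], eps=0.5))
```

The same function accepts a multivariate series as an array of shape `(time, dimension)`.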


2020 ◽  
Author(s):  
Hiroki Ogawa ◽  
Yuki Hama ◽  
Koichi Asamori ◽  
Takumi Ueda

In the magnetotelluric (MT) method, the responses of the natural electromagnetic fields are evaluated by transforming time-series data into spectral data and calculating the apparent resistivity and phase. The continuous wavelet transform (CWT) can be an alternative to the short-time Fourier transform, and the applicability of CWT to MT data has been reported. There are, however, few studies that consider the effect on MT data processing of numerical errors introduced by the spectral transform. In general, it is desirable to adopt a window function that is narrow in the time domain for higher-frequency components and one that is narrow in the frequency domain for lower-frequency components. In the short-time Fourier transform, because the size of the window function is fixed unless the time-series data are decimated, the calculated MT responses might differ from the true ones because of numerical errors. Meanwhile, CWT can strike a balance between the resolution of the time and frequency domains by magnifying or reducing the wavelet according to the value of frequency. Although the types of wavelet functions and their parameters influence the resolution of time and frequency, those calculation settings of CWT are often determined empirically. In this study, focusing on the frequency band between 0.001 Hz and 10 Hz, we demonstrated the superiority of utilizing CWT in MT data processing and determined its proper calculation settings in terms of restraining the numerical errors caused by the spectral transform of time-series data. The results obtained with the short-time Fourier transform accompanied by gradual decimation of the time-series data, called cascade decimation, were compared with those of CWT. The shape of the wavelet was changed by using different types of wavelet functions or their parameters, and the respective results of data processing were compared. Through these experiments, this study indicates that CWT with the complex Morlet function with its wavelet parameter k set to 6 ≤ k < 10 will be effective in restraining the numerical errors caused by the spectral transform.


Author(s):  
Parvathi Chundi ◽  
Daniel J. Rosenkrantz

Time series data is usually generated by measuring and monitoring applications, and accounts for a large fraction of the data available for analysis purposes. A time series is typically a sequence of values that represent the state of a variable over time. Each value of the variable might be a simple value, or might have a composite structure, such as a vector of values. Time series data can be collected about natural phenomena, such as the amount of rainfall in a geographical region, or about a human activity, such as the number of shares of Google™ stock sold each day. Time series data is typically used for predicting future behavior from historical performance. However, a time series often needs further processing to discover the structure and properties of the recorded variable, thereby facilitating the understanding of past behavior and prediction of future behavior. Segmentation of a given time series is often used to compactly represent the time series (Gionis & Mannila, 2005), to reduce noise, and to serve as a high-level representation of the data (Das, Lin, Mannila, Renganathan & Smyth, 1998; Keogh & Kasetty, 2003). Data mining of a segmentation of a time series, rather than the original time series itself, has been used to facilitate discovering structure in the data, and finding various kinds of information, such as abrupt changes in the model underlying the time series (Duncan & Bryant, 1996; Keogh & Kasetty, 2003), event detection (Guralnik & Srivastava, 1999), etc. The rest of this chapter is organized as follows. The section on Background gives an overview of the time series segmentation problem and solutions. This section is followed by a Main Focus section where details of the tasks involved in segmenting a given time series and a few sample applications are discussed. Then, the Future Trends section presents some of the current research trends in time series segmentation and the Conclusion section concludes the chapter. Several important terms and their definitions are also included at the end of the chapter.

