Improved singular spectrum analysis for time series with missing data

2015, Vol 22 (4), pp. 371-376
Author(s): Y. Shen, F. Peng, B. Li

Abstract. Singular spectrum analysis (SSA) is a powerful technique for time series analysis. Based on the property that the original time series can be reproduced from its principal components, this contribution develops an improved SSA (ISSA) for processing incomplete time series, of which the modified SSA (SSAM) of Schoellhamer (2001) is a special case. The approach is evaluated with synthetic and real incomplete time series of suspended-sediment concentration from San Francisco Bay. The results from the synthetic time series with missing data show that the relative errors of the principal components reconstructed by ISSA are much smaller than those reconstructed by SSAM. Moreover, when missing data make up 60 % of the whole time series, the relative errors are improved by up to 19.64, 41.34, 23.27 and 50.30 % for the first four principal components, respectively. Both the mean absolute error and the mean root mean squared error of the time series reconstructed by ISSA are also smaller than those by SSAM, with respective improvements of 34.45 and 33.91 % when missing data account for 60 %. The results from the real incomplete time series also show that the standard deviation (SD) derived by ISSA is 12.27 mg L−1, smaller than the 13.48 mg L−1 derived by SSAM.
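ISSA is built on the fact that a series is reproduced exactly by summing all of its SSA components. A minimal NumPy sketch of basic SSA (embedding into a trajectory matrix, SVD, diagonal averaging) checks that property numerically. The function names, the synthetic series and the window length L = 24 are illustrative assumptions; the sketch handles complete data only and is not the authors' ISSA estimator for gappy series.

```python
import numpy as np


def embed(x, L):
    """Trajectory (Hankel) matrix whose column j is the window x[j:j + L]."""
    K = len(x) - L + 1
    return np.column_stack([x[j:j + L] for j in range(K)])  # shape (L, K)


def diagonal_average(Y):
    """Average each anti-diagonal of an (L, K) matrix into a series of length L + K - 1."""
    L, K = Y.shape
    out = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        out[i:i + K] += Y[i]  # Y[i, j] belongs to time index i + j
        counts[i:i + K] += 1
    return out / counts


def ssa_components(x, L):
    """One reconstructed series per singular triple, stacked as rows."""
    X = embed(np.asarray(x, dtype=float), L)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return np.array([diagonal_average(s[k] * np.outer(U[:, k], Vt[k]))
                     for k in range(len(s))])


# Synthetic example: linear trend plus a period-12 cycle.
t = np.arange(120)
x = 0.02 * t + np.sin(2 * np.pi * t / 12)
comps = ssa_components(x, L=24)
print(comps.shape)                         # (24, 120): min(L, K) components
print(np.allclose(comps.sum(axis=0), x))   # True: all components reproduce x
```

Because the trajectory matrix equals the sum of its rank-one SVD terms and diagonal averaging is linear, the components add back to x up to rounding error; keeping only some of them gives the usual filtered reconstruction.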

2014, Vol 1 (2), pp. 1947-1966
Author(s): Y. Shen, F. Peng, B. Li

Abstract. Singular spectrum analysis (SSA) is a powerful technique for time series analysis. Based on the property that the original time series can be reproduced from its principal components, this contribution develops an improved SSA (ISSA) for processing incomplete time series, of which the modified SSA (SSAM) of Schoellhamer (2001) is a special case. The approach was evaluated with synthetic and real incomplete time series of suspended-sediment concentration from San Francisco Bay. The results from the synthetic time series with missing data show that the relative errors of the principal components reconstructed by ISSA are much smaller than those reconstructed by SSAM. Moreover, when missing data make up 60% of the whole time series, the relative errors are improved by up to 19.64, 41.34, 23.27 and 50.30% for the first four principal components, respectively. Both the mean absolute errors and the mean root mean squared errors of the time series reconstructed by ISSA are also much smaller than those by SSAM, with respective improvements of 34.45 and 33.91% when missing data account for 60%. The results from the real incomplete time series also show that the standard deviation (SD) derived by ISSA is 12.27 mg L−1, smaller than the 13.48 mg L−1 derived by SSAM.


2020
Author(s): Yunzhong Shen, Fengwei Wang, Qiujie Chen

Because time series are often incomplete, missing data are usually interpolated before singular spectrum analysis (SSA) is employed. We develop a new SSA for processing incomplete time series, based on the property that an original time series can be reproduced from its principal components, which are then estimated with the minimum norm criterion. When an incomplete time series is polluted by multiplicative noise, we first convert the multiplicative noise to additive noise by multiplying it by the signal estimate of the time series, and then process the time series with weighted SSA, where the weight factors are determined from the variance of the additive noise, since the converted additive noise is heteroscedastic. The proposed SSA approach is applied to the real incomplete time series of suspended-sediment concentration from San Francisco Bay and compared with traditional SSA and the homomorphic log-transformation SSA approach. The first 10 principal components derived by the proposed approach capture more of the total variance, with smaller fitting error, than those of traditional SSA and homomorphic log-transformation SSA. Furthermore, results from simulated cases confirm that the proposed SSA outperforms both the traditional and the homomorphic log-transformation SSA approaches.
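The noise conversion step can be illustrated under an assumed model y = s(1 + e), where e has relative standard deviation sigma_rel: writing y = s + s·e gives additive noise whose variance (s·sigma_rel)² follows the signal, so inverse-variance weights evaluated at a signal estimate make it roughly homogeneous. The NumPy sketch below uses a hypothetical helper, additive_noise_weights, feeds it the true signal in place of an SSA signal estimate for clarity, and stops before the weighted SSA itself; it is not the authors' implementation.

```python
import numpy as np


def additive_noise_weights(s_hat, sigma_rel):
    """Inverse-variance weights for noise converted from multiplicative to additive.

    Assumed model: y = s * (1 + e), with e of standard deviation sigma_rel.
    Then y = s + n, where n = s * e has variance (s * sigma_rel)**2, which
    changes with the signal. The weights use a signal estimate s_hat for s.
    """
    var = (np.asarray(s_hat, dtype=float) * sigma_rel) ** 2
    return 1.0 / np.maximum(var, 1e-12)  # guard against a zero signal estimate


rng = np.random.default_rng(0)
t = np.arange(5000)
s = 50 + 20 * np.sin(2 * np.pi * t / 365)   # positive, concentration-like signal
sigma_rel = 0.1
y = s * (1 + sigma_rel * rng.standard_normal(t.size))

w = additive_noise_weights(s, sigma_rel)    # true s stands in for an SSA estimate
raw = y - s                                 # additive noise: spread grows with s
z = np.sqrt(w) * raw                        # weighted noise: roughly unit variance
print(raw[s > 50].std(), raw[s < 50].std())  # larger spread where the signal is high
print(z.std())                               # close to 1
```

In a weighted SSA these weights would enter the fit of the principal components; how the paper does that is not specified in the abstract.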


2018, Vol 17 (02), pp. 1850017
Author(s): Mahdi Kalantari, Masoud Yarmohammadi, Hossein Hassani, Emmanuel Sirimal Silva

Missing values in time series data are a well-known and important problem that many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is the application of the [Formula: see text] norm-based version of Singular Spectrum Analysis (SSA), namely [Formula: see text]-SSA, which is robust against outliers. The performance of the new imputation method is compared with many other established methods by applying them to various real and simulated time series. The results confirm that the SSA-based methods, especially [Formula: see text]-SSA, provide better imputation than the other methods.
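The abstract does not spell out the imputation loop. A common SSA-based scheme is iterative gap filling: start the gaps at the series mean, reconstruct with the r leading components, overwrite only the gaps, and repeat until the filled values stop changing. The minimal NumPy sketch below uses the ordinary (L2) SVD rather than the norm-based variant the paper proposes; the function names, L, r, n_iter, tol and the synthetic test series are illustrative assumptions.

```python
import numpy as np


def ssa_reconstruct(x, L, r):
    """Rank-r SSA reconstruction: embed, truncate the SVD, diagonal-average."""
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]
    out = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        out[i:i + K] += Xr[i]
        counts[i:i + K] += 1
    return out / counts


def ssa_impute(x, L, r, n_iter=200, tol=1e-8):
    """Fill NaNs by alternating rank-r SSA reconstruction and re-imputation."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = np.where(missing, np.nanmean(x), x)   # start gaps at the mean
    for _ in range(n_iter):
        rec = ssa_reconstruct(filled, L, r)
        change = np.max(np.abs(rec[missing] - filled[missing]), initial=0.0)
        filled[missing] = rec[missing]              # observed values never change
        if change < tol:
            break
    return filled


rng = np.random.default_rng(1)
t = np.arange(300)
truth = np.sin(2 * np.pi * t / 25)
x = truth.copy()
gaps = rng.random(t.size) < 0.2                    # about 20 % missing
x[gaps] = np.nan
est = ssa_impute(x, L=50, r=2)
rmse = np.sqrt(np.mean((est[gaps] - truth[gaps]) ** 2))
print(f"RMSE at gaps: {rmse:.2e}")
```

Replacing the SVD with an outlier-resistant decomposition is where a robust variant would differ; the rest of the loop can stay the same.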


Author(s): S.M. Shaharudin, N. Ahmad, N.H. Zainuddin

Identifying the local time scale of the torrential rainfall pattern through Singular Spectrum Analysis (SSA) is useful for separating the trend and noise components. However, SSA poses two main issues: torrential rainfall time series have coinciding singular values, and the leading eigenvector components obtained by decomposing the time series matrix are usually assessed by graphical inference, which lacks a specific statistical measure. Because of these two issues, the trend extracted by SSA tended to flatten out and did not show any distinct pattern. This problem was approached in two ways. First, an Iterative Oblique SSA (Iterative O-SSA) was presented to adjust the singular values. Second, a measure based on Robust Sparse K-means (RSK-Means) was introduced to group the decomposed eigenvectors. As a result, the trend extracted with the modified SSA appeared to fit the original time series and was more flexible than the trend extracted by standard SSA.
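The abstract's grouping measure is based on Robust Sparse K-means, which is not reproduced here. For comparison, a widely used statistical aid for grouping SSA components is the weighted correlation (w-correlation) between reconstructed components: the two halves of a periodic pair, which have nearly equal singular values, are strongly w-correlated, while well-separated components are close to zero. A minimal NumPy sketch; the function names, window length and synthetic series are assumptions.

```python
import numpy as np


def elementary_components(x, L):
    """Diagonal-averaged series for each singular triple of the trajectory matrix."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    counts = np.zeros(N)
    for i in range(L):
        counts[i:i + K] += 1
    comps = np.zeros((len(s), N))
    for k in range(len(s)):
        Xk = s[k] * np.outer(U[:, k], Vt[k])
        for i in range(L):
            comps[k, i:i + K] += Xk[i]
    return comps / counts


def w_correlation(comps, L):
    """Weighted correlations between component series (rows of comps).

    Index t is weighted by how often x[t] appears in the trajectory matrix,
    min(t + 1, L, K, N - t) with K = N - L + 1 (assumes L <= K).
    """
    N = comps.shape[1]
    K = N - L + 1
    t = np.arange(N)
    w = np.minimum.reduce([t + 1, np.full(N, L), np.full(N, K), N - t])
    G = (comps * w) @ comps.T                 # weighted inner products
    norms = np.sqrt(np.diag(G))
    return G / np.outer(norms, norms)


rng = np.random.default_rng(3)
t = np.arange(240)
x = (np.exp(t / 120)                          # smooth trend
     + np.sin(2 * np.pi * t / 12)             # periodic pair
     + 0.2 * rng.standard_normal(t.size))     # noise
comps = elementary_components(x, L=60)
W = w_correlation(comps[:6], L=60)            # leading six components only
print(np.round(np.abs(W), 2))
```

Components whose mutual |w-correlation| is high (here the periodic pair) are candidates for the same group; a clustering step such as the one in the paper can then operate on a measure of this kind instead of on visual inspection.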


2021
Author(s): Shu Kaneko, Katsumi Hattori, Toru Mogi, Chie Yoshino

Off the coast of the Boso Peninsula lies the triple junction of the Pacific Plate, the Philippine Sea Plate, and the North American Plate, and the Boso Peninsula is one of the seismically active areas in Japan. The area also contains the source regions of the 1703 Genroku Kanto Earthquake (M8.2) and the 1923 Taisho Kanto Earthquake (M7.9), as well as the Boso slow slip events, which recur every 6 years, making it geologically interesting. To estimate the subsurface resistivity structure of the whole Boso area, a magnetotelluric (MT) survey with 41 sites (inter-site distance of 7 km) was conducted in 2014-2016, using U43 (12 sites, 1 Hz sampling; Tierra Technica) and MTU-5, 5A, net (41 sites, 15, 150, and 2400 Hz sampling; Phoenix Geophysics). However, the Boso area is strongly affected by leakage current from DC-driven trains, factories, and power lines, so the observed data are contaminated by artificial noise. When we applied conventional frequency-domain noise reduction methods (e.g., remote reference (Gamble et al., 1979) and BIRRP (Chave and Thomson, 2004)), the resulting MT sounding curve was not satisfactory. In particular, the phase at periods between 20 and 400 s was close to 0 degrees, which suggests that these methods are insufficient to reduce the near-field effect in the Boso data. We therefore developed a new noise reduction method that uses MSSA (multi-channel singular spectrum analysis) as a time-domain pre-processing step.
The procedure is as follows:
(1) Decompose the 6-component data (Hx, Hy, Ex, Ey, Hxr and Hyr, where H and E denote the magnetic and electric fields, x and y the NS and EW components, and r the reference field observed at a quiet station) using MSSA into 6×M principal components (PCs), where M is the MSSA window length.
(2) Check the contribution and period of each PC and eliminate the PCs corresponding to longer-period variations, i.e., “detrend” the original data.
(3) Apply a second MSSA to the detrended time series to separate signals and noise at periods shorter than 400 s.
(4) Calculate correlation coefficients between H and Hr and between E and Hr for each PC, and select the PCs with higher correlation to reconstruct the time series used for MT analysis.
We then perform MT analysis with BIRRP to estimate apparent resistivity.
As a result, the H-Hr and E-Hr coherences improved, and the MT sounding curve became smoother than those obtained with the conventional noise reduction methods, indicating the effectiveness of the proposed noise reduction. However, further investigation at different periods and sites is required.
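Steps (1), (2) and (4) can be sketched in NumPy under simplifying assumptions: channel trajectory matrices are stacked vertically into one MSSA matrix, each PC is reconstructed per channel by diagonal averaging, the period of a PC is taken as its dominant FFT period, and a PC is kept when its reconstructions in a target channel and the reference channel correlate above a hypothetical threshold of 0.5. Three synthetic channels stand in for H, E and Hr with a sampling interval of 1, so the 400 s cut-off becomes 400 samples; step (3), a second MSSA pass, would reuse the same function. This is an illustration, not the authors' code.

```python
import numpy as np


def mssa_components(data, M):
    """MSSA of a (C, N) array with window length M.

    Channel trajectory matrices (M x K each) are stacked vertically into a
    (C*M, K) matrix. Each singular triple is diagonal-averaged back into one
    series per channel, giving an array of shape (n_pcs, C, N).
    """
    data = np.asarray(data, dtype=float)
    C, N = data.shape
    K = N - M + 1
    X = np.vstack([np.column_stack([ch[j:j + M] for j in range(K)]) for ch in data])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    counts = np.zeros(N)
    for i in range(M):
        counts[i:i + K] += 1
    comps = np.zeros((len(s), C, N))
    for k in range(len(s)):
        Xk = s[k] * np.outer(U[:, k], Vt[k])
        for c in range(C):
            block = Xk[c * M:(c + 1) * M]        # rows belonging to channel c
            for i in range(M):
                comps[k, c, i:i + K] += block[i]
    return comps / counts


def dominant_period(series, dt=1.0):
    """Period of the largest non-zero-frequency FFT peak (used for step 2)."""
    spec = np.abs(np.fft.rfft(series - series.mean()))
    freqs = np.fft.rfftfreq(series.size, d=dt)
    k = int(np.argmax(spec[1:])) + 1
    return 1.0 / freqs[k]


def select_by_reference(comps, ch, ref, threshold=0.5):
    """Step 4: PCs whose reconstructions in channels ch and ref are correlated."""
    keep = []
    for k in range(comps.shape[0]):
        a, b = comps[k, ch], comps[k, ref]
        if a.std() > 0 and b.std() > 0 and abs(np.corrcoef(a, b)[0, 1]) >= threshold:
            keep.append(k)
    return keep


# Synthetic stand-ins for H, E and the remote reference Hr (sampling interval 1).
rng = np.random.default_rng(2)
t = np.arange(600)
common = np.sin(2 * np.pi * t / 50)               # signal seen at both stations
H = common + 0.3 * rng.standard_normal(t.size)
E = 2 * common + 0.3 * rng.standard_normal(t.size)
Hr = common + 0.3 * rng.standard_normal(t.size)
data = np.vstack([H, E, Hr])

comps = mssa_components(data, M=20)                # step 1: up to 3*M PCs
periods = [dominant_period(comps[k, 0]) for k in range(len(comps))]
short = {k for k, p in enumerate(periods) if p <= 400}    # step 2: drop long periods
keep = [k for k in select_by_reference(comps, ch=0, ref=2) if k in short]
H_clean = comps[keep, 0].sum(axis=0)              # reconstructed H for MT analysis
print(len(keep), "of", len(comps), "PCs kept")
```

Because the temporal PC (a row of Vt) is shared across channels in this stacking, the correlation test compares how the same PC is expressed in H and in Hr; the threshold and period cut-off would need tuning on real data.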


Author(s): Hamid Reza Ghafarian Malamiri, Iman Rousta, Haraldur Olafsson, Hadi Zare, Hao Zhang

Land Surface Temperature (LST) is a basic parameter in energy exchange between the land and the atmosphere and is frequently used in many sciences, such as climatology, hydrology, agriculture, and ecology. LST time series usually contain deficient, missing, and unacceptable data caused by clouds in the images, dust in the atmosphere, and sensor failure. In this study, the Singular Spectrum Analysis (SSA) algorithm was used to resolve the problem of missing and outlier data caused by cloud cover. The study region is the MODIS image tile with horizontal number 22 and vertical number 05 (h22v05), which covers a large part of Iran and Turkmenistan and the Caspian Sea. The MODIS LST product (MOD11A1) for 2015 was used, with 1×1 km spatial resolution and day/night LST data at daily temporal resolution. The data quality assessment showed that cloud cover caused 36.37% missing data in the studied time series of 730 day/night LST images. Further, reconstructing the LST images with the SSA algorithm gave a Root Mean Square Error (RMSE) of 2.95 K between the original and reconstructed LST time series in the study region. In general, the findings show that the SSA algorithm, using spatio-temporal interpolation of LST time series, can be used effectively to resolve the problem of missing data caused by cloud cover.


Entropy, 2020, Vol 22 (1), pp. 83
Author(s): Paulo Canas Rodrigues, Jonatha Pimentel, Patrick Messala, Mohammad Kazemi

Singular spectrum analysis (SSA) is a non-parametric method that decomposes a time series into a set of components that can be interpreted and grouped as trend, periodicity, and noise, emphasizing the separability of the underlying components and separating periodicities that occur at different time scales. The original time series can be recovered by summing all components. However, only the components associated with the signal should be used to reconstruct the noise-free time series and to produce forecasts. When the time series contains outliers, SSA and other classical parametric and non-parametric methods may lead to misleading conclusions, and robust methodologies should be used. In this paper we consider two robust SSA algorithms for model fitting and one for forecasting. The classical SSA model, the robust SSA alternatives, and the autoregressive integrated moving average (ARIMA) model are compared in terms of computational time and accuracy for model fitting and forecasting, using a simulation example and time series data from the quotas and returns of six mutual investment funds. When outliers are present in the data, the simulation study shows that the robust SSA algorithms outperform the classical ARIMA and SSA models.
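For reference, the classical (non-robust) recurrent SSA forecast can be sketched in NumPy: the r leading eigenvectors define a linear recurrence whose coefficients come from their last coordinates, and the recurrence is run forward from the reconstructed signal. Robust variants typically swap the SVD for an outlier-resistant decomposition, which this sketch does not attempt; the function name, L, r and the test series are assumptions.

```python
import numpy as np


def ssa_forecast(x, L, r, steps):
    """Classical recurrent SSA forecast from the r leading eigenvectors."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]

    # Signal reconstruction from the r leading components (diagonal averaging).
    Xr = (Ur * s[:r]) @ Vt[:r]
    rec = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        rec[i:i + K] += Xr[i]
        counts[i:i + K] += 1
    rec /= counts

    # Linear recurrence: next value = R . (previous L - 1 values, oldest first).
    pi = Ur[-1]                          # last coordinates of the eigenvectors
    nu2 = float(pi @ pi)                 # verticality coefficient
    if nu2 >= 1.0:
        raise ValueError("verticality coefficient >= 1; try another L or r")
    R = (Ur[:-1] @ pi) / (1.0 - nu2)     # length L - 1

    series = list(rec)
    for _ in range(steps):
        series.append(float(R @ np.array(series[-(L - 1):])))
    return np.array(series[N:])


t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)
forecast = ssa_forecast(x, L=40, r=2, steps=10)
future = np.sin(2 * np.pi * np.arange(200, 210) / 20)
print(np.max(np.abs(forecast - future)))     # tiny for a noise-free sine
```

For a noise-free sinusoid every lagged vector lies in the span of the two leading eigenvectors, so the recurrence reproduces the continuation almost exactly; with outliers the eigenvectors themselves are distorted, which is the motivation for the robust alternatives.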

