PRINCIPAL COMPONENT ANALYSIS AND ITS GENERALIZATIONS FOR ANY TYPE SEQUENCE (PCA-SEQ)

2019 ◽  
Author(s):  
V.M. Efimov ◽  
K.V. Efimov ◽  
V.Y. Kovaleva

In the 1940s, Karhunen and Loève proposed a method for processing a one-dimensional numeric time series by converting it into a multidimensional one through shifts. In effect, a one-dimensional numeric series is decomposed into several orthogonal time series. This method has been independently redeveloped many times and applied in practice under various names (EOF, SSA, Caterpillar, etc.). Nowadays, the name SSA (Singular Spectrum Analysis) is most often used. The method turned out to be universal: it is applicable to any time series without stationarity assumptions and automatically decomposes a time series into a trend, cyclic components and noise. By the beginning of the 1980s, Takens had shown that, for a dynamical system, such a method makes it possible to reconstruct an attractor from observations of only one of the system's variables, thereby giving the method a strong theoretical basis. In the same years, the practical benefits of phase portraits became clear. In particular, they were used in the analysis and forecasting of animal abundance dynamics. In this paper we propose extending SSA to a one-dimensional sequence of elements of any type, including numbers, symbols, figures, etc., and, as a special case, to a molecular sequence. Technically, the problem is solved by almost the same algorithm as SSA. The sequence is cut by a sliding window into fragments of a given length. The matrix of Euclidean distances between all fragments is then calculated. This is always possible; for example, the square root of the Hamming distance between fragments is a Euclidean distance. For the resulting matrix, the principal components are calculated by the principal-coordinate method (PCo). Instead of a distance matrix, one can use a matrix of any similarity/dissimilarity indices and apply methods of multidimensional scaling (MDS). The result will always be PCs in some Euclidean space. We call this method PCA-Seq. It is certainly an exploratory method, as is its special case, SSA.
For any sequence, including a molecular one, PCA-Seq makes it possible, without any additional assumptions, to obtain its principal components in numerical form and to visualize them as phase portraits. Long-term experience with applying SSA to numerical data gives every reason to believe that PCA-Seq will be no less useful in the analysis of non-numerical data, especially for generating hypotheses. PCA-Seq is implemented in the freely distributed Jacobi 4 package (http://mrherrn.github.io/JACOBI4/).
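
The steps named in the abstract (sliding window, Hamming-based distances, principal coordinates) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the Jacobi 4 implementation: the function names, the toy sequence, the window length and the number of retained components are all assumptions made for the example.

```python
import numpy as np

def sliding_fragments(seq, window):
    """Cut a sequence into overlapping fragments of length `window`."""
    return [seq[i:i + window] for i in range(len(seq) - window + 1)]

def hamming(a, b):
    """Number of positions at which two equal-length fragments differ."""
    return sum(x != y for x, y in zip(a, b))

def pca_seq(seq, window, n_components=2):
    """Principal coordinates of the sliding-window fragments of `seq`."""
    frags = sliding_fragments(seq, window)
    n = len(frags)
    # The Euclidean distance is sqrt(Hamming), so the squared distance
    # matrix is simply the matrix of Hamming distances.
    d2 = np.array([[hamming(a, b) for b in frags] for a in frags], dtype=float)
    # Gower double centering: B = -1/2 * J * D^2 * J with J = I - 11'/n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    vals, vecs = np.linalg.eigh(b)                  # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]   # keep the largest ones
    vals, vecs = vals[order], vecs[:, order]
    # Coordinates are eigenvectors scaled by sqrt(eigenvalue);
    # tiny negative eigenvalues from round-off are clipped to zero.
    return vecs * np.sqrt(np.clip(vals, 0.0, None)), vals

# Hypothetical toy sequence; any sequence of comparable symbols would do.
coords, eigenvalues = pca_seq("ACGTACGTTTGACCA", window=4)
```

Plotting one column of `coords` against another gives a phase portrait of the kind the abstract mentions. Replacing `hamming` with another dissimilarity turns the same skeleton into a generic MDS, although the double-centered matrix may then have negative eigenvalues.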

2020 ◽  
Vol 23 (8) ◽  
pp. 1032-1036 ◽  
Author(s):  
V. M. Efimov ◽  
K. V. Efimov ◽  
V. Y. Kovaleva

In the 1940s, Karhunen and Loève proposed a method for processing a one-dimensional numeric time series by converting it into a multidimensional one through shifts. In effect, a one-dimensional numeric series is decomposed into several orthogonal time series. This method has been independently redeveloped many times and applied in practice under various names (EOF, SSA, Caterpillar, etc.). Nowadays, the name ‘SSA’ (Singular Spectrum Analysis) is the one most often used. The method turned out to be universal: it is applicable to any time series without stationarity assumptions and automatically decomposes a time series into a trend, cyclic components and noise. By the beginning of the 1980s, Takens had shown that, for a dynamical system, such a method makes it possible to reconstruct an attractor from observations of only one of the system's variables, thereby giving the method a strong theoretical basis. In the same years, the practical benefits of phase portraits became clear. In particular, they were used in the analysis and forecasting of animal abundance dynamics. In this paper we propose extending SSA to a one-dimensional sequence of elements of any type, including numbers, symbols, figures, etc., and, as a special case, to a molecular sequence. Technically, the problem is solved using an algorithm similar to SSA. The sequence is cut by a sliding window into fragments of a given length. The matrix of Euclidean distances between all fragments is then calculated. This is always possible; for example, the square root of the Hamming distance between fragments is a Euclidean distance. For the resulting matrix, the principal components are calculated by the principal-coordinate method (PCo). Instead of a distance matrix, one can use a matrix of any similarity/dissimilarity indices and apply methods of multidimensional scaling (MDS). The result will always be PCs in some Euclidean space. We call this method ‘PCA-Seq’. It is certainly an exploratory method, as is its special case, SSA.
For any sequence, including a molecular one, PCA-Seq makes it possible, without any additional assumptions, to present its principal components in numerical form and to visualize them as phase portraits. The long history of applying SSA to numerical data gives every reason to believe that PCA-Seq will be no less useful in the analysis of non-numerical data, especially for generating hypotheses. PCA-Seq is implemented in the freely distributed Jacobi 4 package (http://jacobi4.ru/).


2011 ◽  
Vol 103 ◽  
pp. 274-278 ◽  
Author(s):  
Ling Li Jiang ◽  
Zong Qun Deng ◽  
Si Wen Tang

This paper proposes a kernel principal component analysis (KPCA)-based denoising method for removing noise from a vibration signal. First, the one-dimensional time series is expanded into a multidimensional time series by the phase space reconstruction method. Then, KPCA is performed on the multidimensional time series. The first kernel principal component is taken as the denoised signal. A rolling bearing denoising example verifies the effectiveness of the proposed method.
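
A minimal sketch of the two steps described above, delay embedding followed by kernel PCA, might look like this. The abstract does not state the kernel, the embedding dimension or the delay, so the RBF kernel, `dim`, `tau` and `gamma` below are assumptions, and the noisy sine wave is only a stand-in for a real vibration signal.

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Phase space reconstruction: rows are (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def first_kernel_pc(X, gamma):
    """Scores of the first kernel principal component (RBF kernel assumed)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    one = np.ones((n, n)) / n
    # Center the kernel matrix in feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)                  # ascending eigenvalues
    # Projection of each embedded point onto the leading component.
    return vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))

# Synthetic stand-in for a vibration signal: a sine wave plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 300)
noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.normal(size=t.size)

embedded = delay_embed(noisy, dim=10, tau=1)
pc1 = first_kernel_pc(embedded, gamma=0.5)
```

Note that `pc1` has one value per embedded point, so it is shorter than the input by `(dim - 1) * tau` samples, and its sign and scale are arbitrary, as with any principal component.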


2021 ◽  
Vol 29 (3) ◽  
pp. 39-51
Author(s):  
Valentas Gružauskas ◽  
Dalia Čalnerytė ◽  
Tautvydas Fyleris ◽  
Andrius Kriščiūnas

The socio-economic development of municipalities is defined by a set of indicators over a period of interest and can be analyzed as a multivariate time series. It is important to know which municipalities have similar socio-economic development trends when recommendations for policy makers are provided or datasets for real estate and insurance price evaluations are expanded. Usually, key indicators are derived from expert experience; however, this publication implements a statistical approach to identify key trends. Unsupervised machine learning was performed by applying K-means clustering and principal component analysis to a dataset of multivariate time series. After 100 runs, the result with the minimal summed error was analyzed as the final clustering. The dataset represented various socio-economic indicators in the municipalities of Lithuania in the period from 2006 to 2018. Significant differences were observed in the indicators of the municipalities in the cluster containing the 4 largest cities of Lithuania, and in another cluster containing 3 districts of the 3 largest cities. This article proposes a robust approach for identifying socio-economic differences between regions where real estate is located. For example, the evaluated distance matrix can be used to derive adjustment coefficients when applying the comparative method for real estate valuation.
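
The "best of 100 runs" selection can be illustrated with a small sketch. It assumes the error being minimized is the within-cluster sum of squared distances and uses plain Lloyd iterations on synthetic data. The PCA step and the real municipal indicators are omitted, and all function names are hypothetical.

```python
import numpy as np

def kmeans_once(X, k, rng, n_iter=100):
    """One K-means run (Lloyd's algorithm) from a random initialization."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(X)), labels].sum()   # within-cluster sum of squares
    return labels, centers, sse

def best_of_runs(X, k, n_runs=100, seed=0):
    """Repeat K-means and keep the run with the smallest summed error."""
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, k, rng) for _ in range(n_runs)),
               key=lambda run: run[2])

# Synthetic stand-in data: three groups of 20 points in two dimensions.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(c, 0.3, size=(20, 2)) for c in (0.0, 3.0, 6.0)])
labels, centers, sse = best_of_runs(X, k=3, n_runs=100)
```

Repeating the run matters because K-means converges only to a local optimum that depends on the initial centers. Keeping the run with the smallest error is a common way to reduce that dependence.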


2020 ◽  
Author(s):  
Hamed Nili ◽  
Alessio Basti ◽  
Olaf Hauk ◽  
Laura Marzetti ◽  
Richard Henson

The estimation of functional connectivity between regions of the brain, for example based on statistical dependencies between the time series of activity in each region, has become increasingly important in neuroimaging. Typically, multiple time series (e.g. from each voxel in fMRI data) are first reduced to a single time series that summarises the activity in a region of interest, e.g. by averaging across voxels or by taking the first principal component, an approach we call one-dimensional connectivity. However, this summary approach ignores potential multi-dimensional connectivity between two regions, and a number of methods have recently been proposed to capture such complex dependencies. Here we review the most common multi-dimensional connectivity methods from an intuitive perspective, from a formal (mathematical) point of view, and through a number of simulated and real (fMRI and MEG) data examples that illustrate the strengths and weaknesses of each method. The paper is accompanied by functions and scripts that implement each method and reproduce all the examples.


2016 ◽  
Vol 77 (1) ◽  
pp. 165-178 ◽  
Author(s):  
Tenko Raykov ◽  
George A. Marcoulides ◽  
Tenglong Li

The measurement error in principal components extracted from a set of fallible measures is discussed and evaluated. It is shown that as long as one or more measures in a given set of observed variables contains error of measurement, so also does any principal component obtained from the set. The error variance in any principal component is shown to be (a) bounded from below by the smallest error variance in a variable from the analyzed set and (b) bounded from above by the largest error variance in a variable from that set. In the case of a unidimensional set of analyzed measures, it is pointed out that the reliability and criterion validity of any principal component are bounded from above by these respective coefficients of the optimal linear combination with maximal reliability and criterion validity (for a criterion unrelated to the error terms in the individual measures). The discussed psychometric features of principal components are illustrated on a numerical data set.
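
The two bounds on the error variance can be seen from a short argument. The sketch below assumes the classical setup in which each observed measure is a true score plus an error term, the error terms are mutually uncorrelated, and the component weights are normalized to unit length. The symbols $w_i$, $T_i$, $E_i$ and $\theta_i$ are notation introduced here, not taken from the abstract.

```latex
% Observed measures: X_i = T_i + E_i, with Var(E_i) = \theta_i and
% Cov(E_i, E_j) = 0 for i \neq j (assumption).
% A principal component with unit-norm weights:
%   Y = \sum_{i=1}^{p} w_i X_i, \qquad \sum_{i=1}^{p} w_i^2 = 1 .
\begin{align*}
  \operatorname{Var}\Bigl(\sum_{i=1}^{p} w_i E_i\Bigr)
    &= \sum_{i=1}^{p} w_i^2 \,\theta_i , \\
  \min_i \theta_i \;=\; \min_i \theta_i \sum_{i=1}^{p} w_i^2
    \;\le\; \sum_{i=1}^{p} w_i^2 \,\theta_i
    &\;\le\; \max_i \theta_i \sum_{i=1}^{p} w_i^2 \;=\; \max_i \theta_i .
\end{align*}
```

Because the squared weights are non-negative and sum to one, the error variance of the component is a weighted average of the individual error variances. It is therefore zero only if every measure with a non-zero weight is error-free.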


2019 ◽  
Vol 11 (18) ◽  
pp. 2161 ◽  
Author(s):  
Dong Peng ◽  
Ting Pan ◽  
Wen Yang ◽  
Heng-Chao Li

In this paper, we present a novel method for change-pattern mining in Synthetic Aperture Radar (SAR) image time series based on a distance matrix clustering algorithm, called K-Matrix. Unlike state-of-the-art methods, which analyze the SAR image time series based on the change detection matrix (CDM), we directly use the distance matrix to determine changed pixels and extract change patterns. The proposed scheme involves two steps: change detection in SAR image time series and change-pattern discovery. First, distance matrices are constructed for each spatial position over the time series by a dissimilarity measurement. The changed pixels are detected by applying a thresholding algorithm to the energy feature map of all distance matrices. Then, according to the change detection results in the SAR image time series, the changed areas for pattern mining are determined. Finally, the proposed K-Matrix algorithm, which clusters distance matrices by their matrix cross-correlation similarity, is used to group all changed pixels into different change patterns. Experimental results on two datasets of TerraSAR-X image time series illustrate the effectiveness of the proposed method.


Author(s):  
Michael H. Haischer ◽  
John Krzyszkowski ◽  
Stuart Roche ◽  
Kristof Kipp

Maximal strength is important for the performance of dynamic athletic activities, such as countermovement jumps (CMJ). Although measures of maximal strength appear related to discrete CMJ variables, such as peak ground reaction forces (GRF) and center-of-mass (COM) velocity, knowledge about the association between strength and the time series patterns during CMJ will help characterize changes that can be expected in dynamic movement with changes in maximal strength. Purpose: To investigate the associations between maximal strength and GRF and COM velocity patterns during CMJ. Methods: Nineteen female college lacrosse players performed 3 maximal-effort CMJs and an isometric midthigh pull. GRF and COM velocity time series data from the CMJ were time normalized and used as inputs to principal-components analyses. Associations between isometric midthigh pull peak force and CMJ principal-component scores were investigated with a correlational analysis. Results: Isometric midthigh pull peak force was associated with several GRF and COM velocity patterns. Correlations indicated that stronger players exhibited a GRF pattern characterized by greater eccentric-phase rate of force development, greater peak GRF, and a unimodal GRF profile (P = .016). Furthermore, stronger athletes exhibited a COM velocity pattern characterized by higher velocities during the concentric phase (P = .004). Conclusions: Maximal strength is correlated with specific GRF and COM velocity patterns during CMJ in female college lacrosse athletes. Since maximal strength was not correlated with discrete CMJ variables, the patterns extracted via principal-components analyses may provide information that is more beneficial for performance coaches and researchers.


Mathematics ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 2085
Author(s):  
Christian Acal ◽  
Ana M. Aguilera ◽  
Manuel Escabias

Functional Principal Component Analysis (FPCA) is an important dimension reduction technique for interpreting the main modes of variation in functional data in terms of a small set of uncorrelated variables. The principal components cannot always be interpreted easily, and rotation is one of the main ways to improve their interpretation. In this paper, two new functional Varimax rotation approaches are introduced. They are based on the equivalence between FPCA of a basis expansion of the sample curves and Principal Component Analysis (PCA) of a transformation of the matrix of basis coefficients. The first approach consists of a rotation of the eigenvectors that preserves the orthogonality between the eigenfunctions, but the rotated principal component scores are not uncorrelated. The second approach is based on a rotation of the loadings of the standardized principal component scores, which provides uncorrelated rotated scores but non-orthogonal eigenfunctions. A simulation study and an application to COVID-19 infection curves in Spain are presented to study the performance of these methods by comparing the results with other existing approaches.


2020 ◽  
Vol 8 (2) ◽  
pp. 346-358
Author(s):  
Alberto Oliveira da Silva ◽  
Adelaide Freitas

The extraction of essential features of any real-valued time series is crucial for exploring, modeling and producing, for example, forecasts. Taking advantage of the representation of time series data by its Hankel trajectory matrix, constructed using Singular Spectrum Analysis, as well as of its decomposition through Principal Component Analysis via Partial Least Squares, we implement a graphical display employing the biplot methodology. Various types of biplots can be constructed, depending on the two matrices considered in the factorization of the trajectory matrix. In this work, we discuss the so-called HJ-biplot, which yields a simultaneous representation of both rows and columns of the matrix with maximum quality. The interpretation of this type of biplot for Hankel-related trajectory matrices is discussed using a real-world data set.
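
As a rough illustration of the objects involved, the sketch below builds a Hankel trajectory matrix and computes two-dimensional row and column markers. It uses an ordinary SVD in place of the PCA-via-PLS decomposition named in the abstract, and it assumes the usual HJ-biplot convention in which both sets of markers are scaled by the singular values. The window length and the toy series are arbitrary choices.

```python
import numpy as np

def trajectory_matrix(x, window):
    """Hankel trajectory matrix: column j holds x[j], ..., x[j + window - 1]."""
    x = np.asarray(x, dtype=float)
    k = len(x) - window + 1
    return np.column_stack([x[j:j + window] for j in range(k)])

# Toy series: an oscillation on top of a linear trend.
x = np.sin(np.linspace(0.0, 8.0 * np.pi, 60)) + 0.05 * np.arange(60)

H = trajectory_matrix(x, window=12)          # shape (12, 49)
U, s, Vt = np.linalg.svd(H, full_matrices=False)

# Two-dimensional markers under the assumed HJ-biplot scaling:
# rows use U * S and columns use V * S, each in the first two dimensions.
row_markers = U[:, :2] * s[:2]
col_markers = Vt[:2].T * s[:2]
```

Every anti-diagonal of `H` is constant (`H[i, j]` equals `x[i + j]`), which is the Hankel structure referred to above. In practice the window length controls how well trend and periodic components separate.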

