Sequential projection pursuit principal component analysis – dealing with missing data associated with new -omics technologies

BioTechniques ◽  
2013 ◽  
Vol 54 (3) ◽  
Author(s):  
Bobbie-Jo M. Webb-Robertson ◽  
Melissa M. Matzke ◽  
Thomas O. Metz ◽  
Jason E. McDermott ◽  
Hyunjoo Walker ◽  
...  
Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2298 ◽  
Author(s):  
Wudong Li ◽  
Weiping Jiang ◽  
Zhao Li ◽  
Hua Chen ◽  
Qusen Chen ◽  
...  

Removal of the common mode error (CME) is very important for investigating errors in global navigation satellite systems (GNSS) and for estimating an accurate GNSS velocity field for geodynamic applications. Commonly used spatiotemporal filtering methods normally process evenly spaced time series without missing data. In this article, we present variational Bayesian principal component analysis (VBPCA) to estimate and extract CME from incomplete GNSS position time series. The VBPCA method naturally handles missing data within the Bayesian framework and uses the iterative variational expectation-maximization algorithm to search each principal subspace. Moreover, it can automatically select the optimal number of principal components for data reconstruction and avoid overfitting. To evaluate the performance of the VBPCA algorithm for extracting CME, 44 continuous GNSS stations located in Southern California were selected. Compared to previous approaches, VBPCA achieves better performance, with lower CME relative errors, when more data are missing. Since the first principal component (PC) extracted by VBPCA is remarkably larger than the other components, and its corresponding spatial response presents a nearly uniform distribution, we use only the first PC and its eigenvector to reconstruct the CME for each station. After filtering out CME, the interstation correlation coefficients are significantly reduced from 0.43, 0.46, and 0.38 to 0.11, 0.10, and 0.08 for the north, east, and up (NEU) components, respectively. The root mean square (RMS) values of the residual time series and the colored noise amplitudes for the NEU components are also greatly suppressed, with average reductions of 27.11%, 28.15%, and 23.28% for the former, and 49.90%, 54.56%, and 49.75% for the latter.
Moreover, the velocity estimates are more reliable and precise after removing CME, with average uncertainty reductions of 51.95%, 57.31%, and 49.92% for the NEU components, respectively. All these results indicate that the VBPCA method is an alternative and efficient way to extract CME from regional GNSS position time series in the presence of missing data. Further work is still required to consider the effect of formal errors on the CME extraction during the VBPCA implementation.
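
The idea of reconstructing CME from only the first PC of a gappy station matrix can be illustrated with a much simpler stand-in. The sketch below is not VBPCA: it swaps in plain iterative SVD imputation (an EM-style fill of the gaps with a rank-1 reconstruction) and has no Bayesian priors or automatic rank selection. The function name `extract_cme_rank1`, the (epochs x stations) layout, and the use of NaN to mark gaps are assumptions made for illustration, not details from the article.

```python
import numpy as np

def extract_cme_rank1(residuals, n_iter=200, tol=1e-8):
    """Estimate a rank-1 common mode from an (epochs x stations) array.

    Missing observations are marked with NaN. This is plain iterative
    SVD imputation, used here as a simple stand-in for VBPCA.
    """
    X = np.asarray(residuals, dtype=float)
    missing = np.isnan(X)

    # Initial guess: fill each gap with the mean of its station's series.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Reconstruction from the first PC and its eigenvector only.
        rank1 = s[0] * np.outer(U[:, 0], Vt[0])
        # Observed entries stay fixed; only the gaps are re-estimated.
        updated = np.where(missing, rank1 + mean, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break

    # Final first-PC reconstruction, taken as the common mode estimate.
    mean = filled.mean(axis=0)
    U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
    cme = s[0] * np.outer(U[:, 0], Vt[0])
    return cme, filled
```

With `cme, filled = extract_cme_rank1(observed)`, the filtered series would be `observed - cme`, which leaves the original gaps as NaN rather than presenting imputed values as observations.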


2013 ◽  
Vol 397-400 ◽  
pp. 42-46
Author(s):  
Nan Zhao ◽  
Hong Yu Shao

Design knowledge for SMEs in the cloud manufacturing environment is currently unorganized and disorderly, and the innovation capability of these enterprises is weak. To combine this design knowledge into an ordered series of knowledge resources, a service ability assessment model for knowledge resources is proposed, and a Projection Pursuit-Principal Component Analysis (PP-PCA) algorithm for service ability assessment is designed. The study contributes to the effectiveness and accuracy of the knowledge push service, which is important for improving the reuse efficiency of knowledge resources and knowledge service satisfaction in the cloud manufacturing environment.


2020 ◽  
Author(s):  
Y-h. Taguchi ◽  
Turki Turki

Identifying differentially expressed genes is difficult because of the small number of available samples compared with the large number of genes. Conventional gene selection methods employing statistical tests have the critical problem that P-values depend heavily on sample size. Although the recently proposed principal component analysis (PCA)- and tensor decomposition (TD)-based unsupervised feature extraction (FE) has often outperformed these statistical test-based methods, the reason why it works so well is unclear. In this study, we aim to understand this reason in the context of projection pursuit, which was proposed a long time ago to solve the problem of dimensions; we can relate the space spanned by singular value vectors to that spanned by the optimal cluster centroids obtained from K-means. Thus, the success of PCA- and TD-based unsupervised FE can be understood through this equivalence. In addition, a threshold of 0.01 on adjusted P-values, computed under the null hypothesis that the singular value vectors attributed to genes obey a Gaussian distribution, empirically corresponds to a threshold of 0.1 on adjusted P-values when the null distribution is generated by gene order shuffling. These findings thus rationalize the success of PCA- and TD-based unsupervised FE for the first time.
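
The selection rule mentioned above (adjusted P-values below 0.01 under a Gaussian null on a gene singular value vector) can be sketched roughly as follows. The abstract does not spell out the procedure, so several choices here are assumptions: the (genes x samples) layout, centering each sample column, standardizing the chosen vector, testing its squared entries against a chi-squared distribution with one degree of freedom, and Benjamini-Hochberg adjustment. The function name `select_features_by_pc` is hypothetical.

```python
import math
import numpy as np

def select_features_by_pc(X, component=0, alpha=0.01):
    """Select rows (genes) that are outliers along one singular vector.

    X is (genes x samples). Returns a boolean mask of selected genes
    and the Benjamini-Hochberg adjusted P-values.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)

    # Singular vector attributed to genes, standardized so that the
    # Gaussian null has zero mean and unit variance.
    u = U[:, component]
    z = (u - u.mean()) / u.std()

    # Upper tail of chi-squared (1 dof) at z**2, which equals the
    # two-sided Gaussian tail probability erfc(|z| / sqrt(2)).
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

    # Benjamini-Hochberg adjustment.
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)

    return adjusted < alpha, adjusted
```

Which component to inspect is left to the caller here; in practice it would be chosen by checking which sample singular vector separates the conditions of interest.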


1993 ◽  
Vol 7 (6) ◽  
pp. 527-541 ◽  
Author(s):  
Yu-Long Xie ◽  
Ji-Hong Wang ◽  
Yi-Zeng Liang ◽  
Li-Xian Sun ◽  
Xin-Hua Song ◽  
...  
