On order determination by predictor augmentation

Biometrika ◽  
2020 ◽  
Author(s):  
Wei Luo ◽  
Bing Li

Summary: In many dimension reduction problems in statistics and machine learning, such as in principal component analysis, canonical correlation analysis, independent component analysis and sufficient dimension reduction, it is important to determine the dimension of the reduced predictor, which often amounts to estimating the rank of a matrix. This problem is called order determination. In this article, we propose a novel and highly effective order-determination method based on the idea of predictor augmentation. We show that if the predictor is augmented by an artificially generated random vector, then the parts of the eigenvectors of the matrix induced by the augmentation display a pattern that reveals information about the order to be determined. This information, when combined with the information provided by the eigenvalues of the matrix, greatly enhances the accuracy of order determination.
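In the simplest PCA setting, the pattern described above can be sketched numerically: after appending independent standard normal coordinates to the predictor, eigenvectors associated with the signal have a near-zero augmented block, while the remaining eigenvectors do not. The simulation below is an illustrative sketch of that idea only, not the authors' full estimator; the data, dimensions, and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n samples of a p-dimensional predictor whose covariance
# has a rank-2 signal plus small isotropic noise, so the true order is d = 2.
n, p, d = 500, 6, 2
scores = rng.normal(size=(n, d)) @ np.diag([3.0, 2.0])
loadings = np.linalg.qr(rng.normal(size=(p, d)))[0]
X = scores @ loadings.T + 0.1 * rng.normal(size=(n, p))

# Augment the predictor with r artificial independent N(0, 1) coordinates.
r = 4
Z = rng.normal(size=(n, r))
Xa = np.hstack([X, Z])

# Eigendecompose the sample covariance of the augmented predictor,
# largest eigenvalue first.
cov = np.cov(Xa, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# For a signal eigenvector, the augmented block (last r coordinates) is
# close to zero; beyond the d-th eigenvector it is not. The squared norm
# of the augmented block therefore jumps right after position d.
aug_norm = np.sum(vecs[p:, :] ** 2, axis=0)
```
Here `aug_norm[0]` and `aug_norm[1]` are nearly zero while `aug_norm[2]` is not, which is the eigenvector pattern the abstract refers to; combining it with the eigenvalue decay is what sharpens the order estimate.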

Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-13 ◽  
Author(s):  
Yue Hu ◽  
Jin-Xing Liu ◽  
Ying-Lian Gao ◽  
Sheng-Jun Li ◽  
Juan Wang

In the big data era, sequencing technology has produced large amounts of biological sequencing data. Different views of the cancer genome data provide complementary information for exploring genetic activity. The identification of differentially expressed genes from multiview cancer gene data is of great importance in cancer diagnosis and treatment. In this paper, we propose a novel method for identifying differentially expressed genes based on tensor robust principal component analysis (TRPCA), which extends the matrix method to the processing of multiway data. To identify differentially expressed genes, the plan is carried out as follows. First, multiview data containing cancer gene expression data from different sources are prepared. Second, the original tensor is decomposed into the sum of a low-rank tensor and a sparse tensor using TRPCA. Third, the differentially expressed genes are treated as sparse perturbation signals and identified from the sparse tensor. Fourth, the identified genes are evaluated using the Gene Ontology and GeneCards tools. The validity of the TRPCA method was tested using two sets of multiview data. The experimental results show that our method is superior to representative methods in both efficiency and accuracy.
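The low-rank-plus-sparse decomposition in the second step can be sketched in the matrix (two-way) case, which TRPCA generalizes to tensors. The sketch below solves matrix robust PCA by the standard inexact augmented Lagrange multiplier iteration; the synthetic data and all parameter choices are illustrative assumptions, not the paper's tensor algorithm.

```python
import numpy as np

def svd_shrink(M, tau):
    # Singular-value soft thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    # Entrywise soft thresholding: proximal operator of the l1 norm.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(M, lam=None, rho=1.5, tol=1e-7, max_iter=100):
    """Decompose M into low-rank L plus sparse S (inexact ALM sketch)."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.norm(M, 2)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(max_iter):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S
        Y += mu * R
        mu *= rho
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S

rng = np.random.default_rng(1)
L0 = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 40))  # rank-3 baseline
S0 = np.zeros((60, 40))
mask = rng.random((60, 40)) < 0.05                        # 5% gross outliers
S0[mask] = rng.normal(scale=10.0, size=mask.sum())        # sparse perturbations
L, S = rpca(L0 + S0)
```
In the gene-expression analogy, `L0` plays the role of the shared low-rank background and the entries of `S0` the sparse perturbation signals from which differentially expressed genes would be read off.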


2013 ◽  
Vol 303-306 ◽  
pp. 1101-1104 ◽  
Author(s):  
Yong De Hu ◽  
Jing Chang Pan ◽  
Xin Tan

Kernel entropy component analysis (KECA) reveals the structure of the original data through the kernel matrix. This structure is related to the Renyi entropy of the data, and KECA preserves it by keeping the Renyi entropy of the data unchanged. This paper describes the original data with a small number of components for the purpose of dimension reduction. KECA is then applied to celestial spectra reduction and compared with principal component analysis (PCA) and kernel principal component analysis (KPCA) in experiments. The experimental results show that KECA is an effective method for high-dimensional data reduction.
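A minimal sketch of the KECA selection rule, assuming a Gaussian kernel on simulated data: the quadratic Renyi entropy estimate decomposes over the eigenpairs of the kernel matrix, and KECA keeps the components with the largest entropy contribution, which need not be the largest-eigenvalue components as in KPCA. Kernel width and component count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))            # stand-in for spectra samples

# Gaussian (RBF) kernel matrix.
sigma = 1.0
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))

# Eigendecomposition of the kernel matrix, largest eigenvalue first.
lam, E = np.linalg.eigh(K)
lam, E = lam[::-1], E[:, ::-1]

# Each eigenpair contributes lam_i * (1^T e_i)^2 / N^2 to the Renyi
# entropy estimate V = (1/N^2) * 1^T K 1. KECA keeps the k pairs with
# the LARGEST entropy contribution, not the largest eigenvalues.
contrib = lam * (E.sum(axis=0) ** 2)
k = 2
keep = np.argsort(contrib)[::-1][:k]

# KECA projection of the data onto the selected kernel components.
Phi = np.sqrt(np.maximum(lam[keep], 0.0))[:, None] * E[:, keep].T  # (k, N)
```
The identity that the contributions sum exactly to `1^T K 1` is what makes "keeping the Renyi entropy unchanged" a concrete selection criterion.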


2015 ◽  
Vol 738-739 ◽  
pp. 643-647
Author(s):  
Qi Zhu ◽  
Jin Rong Cui ◽  
Zi Zhu Fan

In this paper, a matrix-based feature extraction and measurement method, multi-column principal component analysis (MCPCA), is used to extract features directly and effectively from matrices. We analyze the advantages of MCPCA over conventional principal component analysis (PCA) and two-dimensional PCA (2DPCA), and we successfully apply it to face image recognition. Extensive face recognition experiments show that the proposed method achieves high accuracy and is more robust than conventional face recognition methods.
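MCPCA itself is the authors' method and is not reproduced here; as a hedged illustration of matrix-based feature extraction, the 2DPCA baseline mentioned above can be sketched as follows, with random data standing in for face images and the image size and component count chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(3)
images = rng.normal(size=(50, 32, 24))   # 50 "face images" of 32x24 pixels

# 2DPCA: the image scatter matrix is computed directly from the matrices,
# without flattening each image into a long vector as classical PCA does.
mean_img = images.mean(axis=0)
centered = images - mean_img
G = np.einsum('kij,kil->jl', centered, centered) / len(images)  # 24x24

# Project every image onto the top-d eigenvectors of the scatter matrix.
vals, vecs = np.linalg.eigh(G)
d = 5
W = vecs[:, np.argsort(vals)[::-1][:d]]  # 24x5 projection matrix
features = images @ W                    # 50 feature matrices of size 32x5
```
Working on matrices directly keeps the scatter matrix small (24x24 here, versus 768x768 for vectorized PCA), which is the efficiency argument matrix-based methods build on.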


2020 ◽  
Vol 23 ◽  
pp. 41-44
Author(s):  
Oļegs Užga-Rebrovs ◽  
Gaļina Kuļešova

Any data implicitly contain information of interest to the researcher, and the purpose of data analysis is to extract this information. The original data may contain redundant elements and noise that distort them to one degree or another, so it is necessary to subject the data to preliminary processing. Reducing the dimensionality of the initial data makes it possible to remove interfering factors and present the data in a form suitable for further analysis. The paper considers an approach to reducing the dimensionality of the original data based on principal component analysis.
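A minimal sketch of the PCA-based reduction described above: centre the data, eigendecompose the sample covariance, and keep enough components to explain a chosen share of the variance. The simulated data and the 95% threshold are illustrative assumptions, not choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy data: 3 informative directions embedded in 10 noisy variables.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

# Centre the data, then eigendecompose the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)
vals, vecs = np.linalg.eigh(cov)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Keep enough components to explain 95% of the total variance.
explained = np.cumsum(vals) / vals.sum()
k = int(np.searchsorted(explained, 0.95)) + 1
Z = Xc @ vecs[:, :k]                     # reduced representation
```
Discarding the trailing components removes the directions dominated by noise while the retained scores `Z` carry essentially all the structure, which is the "form suitable for further analysis" the abstract refers to.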


Author(s):  
Chi Qiao ◽  
Andrew T. Myers

Abstract: Surrogate modeling of the variability of metocean conditions in space and in time during hurricanes is a crucial task for risk analysis on offshore structures such as offshore wind turbines, which are deployed over a large area. This task is challenging because of the complex nature of the meteorology-metocean interaction in addition to the time-dependence and high dimensionality of the output. In this paper, spatio-temporal characteristics of surrogate models, such as Deep Neural Networks, are analyzed based on an offshore multi-hazard database created by the authors. The focus of this paper is two-fold: first, the effectiveness of dimension reduction techniques for representing high-dimensional output distributed in space is investigated and, second, an overall approach to estimate spatio-temporal characteristics of hurricane hazards using Deep Neural Networks is presented. The popular dimension reduction technique, Principal Component Analysis, is shown to perform similarly to a simpler dimension reduction approach and not as well as a surrogate model implemented without dimension reduction. Discussions are provided to explain why the performance of Principal Component Analysis is only mediocre in this implementation and why dimension reduction might not be necessary.


2020 ◽  
Vol 17 (4) ◽  
pp. 172988141989688
Author(s):  
Liming Li ◽  
Jing Zhao ◽  
Chunrong Wang ◽  
Chaojie Yan

Multivariate statistical methods such as principal component analysis (PCA), based on linear dimension reduction, and kernel principal component analysis (KPCA), based on nonlinear dimension reduction, are commonly used. Because the global performance indexes of a robot are diverse and correlated, these two methods can each be used to comprehensively evaluate the global performance of the PUMA560 robot across different dimensions. When using KPCA, the kernel function and its parameters directly affect the result of the comprehensive performance evaluation. Because KPCA with a polynomial kernel is time-consuming and inefficient on large sample data, a new kernel function based on similarity degree is proposed, and it is proved to satisfy Mercer's theorem. Comparing the dimension reduction effects of PCA, KPCA with the polynomial kernel, and KPCA with the new kernel shows that KPCA with the new kernel deals more effectively with the nonlinear relationships among the indexes, and its results are more reasonable because they contain more comprehensive information. The simulation shows that KPCA with the new kernel has low time consumption, good real-time performance, and good generalization ability.
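The polynomial-kernel KPCA baseline discussed above can be sketched as follows; the similarity-degree kernel is specific to the paper and is not reproduced here, and the data, polynomial degree, and component count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 6))             # 80 samples of 6 performance indexes

# Polynomial kernel, followed by the usual double-centring of the
# kernel matrix in feature space.
K = (X @ X.T + 1.0) ** 3
N = len(X)
J = np.eye(N) - np.ones((N, N)) / N
Kc = J @ K @ J

# Kernel principal components: eigenvectors of the centred kernel matrix,
# largest eigenvalue first. The projection of training sample j onto
# component i is sqrt(lam_i) * A[j, i].
lam, A = np.linalg.eigh(Kc)
lam, A = lam[::-1], A[:, ::-1]
k = 3
scores = A[:, :k] * np.sqrt(np.maximum(lam[:k], 0.0))
```
Note that the full N x N kernel matrix must be formed and decomposed, which is where the inefficiency for large samples comes from and what motivates replacing the polynomial kernel.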

