scholarly journals Robust Bilinear Probabilistic Principal Component Analysis

Algorithms ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 322
Author(s):  
Yaohang Lu ◽  
Zhongming Teng

Principal component analysis (PCA) is one of the most popular tools in multivariate exploratory data analysis. Its probabilistic version (PPCA) based on the maximum likelihood procedure provides a probabilistic manner to implement dimension reduction. Recently, the bilinear PPCA (BPPCA) model, which assumes that the noise terms follow matrix variate Gaussian distributions, has been introduced to directly deal with two-dimensional (2-D) data for preserving the matrix structure of 2-D data, such as images, and avoiding the curse of dimensionality. However, Gaussian distributions are not always available in real-life applications which may contain outliers within data sets. In order to make BPPCA robust for outliers, in this paper, we propose a robust BPPCA model under the assumption of matrix variate t distributions for the noise terms. The alternating expectation conditional maximization (AECM) algorithm is used to estimate the model parameters. Numerical examples on several synthetic and publicly available data sets are presented to demonstrate the superiority of our proposed model in feature extraction, classification and outlier detection.

Author(s):  
Andrew J. Connolly ◽  
Jacob T. VanderPlas ◽  
Alexander Gray ◽  
Andrew J. Connolly ◽  
Jacob T. VanderPlas ◽  
...  

With the dramatic increase in data available from a new generation of astronomical telescopes and instruments, many analyses must address the question of the complexity as well as size of the data set. This chapter deals with how we can learn which measurements, properties, or combinations thereof carry the most information within a data set. It describes techniques that are related to concepts discussed when describing Gaussian distributions, density estimation, and the concepts of information content. The chapter begins with an exploration of the problems posed by high-dimensional data. It then describes the data sets used in this chapter, and introduces perhaps the most important and widely used dimensionality reduction technique, principal component analysis (PCA). The remainder of the chapter discusses several alternative techniques which address some of the weaknesses of PCA.


Author(s):  
Petr Praus

In this chapter the principals and applications of principal component analysis (PCA) applied on hydrological data are presented. Four case studies showed the possibility of PCA to obtain information about wastewater treatment process, drinking water quality in a city network and to find similarities in the data sets of ground water quality results and water-related images. In the first case study, the composition of raw and cleaned wastewater was characterised and its temporal changes were displayed. In the second case study, drinking water samples were divided into clusters in consistency with their sampling localities. In the case study III, the similar samples of ground water were recognised by the calculation of cosine similarity, the Euclidean and Manhattan distances. In the case study IV, 32 water-related images were transformed into a large image matrix whose dimensionality was reduced by PCA. The images were clustered using the PCA scatter plots.


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-13 ◽  
Author(s):  
Yue Hu ◽  
Jin-Xing Liu ◽  
Ying-Lian Gao ◽  
Sheng-Jun Li ◽  
Juan Wang

In the big data era, sequencing technology has produced a large number of biological sequencing data. Different views of the cancer genome data provide sufficient complementary information to explore genetic activity. The identification of differentially expressed genes from multiview cancer gene data is of great importance in cancer diagnosis and treatment. In this paper, we propose a novel method for identifying differentially expressed genes based on tensor robust principal component analysis (TRPCA), which extends the matrix method to the processing of multiway data. To identify differentially expressed genes, the plan is carried out as follows. First, multiview data containing cancer gene expression data from different sources are prepared. Second, the original tensor is decomposed into a sum of a low-rank tensor and a sparse tensor using TRPCA. Third, the differentially expressed genes are considered to be sparse perturbed signals and then identified based on the sparse tensor. Fourth, the differentially expressed genes are evaluated using Gene Ontology and Gene Cards tools. The validity of the TRPCA method was tested using two sets of multiview data. The experimental results showed that our method is superior to the representative methods in efficiency and accuracy aspects.


Sign in / Sign up

Export Citation Format

Share Document