Principal component analysis in metabolomics: from multidimensional data toward biologically relevant information

Author(s):  
Renata Bujak ◽  
Michal Jan Markuszewski


2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Cong Liu ◽  
Xu Wei-sheng ◽  
Wu Qi-di

We propose Tensorial Kernel Principal Component Analysis (TKPCA) for dimensionality reduction and feature extraction from tensor objects. TKPCA extends conventional Principal Component Analysis (PCA) in two respects: it works directly with multidimensional data (tensors) in their native form, and it generalizes the linear technique to a nonlinear version through the kernel trick. Our method aims to remedy the shortcomings of recently developed multilinear subspace learning (tensorial PCA) in modelling the nonlinear manifold of tensor objects, and it combines the desirable properties of kernel methods and tensor decompositions for significant performance gains when the data are multidimensional and nonlinear dependencies exist. Our approach first formulates TKPCA as an optimization problem. We then develop a kernel function based on the Grassmann manifold that takes tensorial representations directly as arguments instead of the traditional vectorized representation. Furthermore, a TKPCA-based tensor object recognition scheme is proposed for action recognition. Experiments on real action datasets show that the proposed method is insensitive to both noise and occlusion and performs well compared with state-of-the-art algorithms.
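
A minimal sketch of the idea in Python, assuming the projection kernel on the Grassmann manifold computed from a mode-0 unfolding of each tensor sample; the authors' exact kernel, unfolding, and optimization may differ.

```python
import numpy as np

def grassmann_projection_kernel(X, Y, r=3):
    """Projection kernel between two tensor samples, unfolded along mode 0.
    k(X, Y) = ||Ux^T Uy||_F^2, where Ux, Uy span the leading r-dim subspaces."""
    Ux, _, _ = np.linalg.svd(X.reshape(X.shape[0], -1), full_matrices=False)
    Uy, _, _ = np.linalg.svd(Y.reshape(Y.shape[0], -1), full_matrices=False)
    M = Ux[:, :r].T @ Uy[:, :r]
    return np.sum(M ** 2)

def tensor_kernel_pca(tensors, r=3, n_components=2):
    """Kernel PCA on a list of tensor samples with a Grassmann projection kernel."""
    n = len(tensors)
    K = np.array([[grassmann_projection_kernel(tensors[i], tensors[j], r)
                   for j in range(n)] for i in range(n)])
    # Center the Gram matrix in feature space.
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, order] / np.sqrt(np.maximum(eigvals[order], 1e-12))
    return Kc @ alphas   # nonlinear principal component scores of the samples

# Toy usage: 20 random third-order "action" tensors of size 10 x 8 x 5.
rng = np.random.default_rng(0)
samples = [rng.standard_normal((10, 8, 5)) for _ in range(20)]
scores = tensor_kernel_pca(samples, r=3, n_components=2)
print(scores.shape)   # (20, 2)
```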


2011 ◽  
Vol 199-200 ◽  
pp. 850-857
Author(s):  
Jian Chao Dong ◽  
Tie Jun Yang ◽  
Xin Hui Li ◽  
Zhi Jun Shuai ◽  
You Hong Xiao

Principal component analysis (PCA), one of the basic blind signal processing techniques, is widely employed for extracting relevant information from confusing data sets. This paper first explains the principle of PCA; simulations and experiments are then carried out on a simply supported beam rig, and PCA is applied in the frequency domain to identify the number of sources in several cases. In addition, the principal component (PC) contribution coefficient and the signal-to-noise ratio between neighboring PCs (neighboring SNR) are introduced to cut off minor components quantitatively. The results show that when the number of observations is equal to or larger than the number of sources and the additive noise is weak, principal component analysis can accurately predict the number of uncorrelated excitation sources in a multiple-input multiple-output system.
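
A small sketch of the two cutoff quantities named above, assuming PCs are obtained from an eigendecomposition of the covariance of the multichannel (e.g. frequency-domain) observations; the heuristic cut shown here is an assumption, not the paper's exact threshold.

```python
import numpy as np

def source_count_by_pca(observations):
    """Estimate the number of uncorrelated sources from multichannel data.

    observations: array of shape (n_channels, n_samples).
    Returns the eigenvalues, the PC contribution coefficients, the neighboring
    SNR (in dB) between consecutive PCs, and an estimated source count."""
    X = observations - observations.mean(axis=1, keepdims=True)
    cov = X @ X.conj().T / X.shape[1]
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1].real
    contribution = eigvals / eigvals.sum()                    # PC contribution coefficients
    neighboring_snr = 10.0 * np.log10(eigvals[:-1] / eigvals[1:])
    # Heuristic cut: the largest drop between neighboring PCs marks the noise floor.
    n_sources = int(np.argmax(neighboring_snr)) + 1
    return eigvals, contribution, neighboring_snr, n_sources

# Toy usage: 6 channels observing 2 uncorrelated sources plus weak additive noise.
rng = np.random.default_rng(1)
S = rng.standard_normal((2, 4096))
A = rng.standard_normal((6, 2))
X = A @ S + 0.01 * rng.standard_normal((6, 4096))
print(source_count_by_pca(X)[-1])   # likely 2 for this toy mixture
```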


Mathematics ◽  
2018 ◽  
Vol 6 (11) ◽  
pp. 269 ◽  
Author(s):  
Sergio Camiz ◽  
Valério Pillar

The identification of a reduced-dimensional representation of the data is among the main issues of exploratory multidimensional data analysis, and several solutions have been proposed in the literature, depending on the method. Principal Component Analysis (PCA) is the method that has received the most attention thus far, and several identification methods, the so-called stopping rules, have been proposed, giving very different results in practice; some comparative studies have been carried out. Inconsistencies in the previous studies led us to clarify the distinction between signal and noise in PCA, and its limits, and to propose a new testing method. This consists in generating simulated data according to a predefined eigenvalue structure, including zero eigenvalues. From random populations built according to several such structures, reduced-size samples were extracted, and different levels of random normal noise were added to them. This controlled introduction of noise allows a clear distinction between expected signal and noise, the latter relegated to the non-zero sample eigenvalues that correspond to zero eigenvalues in the population. With this new method, we tested the performance of ten different stopping rules. For every method, structure, and noise level, both power (the ability to correctly identify the expected dimension) and type-I error (the detection of a dimension composed only of noise) were measured, by counting the relative frequencies with which the smallest non-zero population eigenvalue was recognized as signal in the samples and with which the largest zero eigenvalue was recognized as noise, respectively. In this way, the behaviour of the examined methods is clear and their comparison and evaluation are possible. The reported results show that both the generalization of Bartlett's test by Rencher and the bootstrap method by Pillar perform much better than all the others: both show reasonable power, decreasing with noise, and very good type-I error. Thus, more than the others, these methods deserve to be adopted.
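
A compact sketch of the simulation setup, assuming multivariate normal populations whose covariance has a prescribed eigenvalue spectrum with trailing zeros; the stopping rule shown (comparing sample eigenvalues with those of pure-noise samples) is only a stand-in for the ten rules actually compared.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_sample(pop_eigvals, n, noise_sd):
    """Draw n observations from a population with the given covariance eigenvalues
    (zeros allowed), then add i.i.d. normal noise of standard deviation noise_sd."""
    p = len(pop_eigvals)
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))      # random principal axes
    scores = rng.standard_normal((n, p)) * np.sqrt(pop_eigvals)
    return scores @ Q.T + noise_sd * rng.standard_normal((n, p))

def sample_eigvals(X):
    Xc = X - X.mean(axis=0)
    return np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]

# Population with 3 signal dimensions and 3 zero eigenvalues.
pop = np.array([5.0, 3.0, 1.0, 0.0, 0.0, 0.0])
n, noise_sd, n_null = 50, 0.3, 500

obs = sample_eigvals(simulate_sample(pop, n, noise_sd))
# Null distribution: samples with no signal at all, only noise.
null = np.array([sample_eigvals(simulate_sample(np.zeros_like(pop), n, noise_sd))
                 for _ in range(n_null)])
threshold = np.percentile(null, 95, axis=0)    # 95th percentile per eigenvalue rank
retained = int(np.sum(obs > threshold))
print("estimated dimension:", retained)        # signal dimension is 3 by construction
```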


2021 ◽  
Vol 30 (1) ◽  
pp. 177-186
Author(s):  
Silviu Cornel Virgil Chiriac

The current paper is part of a wider study that aims to identify the determining factors of the performance of entities in the real estate field and to set up a composite index of company performance, based on a sample of 29 companies listed on the BVB (Bucharest Stock Exchange) in 2019, using one of the multidimensional data analysis techniques, principal component analysis. Descriptive analysis and principal component analysis for constructing the composite performance index were applied in the study in order to highlight the most important companies from the point of view of financial performance. The descriptive analysis of the data set provides an overview of the companies selected for analysis. The study aims to build a synthetic indicator of the financial performance of the selected companies based on 9 financial indicators, using principal component analysis (PCA). The 9 indicators considered for the analysis were selected on the basis of specialised articles: ROA (return on assets), which reflects the company's capacity to use its assets productively; ROE (return on equity), which measures the efficiency of use of the stockholders' capital; total asset turnover; general liquidity ratio; general solvency ratio; general debt-to-equity ratio; net profit margin; and gross return of portfolio.
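
A brief sketch of one common way such a composite index is built (standardize the indicators, run PCA, and weight the component scores by explained variance); the indicator names and the data frame below are hypothetical placeholders, and the paper's exact construction may differ.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data frame: one row per listed company, one column per financial ratio.
indicators = ["ROA", "ROE", "asset_turnover", "liquidity", "solvency",
              "debt_to_equity", "net_profit_margin", "portfolio_return"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((29, len(indicators))), columns=indicators)

X = StandardScaler().fit_transform(df)           # ratios are on very different scales
pca = PCA().fit(X)
scores = pca.transform(X)

# Keep the components explaining most of the variance (Kaiser-style cut at 1.0).
keep = pca.explained_variance_ > 1.0
weights = pca.explained_variance_ratio_[keep]
composite = scores[:, keep] @ (weights / weights.sum())

ranking = pd.Series(composite, index=df.index).sort_values(ascending=False)
print(ranking.head())                            # highest composite performance first
```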


2005 ◽  
Vol 57 (6) ◽  
pp. 805-810 ◽  
Author(s):  
L. Barbosa ◽  
P.S. Lopes ◽  
A.J. Regazzi ◽  
S.E.F. Guimarães ◽  
R.A. Torres

Using principal component analysis, records of 435 animals from an F2 swine population were used to identify independent and informative variables related to economically important performance traits. The following traits were recorded: litter size at birth (BL), litter size at weaning (WL), teat number (TN), birth weight (BW), weight at 21 (W21), 42 (W42), 63 (W63) and 77 (W77) days of age, average daily gain (ADG), feed intake (FI) and feed:gain ratio (FGR) from 77 to 105 days of age. Six principal components had eigenvalues lower than 0.7, suggesting that six variables could be discarded with little loss of information. The discarded variables presented significant simple linear correlations with the retained variables. Retaining the variables BL, TN, W77, FI and FGR and eliminating the rest would preserve most of the relevant information in the original data set.
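
A short sketch of the discarding rule described above (often attributed to Jolliffe): for each principal component with eigenvalue below 0.7, the variable with the largest absolute loading on it is marked for discarding. The trait names follow the abstract, but the data below are random placeholders with a few artificial correlations, not the actual records.

```python
import numpy as np
import pandas as pd

traits = ["BL", "WL", "TN", "BW", "W21", "W42", "W63", "W77", "ADG", "FI", "FGR"]
rng = np.random.default_rng(7)
data = pd.DataFrame(rng.standard_normal((435, len(traits))), columns=traits)
# Placeholder correlations so some components carry little unique variance.
data["W42"] = 0.8 * data["W21"] + 0.2 * rng.standard_normal(435)
data["W63"] = 0.8 * data["W42"] + 0.2 * rng.standard_normal(435)
data["ADG"] = 0.9 * data["W77"] + 0.1 * rng.standard_normal(435)

# PCA on the correlation matrix (traits are on different scales).
corr = np.corrcoef(data.values, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

discarded = []
# Walk the components with eigenvalue < 0.7 from the smallest upward,
# discarding the not-yet-discarded trait with the largest absolute loading.
for j in range(len(traits) - 1, -1, -1):
    if eigvals[j] >= 0.7:
        break
    loadings = np.abs(eigvecs[:, j])
    for idx in np.argsort(loadings)[::-1]:
        if traits[idx] not in discarded:
            discarded.append(traits[idx])
            break

print("candidate variables to discard:", discarded)
```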


2020 ◽  
pp. 147592172097739
Author(s):  
Luis Eduardo Mujica ◽  
Magda Ruiz ◽  
Rodolfo Villamizar

The hydrocarbon industry in Colombia is one of the principal pillars of the Colombian economy, representing around 5% of its gross domestic product. Since petroleum reserves have decreased, gas has become a main alternative for economic growth. However, current gas pipelines have been in service for over 30 years; some of them are buried, and phenomena such as metal loss, corrosion, mechanical stress, strikes by excavation machinery, and other types of damage occur. The maintenance program for these structures is typically corrective and very expensive. To overcome this situation, the national research institute "Research Institute of Corrosion (Corporación para la Investigación de la Corrosión)" recently developed an in-line inspection tool to be operated in Colombian gas pipelines to obtain valuable information on their current state along thousands of kilometers. A huge quantity of data is recorded (including tool movement, magnet, magnetic flux leakage, and caliper signals), which demands a high computational cost and adequate analysis tools to establish the current structural health condition of the pipeline. In this sense, the authors have shown in several works that principal component analysis is an effective tool to detect and locate abnormal operational and structural conditions from multidimensional data. In a previous analysis, multidimensional data were used to locate possible damage along the pipeline; however, most of the activated points belonged to weld points. Therefore, in this article, it is proposed to use the root mean square value of the magnetic flux leakage signals to separate these points and to obtain sets of signals by sections with the welds removed, and then multiway principal component analysis is applied to each set of signals of each gas pipeline section. The maximum values of the damage indices (the Q- and T²-statistics) of each section are kept in order to flag the sections of the gas pipeline with the highest probability of damage, which must then be evaluated by experts.
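
A condensed sketch of the per-section damage indices, assuming a baseline PCA model built from healthy, unfolded section data: the Q-statistic (squared prediction error) and the Hotelling T²-statistic are computed for each observation, and the per-section maxima flag candidates for expert review. The RMS-based weld removal and the exact multiway unfolding used by the authors are not reproduced here.

```python
import numpy as np

def fit_pca_model(X_baseline, n_components):
    """Baseline PCA model from 'healthy' section data (rows = unfolded observations)."""
    mu = X_baseline.mean(axis=0)
    sigma = X_baseline.std(axis=0) + 1e-12
    Xs = (X_baseline - mu) / sigma
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_components].T                          # loading matrix
    lam = (S[:n_components] ** 2) / (len(Xs) - 1)    # retained component variances
    return mu, sigma, P, lam

def damage_indices(X, model):
    """Q (squared prediction error) and Hotelling T^2 for each observation in X."""
    mu, sigma, P, lam = model
    Xs = (X - mu) / sigma
    T = Xs @ P                                       # scores in the PCA subspace
    residual = Xs - T @ P.T
    Q = np.sum(residual ** 2, axis=1)
    T2 = np.sum((T ** 2) / lam, axis=1)
    return Q, T2

# Toy usage: baseline from 200 observations of 30 features, one test section of 50.
rng = np.random.default_rng(3)
baseline = rng.standard_normal((200, 30))
model = fit_pca_model(baseline, n_components=5)
section = rng.standard_normal((50, 30))
Q, T2 = damage_indices(section, model)
print("max Q:", Q.max(), "max T2:", T2.max())        # per-section maxima to compare
```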


2018 ◽  
Author(s):  
Y-h. Taguchi

In the absence of sample labeling, unsupervised feature selection applied to single-cell (sc) RNA-seq data can identify genes critical to the experimental conditions considered. In this paper, we applied principal component analysis (PCA)-based unsupervised feature extraction (FE) to identify biologically relevant genes from mouse and human embryonic brain development expression profiles obtained by scRNA-seq. When the biological relevance of the selected genes was evaluated by various enrichment analyses, PCA-based unsupervised FE outperformed conventional unsupervised approaches that select highly variable genes or bimodal genes, as well as the recently proposed dpFeature.
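
A rough sketch in the spirit of PCA-based unsupervised FE: genes are scored on selected principal components, P-values are attributed under an assumed chi-squared null on the standardized scores, and genes with small adjusted P-values are retained. The component choice, thresholds, and toy data below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy import stats

def pca_unsupervised_fe(expr, components=(0,), alpha=0.01):
    """Select genes via a PCA-based unsupervised feature extraction sketch.

    expr: array of shape (n_genes, n_samples), e.g. log-transformed counts."""
    X = expr - expr.mean(axis=1, keepdims=True)
    # Gene-wise scores on the principal components of the sample space.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, list(components)]
    z = scores / scores.std(axis=0, ddof=1)
    chi2_stat = np.sum(z ** 2, axis=1)
    pvals = stats.chi2.sf(chi2_stat, df=len(components))
    # Benjamini-Hochberg adjustment, then threshold at alpha.
    order = np.argsort(pvals)
    ranked = pvals[order] * len(pvals) / (np.arange(len(pvals)) + 1)
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty_like(adj)
    adjusted[order] = np.minimum(adj, 1.0)
    return np.where(adjusted < alpha)[0]             # indices of selected genes

# Toy usage: 2000 genes x 40 cells; a handful of genes carry structured signal.
rng = np.random.default_rng(5)
expr = rng.standard_normal((2000, 40))
expr[:20] += np.linspace(-2, 2, 40)                  # genes tracking a developmental axis
selected = pca_unsupervised_fe(expr, components=(0,))
print(len(selected), "genes selected")
```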


SinkrOn ◽  
2022 ◽  
Vol 7 (1) ◽  
pp. 93-100
Author(s):  
Alfry Aristo Jansen Sinlae ◽  
Dedy Alamsyah ◽  
Lilik Suhery ◽  
Fryda Fatmayati

Palm oil is one of the leading commodities in Indonesia. Oil palm yields can be influenced by several factors, one of which is proper weed control, since uncontrolled weeds can damage oil palm plantations. To manage and control weeds, especially broadleaf weeds, their types must be known; however, not all farmers have knowledge of weed types. A system is therefore needed that can help identify broadleaf weeds from leaf images using image processing. This study aims to build a broadleaf weed classification system using a combination of the K-Nearest Neighbor (KNN) and Principal Component Analysis (PCA) algorithms. PCA is used for feature extraction based on the characteristics formed from each spatial property; it reduces the original features while retaining most of the relevant information according to an optimality criterion. The resulting features are then used by KNN for learning based on the closest distances to the object. Based on the test results, the developed model achieves an accuracy of 90%. The Principal Component Analysis (PCA) and K-Nearest Neighbor (KNN) algorithms can thus be applied properly in the classification process. The accuracy is strongly influenced by the amounts of training and test data as well as the quality of the images used.
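
A minimal sketch of the PCA-plus-KNN pipeline described above, using scikit-learn; the placeholder data, the number of retained components, and k are illustrative choices rather than the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data: flattened grayscale leaf images (64x64) and weed-species labels.
rng = np.random.default_rng(0)
X = rng.random((200, 64 * 64))
y = rng.integers(0, 4, size=200)        # four hypothetical broadleaf weed classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# PCA compresses the pixel features; KNN then classifies in the reduced space.
model = make_pipeline(PCA(n_components=50), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```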

