Independent component analysis recovers consistent regulatory signals from disparate datasets

2021, Vol 17 (2), pp. e1008647
Author(s): Anand V. Sastry, Alyssa Hu, David Heckmann, Saugat Poudel, Erol Kavvas, ...

The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could result in detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and discovered that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.
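
The cross-dataset comparison described above can be illustrated with a minimal sketch. It assumes that an expression matrix X (genes × samples) is factored as X ≈ MA, with independence sought across genes, so each column of M is a gene-weight vector and each row of A gives that component's activity across samples. The synthetic data, the hand-rolled symmetric FastICA routine, and the helper names `fastica` and `match_components` are illustrative assumptions, not the authors' pipeline. Here "conserved structure" is checked simply as the best absolute Pearson correlation between gene-weight vectors from two independently decomposed datasets.

```python
import numpy as np

# Assumption: synthetic data and hypothetical helper names, for illustration only.


def _sym_decorrelate(W):
    # W <- (W W^T)^(-1/2) W keeps the unmixing rows orthonormal
    d, E = np.linalg.eigh(W @ W.T)
    return (E * (1.0 / np.sqrt(d))) @ E.T @ W


def fastica(X, k, seed=0, max_iter=1000, tol=1e-8):
    """Symmetric FastICA (tanh nonlinearity) on X of shape (n_genes, n_samples).

    Returns M (n_genes, k), unit-variance gene weights, and A (k, n_samples)
    such that the sample-centered X is approximately M @ A.
    """
    Xc = X - X.mean(axis=0)                  # center each sample over genes
    n = Xc.shape[0]
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :k] * np.sqrt(n)                # whitened data: Z.T @ Z / n = I
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)
        W_new = _sym_decorrelate(G.T @ Z / n - (1 - G**2).mean(axis=0)[:, None] * W)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    M = Z @ W.T                              # independent components (gene weights)
    A = M.T @ Xc / n                         # least squares, since M.T @ M / n = I
    return M, A


def match_components(M1, M2):
    # best |Pearson r| in dataset 2 for each component of dataset 1
    Z1 = (M1 - M1.mean(axis=0)) / M1.std(axis=0)
    Z2 = (M2 - M2.mean(axis=0)) / M2.std(axis=0)
    return (np.abs(Z1.T @ Z2) / M1.shape[0]).max(axis=1)


rng = np.random.default_rng(42)
n_genes, k = 2000, 3
S_true = rng.laplace(size=(n_genes, k))      # shared, sparse-ish regulatory signals
X1 = S_true @ rng.standard_normal((k, 30)) + 0.05 * rng.standard_normal((n_genes, 30))
# second hypothetical "platform": other samples, global rescaling, per-sample offsets
X2 = (2.0 * (S_true @ rng.standard_normal((k, 12))) + rng.normal(5.0, 1.0, size=12)
      + 0.05 * rng.standard_normal((n_genes, 12)))
M1, _ = fastica(X1, k)
M2, _ = fastica(X2, k, seed=1)
best_r = match_components(M1, M2)
print(np.round(best_r, 3))
```

Because each sample is centered over genes before whitening, the per-sample offsets added to the second synthetic platform drop out; gene-specific probe biases would not, and they are not modeled in this sketch.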

Author(s): Anand V. Sastry, Alyssa Hu, David Heckmann, Saugat Poudel, Erol Kavvas, ...

The availability of gene expression data has dramatically increased in recent years. This data deluge could result in detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We also show that echoes of this structure remain in the proteome, accelerating biological discovery through multi-omics analysis. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and discovered that its underlying ICA-based structure was still comparable to that of the individual datasets. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.


2011, Vol 2011, pp. 1-9
Author(s): Tom Eichele, Srinivas Rachakonda, Brage Brakedal, Rune Eikeland, Vince D. Calhoun

Independent component analysis (ICA) is a powerful method for source separation and has been used to decompose EEG, MRI, and concurrent EEG-fMRI data. ICA is not naturally suited to drawing group inferences, because identifying and ordering components across individuals is a non-trivial problem. One solution is to create aggregate data containing observations from all subjects, estimate a single set of components, and then back-reconstruct these components in the individual data. Here, we describe such a group-level temporal ICA model for event-related EEG. When a group model is used for EEG time-series analysis, the accuracy of component detection and back-reconstruction depends on the degree of intra- and interindividual time- and phase-locking of event-related EEG processes. We illustrate this dependency in a group analysis of hybrid data consisting of three simulated event-related sources with varying degrees of latency jitter and variable topographies. Reconstruction accuracy was tested for several algorithms at temporal jitter of 1, 2, and 3 times the full width at half maximum (FWHM) of the sources. The results indicate that group ICA is adequate for decomposing single trials with physiological jitter and reconstructs event-related sources with high accuracy.
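
The aggregate-then-back-reconstruct strategy described above can be sketched as follows. This is a simplified illustration under stated assumptions: three synthetic subjects share three temporal sources mixed into eight channels by slightly different topographies; the group unmixing matrix is estimated once by symmetric FastICA on the time-concatenated data; and back-reconstruction is taken to mean applying that group unmixing matrix to each subject's own data. The names `fastica` and `best_abs_corr` and all simulation parameters are hypothetical, and the paper's actual algorithms and back-reconstruction may differ.

```python
import numpy as np

# Assumption: synthetic "EEG" (3 subjects, 8 channels, 3 temporal sources);
# all names and parameters are hypothetical, for illustration only.


def _sym_decorrelate(W):
    # W <- (W W^T)^(-1/2) W keeps the unmixing rows orthonormal
    d, E = np.linalg.eigh(W @ W.T)
    return (E * (1.0 / np.sqrt(d))) @ E.T @ W


def fastica(Y, k, seed=0, max_iter=1000, tol=1e-8):
    """Temporal symmetric FastICA on Y (n_timepoints, n_channels).

    Returns (unmix, mean) such that sources = (Y - mean) @ unmix.
    """
    mean = Y.mean(axis=0)
    Yc = Y - mean
    n = Yc.shape[0]
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    K = Vt[:k].T / s[:k] * np.sqrt(n)        # whitening matrix: Z = Yc @ K
    Z = Yc @ K
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)
        W_new = _sym_decorrelate(G.T @ Z / n - (1 - G**2).mean(axis=0)[:, None] * W)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return K @ W.T, mean


def best_abs_corr(S_true, S_est):
    # for each true source, the best |Pearson r| among the estimated sources
    k = S_true.shape[1]
    R = np.corrcoef(S_true.T, S_est.T)[:k, k:]
    return np.abs(R).max(axis=1)


rng = np.random.default_rng(7)
n_subjects, n_t, n_ch, k = 3, 1500, 8, 3
A_group = rng.standard_normal((k, n_ch))     # shared source topographies
t = np.arange(n_t)
subjects, truths = [], []
for _ in range(n_subjects):
    S = np.column_stack([
        np.sin(2 * np.pi * t / 50 + rng.uniform(0, 2 * np.pi)),  # oscillation, subject-specific phase
        ((t + rng.integers(0, 37)) % 37) / 37 - 0.5,             # sawtooth, subject-specific offset
        rng.laplace(size=n_t),                                   # sparse, spiky activity
    ])
    S = (S - S.mean(axis=0)) / S.std(axis=0)
    A_i = A_group + 0.05 * rng.standard_normal((k, n_ch))        # slightly variable topographies
    subjects.append(S @ A_i + 0.01 * rng.standard_normal((n_t, n_ch)))
    truths.append(S)

# 1) aggregate: concatenate subjects in time and estimate one set of components
unmix, mean = fastica(np.vstack(subjects), k)
# 2) back-reconstruct: apply the group unmixing matrix to each subject's own data
subject_sources = [(Y - mean) @ unmix for Y in subjects]
scores = [best_abs_corr(S, S_hat) for S, S_hat in zip(truths, subject_sources)]
print([np.round(sc, 3) for sc in scores])
```

Because every subject is unmixed with the same matrix, component i refers to the same source in every individual, which sidesteps the component identification and ordering problem that the abstract raises for separate per-subject decompositions.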


PLoS ONE, 2017, Vol 12 (7), pp. e0181195
Author(s): Moysés Nascimento, Fabyano Fonseca e Silva, Thelma Sáfadi, Ana Carolina Campana Nascimento, Talles Eduardo Maciel Ferreira, ...

2008, Vol 128 (5), pp. 735-741
Author(s): Masaomi Yanagida, Atsushi Ishigame, Atsuhiro Koyama, Nobuo Umeda, Kiyoshi Yoshida

2019, Vol 12 (1)
Author(s): Petr V. Nazarov, Anke K. Wienecke-Baldacchino, Andrei Zinovyev, Urszula Czerwińska, Arnaud Muller, ...

Background: The amount of publicly available cancer-related “omics” data is constantly growing and can potentially be used to gain insights into the tumour biology of new cancer patients, their diagnosis, and suitable treatment options. However, integrating different datasets is not straightforward and requires specialized approaches to deal with heterogeneity at the technical and biological levels.

Methods: Here we present a method that can overcome technical biases, predict clinically relevant outcomes, and identify tumour-related biological processes in patients using previously collected large discovery datasets. The approach is based on independent component analysis (ICA), an unsupervised method of signal deconvolution. We developed parallel consensus ICA, which robustly decomposes transcriptomics datasets into expression profiles with minimal mutual dependency.

Results: By applying the method to a small cohort of primary melanoma and control samples combined with a large discovery melanoma dataset, we demonstrate that our method distinguishes cell-type-specific signals from technical biases and allows prediction of clinically relevant patient characteristics. We showed the potential of the method to predict cancer subtypes and to estimate the activity of key tumour-related processes such as immune response, angiogenesis, and cell proliferation. We proposed an ICA-based risk score and validated its connection to patient survival in an independent cohort of patients. Additionally, through integration of components identified for mRNA and miRNA data, the proposed method helped deduce biological functions of miRNAs, which would otherwise not have been possible.

Conclusions: We present a method that can be used to map new transcriptomic data from cancer patient samples onto large discovery datasets. The method corrects technical biases, helps characterize the activity of biological processes or cell types in the new samples, and provides a prognosis of patient survival.
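
Two ideas in this abstract, stabilizing ICA by combining several runs and projecting a small new cohort onto components learned from a large discovery set, can be sketched as below. The consensus step is an assumed, simplified scheme (align each run to the first run by absolute correlation, fix signs, take the element-wise median), not necessarily the authors' parallel consensus ICA; the runs here execute serially. All names (`fastica`, `consensus_ica`, `map_new_samples`) and the synthetic data are hypothetical.

```python
import numpy as np

# Assumption: synthetic "discovery" and "new cohort" matrices (genes x samples);
# the consensus scheme and all function names are hypothetical simplifications.


def _sym_decorrelate(W):
    # W <- (W W^T)^(-1/2) W keeps the unmixing rows orthonormal
    d, E = np.linalg.eigh(W @ W.T)
    return (E * (1.0 / np.sqrt(d))) @ E.T @ W


def fastica(X, k, seed=0, max_iter=1000, tol=1e-8):
    """Symmetric FastICA across genes; returns unit-variance gene weights (n_genes, k)."""
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :k] * np.sqrt(n)                # whitened data: Z.T @ Z / n = I
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)
        W_new = _sym_decorrelate(G.T @ Z / n - (1 - G**2).mean(axis=0)[:, None] * W)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def consensus_ica(X, k, seeds):
    """Run ICA with several seeds, align runs to the first, and take the median."""
    runs = [fastica(X, k, seed=s) for s in seeds]   # independent runs (parallelizable)
    ref, n = runs[0], X.shape[0]
    aligned, worst_r = [ref], 1.0
    for M in runs[1:]:
        R = ref.T @ M / n                    # Pearson r: components are zero-mean, unit-variance
        j = np.abs(R).argmax(axis=1)         # best partner for each reference component
        r = R[np.arange(k), j]
        aligned.append(M[:, j] * np.sign(r)) # reorder and flip signs to match the reference
        worst_r = min(worst_r, float(np.abs(r).min()))
    return np.median(np.stack(aligned), axis=0), worst_r


def map_new_samples(M, X_new):
    """Least-squares activities of discovery components in new samples."""
    Xc = X_new - X_new.mean(axis=0)          # per-sample offsets drop out here
    A_new, *_ = np.linalg.lstsq(M, Xc, rcond=None)
    return A_new, Xc


rng = np.random.default_rng(3)
n_genes, k = 2000, 4
S_true = rng.laplace(size=(n_genes, k))
X_disc = S_true @ rng.standard_normal((k, 40)) + 0.05 * rng.standard_normal((n_genes, 40))
M_cons, stability = consensus_ica(X_disc, k, seeds=range(5))

# small hypothetical "new cohort": same biology, per-sample technical offsets
X_new = (S_true @ rng.standard_normal((k, 6)) + rng.normal(3.0, 1.0, size=6)
         + 0.05 * rng.standard_normal((n_genes, 6)))
A_new, Xc_new = map_new_samples(M_cons, X_new)
r2 = 1 - np.sum((Xc_new - M_cons @ A_new) ** 2) / np.sum(Xc_new ** 2)
print(round(stability, 3), round(float(r2), 4))
```

The fraction of the new cohort's variance explained by the discovery components (`r2`) is one simple check that the new samples lie in the space spanned by the discovery set, and `stability` reports the weakest agreement between any run and the reference run. In this sketch, technical bias is represented only by per-sample offsets, which centering removes before projection.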

