Making Sense of Human Lung Carcinomas Gene Expression Data: Integration and Analysis of Two Affymetrix Platform Experiments

2006 ◽  
pp. 81-94
Author(s):  
Xiwu Lin ◽  
Daniel Park ◽  
Sergio Eslava ◽  
Kwan R. Lee ◽  
Raymond L.H. Lam ◽  
...  
2009 ◽  
Vol 10 (1) ◽  
pp. 158 ◽  
Author(s):  
Jaume Mercade ◽  
Antonio Espinosa ◽  
Jose Enrique Adsuara ◽  
Rosa Adrados ◽  
Jordi Segura ◽  
...  

2012 ◽  
Vol 28 (17) ◽  
pp. 2283-2284
Author(s):  
Peter M. Krempl ◽  
Juergen Mairhofer ◽  
Gerald Striedner ◽  
Gerhard G. Thallinger

2016 ◽  
Author(s):  
Alina Frolova ◽  
Vladyslav Bondarenko ◽  
Maria Obolenska

Abstract

Background: According to statistics from major public repositories, an overwhelming majority of existing and newly uploaded data originates from microarray experiments. Unfortunately, the potential of these data to bring new insights is limited by study-specific biases that arise from the small number of biological samples. Increasing sample size by direct microarray data integration increases the statistical power to obtain a more precise estimate of gene expression in a population of individuals, resulting in lower false discovery rates. However, despite numerous recommendations for gene expression data integration, there is no systematic comparison of processing approaches aimed at assessing microarray platform diversity and ambiguous probeset-to-gene correspondence, and as a result few studies apply integration.

Results: Here, we investigated five approaches to microarray data processing in comparison with RNA-seq data on breast cancer samples. We aimed to evaluate different probeset annotations as well as different procedures for choosing between probesets mapped to the same gene. We show that pipeline rankings are mostly preserved across Affymetrix and Illumina platforms. The BrainArray approach, based on updated annotation and redesigned probeset definitions, and the approach of choosing the probeset with the maximum average signal across samples have the best correlation with RNA-seq, while averaging probeset signals and scoring the quality of probe sequence mapping to the transcripts of the targeted gene have worse correlation. Finally, randomly selecting a probeset among those mapped to the same gene significantly decreases the correlation with RNA-seq.

Conclusion: We show that methods that rely on actual probeset signal intensities are advantageous over methods that consider only the biological characteristics of the probe sequences, and that cross-platform integration of datasets improves correlation with the RNA-seq data. We consider these results a contribution to integrative analysis as a worthwhile alternative to the classical meta-analysis of multiple gene expression datasets.
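
The two signal-based strategies named in this abstract (keeping the probeset with the maximum average signal across samples, and averaging the signals of all probesets mapped to a gene) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the probeset and gene identifiers, and the signal values below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def collapse_max_mean(probeset_signals, probeset_to_gene):
    """Per gene, keep the probeset whose mean signal across samples is highest."""
    best = {}  # gene -> (mean signal, probeset id, per-sample signals)
    for probeset, signals in probeset_signals.items():
        gene = probeset_to_gene.get(probeset)
        if gene is None:
            continue  # skip probesets without a gene annotation
        avg = mean(signals)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probeset, list(signals))
    return {gene: (ps, sig) for gene, (_, ps, sig) in best.items()}


def collapse_average(probeset_signals, probeset_to_gene):
    """Per gene, average the signals of all mapped probesets sample by sample."""
    grouped = defaultdict(list)
    for probeset, signals in probeset_signals.items():
        gene = probeset_to_gene.get(probeset)
        if gene is not None:
            grouped[gene].append(signals)
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical toy data: three probesets, three samples, two genes.
signals = {
    "ps_a": [8.0, 9.0, 10.0],
    "ps_b": [4.0, 5.0, 6.0],
    "ps_c": [7.0, 7.0, 7.0],
}
mapping = {"ps_a": "GENE1", "ps_b": "GENE1", "ps_c": "GENE2"}

by_max_mean = collapse_max_mean(signals, mapping)
by_average = collapse_average(signals, mapping)
```

With this toy input, the maximum-mean strategy keeps `ps_a` for `GENE1`, whereas averaging blends `ps_a` and `ps_b` into a single profile. The sketch shows only the mechanics of the two choices and says nothing about which one correlates better with RNA-seq.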


2012 ◽  
Vol 14 (4) ◽  
pp. 469-490 ◽  
Author(s):  
C. Lazar ◽  
S. Meganck ◽  
J. Taminau ◽  
D. Steenhoff ◽  
A. Coletta ◽  
...  

2021 ◽  
Author(s):  
Yang Xu ◽  
Edmon Begoli ◽  
Rachel Patton McCord

Rapidly advancing single-cell technologies are producing a surge of high-dimensional data that come from different sources and represent cellular systems from different views. As a result, integrating single-cell data across modalities has emerged as a new computational challenge that is gaining increasing attention within the community. Here, we present a novel adversarial approach, sciCAN, to integrate single-cell chromatin accessibility and gene expression data in an unsupervised manner. We benchmarked sciCAN against 3 state-of-the-art (SOTA) methods on 5 scATAC-seq/scRNA-seq datasets and demonstrated that our method achieves a better balance of mutual transfer between modalities than the other 3 SOTA methods. We further applied sciCAN to 10X Multiome data and confirmed that the integrated representation preserves information about the hematopoietic hierarchy. Finally, we investigated CRISPR-perturbed single-cell K562 ATAC-seq and RNA-seq data to identify cells with related responses to different perturbations in these different modalities.


Author(s):  
Mengyun Wu ◽  
Huangdi Yi ◽  
Shuangge Ma

Abstract Gene expression data have played an essential role in many biomedical studies. When the number of genes is large and sample size is limited, there is a ‘lack of information’ problem, leading to low-quality findings. To tackle this problem, both horizontal and vertical data integrations have been developed, where vertical integration methods collectively analyze data on gene expressions as well as their regulators (such as mutations, DNA methylation and miRNAs). In this article, we conduct a selective review of vertical data integration methods for gene expression data. The reviewed methods cover both marginal and joint analysis and supervised and unsupervised analysis. The main goal is to provide a sketch of the vertical data integration paradigm without digging into too many technical details. We also briefly discuss potential pitfalls, directions for future developments and application notes.

