First step toward gene expression data integration: transcriptomic data acquisition with COMMAND>_

Abstract

Background: According to statistics from major public repositories, an overwhelming majority of existing and newly uploaded data originates from microarray experiments. Unfortunately, the potential of this data to bring new insights is limited by study-specific biases that arise from the small number of biological samples. Increasing sample size by direct microarray data integration increases the statistical power to obtain a more precise estimate of gene expression in a population of individuals, resulting in lower false discovery rates. However, despite numerous recommendations for gene expression data integration, there is no systematic comparison of the processing approaches that address microarray platform diversity and ambiguous probeset-to-gene correspondence, and consequently few studies apply integration.

Results: Here, we investigated five different approaches to microarray data processing in comparison with RNA-seq data on breast cancer samples. We aimed to evaluate different probeset annotations as well as different procedures for choosing between probesets mapped to the same gene. We show that pipeline rankings are mostly preserved across Affymetrix and Illumina platforms. The BrainArray approach, based on updated annotation and redesigned probeset definitions, and the approach of choosing the probeset with the maximum average signal across samples have the best correlation with RNA-seq, while averaging probeset signals and scoring the quality of probe sequence mapping to the transcripts of the targeted gene have worse correlation. Finally, randomly selecting one probeset among those mapped to the same gene significantly decreases the correlation with RNA-seq.

Conclusion: We show that methods that rely on actual probeset signal intensities are advantageous over methods that consider only the biological characteristics of the probe sequences, and that cross-platform integration of datasets improves correlation with the RNA-seq data. We consider these results a contribution to integrative analysis as a worthwhile alternative to classical meta-analysis of multiple gene expression datasets.
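The two intensity-based strategies compared in the abstract above (keeping the probeset with the maximum average signal versus averaging all probesets of a gene) can be illustrated with a minimal sketch. The function name `collapse_probesets`, the dictionary-based data layout, and the probe and gene identifiers are hypothetical and chosen only for illustration; this is not the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean


def collapse_probesets(expr, probe_to_gene, method="max_mean"):
    """Collapse probeset-level signals to one profile per gene.

    expr: dict mapping probeset id -> list of signals (one per sample).
    probe_to_gene: dict mapping probeset id -> gene id (hypothetical annotation).
    method: "max_mean" keeps the probeset with the highest average signal;
            "mean" averages all probesets of a gene sample by sample.
    """
    by_gene = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probesets without a gene annotation
            by_gene[gene].append(values)

    collapsed = {}
    for gene, profiles in by_gene.items():
        if method == "max_mean":
            collapsed[gene] = list(max(profiles, key=mean))
        elif method == "mean":
            collapsed[gene] = [mean(column) for column in zip(*profiles)]
        else:
            raise ValueError(f"unknown method: {method}")
    return collapsed


# Toy data: two probesets map to GENE_A, one to GENE_B (all values invented).
expr = {
    "p1": [5.0, 6.0, 7.0],
    "p2": [8.0, 9.0, 10.0],
    "p3": [2.0, 3.0, 4.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}

by_max = collapse_probesets(expr, probe_to_gene, method="max_mean")
by_avg = collapse_probesets(expr, probe_to_gene, method="mean")
```

In this toy case the maximum-average rule keeps probeset `p2` for `GENE_A`, while averaging blends `p1` and `p2`; comparing either result with RNA-seq would additionally require matched samples and a correlation measure, which are outside this sketch.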


Batch effect removal methods for microarray gene expression data integration: a survey

Briefings in Bioinformatics ◽

10.1093/bib/bbs037 ◽

2012 ◽

Vol 14 (4) ◽

pp. 469-490 ◽

Cited By ~ 153

Author(s):

C. Lazar ◽

S. Meganck ◽

J. Taminau ◽

D. Steenhoff ◽

A. Coletta ◽

...

Keyword(s):

Gene Expression ◽

Data Integration ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Batch Effect ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene


Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data

10.1101/642926 ◽

2019 ◽

Cited By ~ 5

Author(s):

Aditya Pratapa ◽

Amogh P. Jalihal ◽

Jeffrey N. Law ◽

Aditya Bharadwaj ◽

T. M. Murali

Keyword(s):

Gene Expression ◽

Single Cell ◽

Gene Expression Data ◽

Expression Data ◽

Boolean Models ◽

Transcriptomic Data ◽

Inference Algorithms ◽

Cell Gene Expression ◽

Gene Regulatory ◽

Cell Gene

Abstract

We present a comprehensive evaluation of state-of-the-art algorithms for inferring gene regulatory networks (GRNs) from single-cell gene expression data. We develop a systematic framework called BEELINE for this purpose. We use synthetic networks with predictable cellular trajectories as well as curated Boolean models to serve as the ground truth for evaluating the accuracy of GRN inference algorithms. We develop a strategy to simulate single-cell gene expression data from these two types of networks that avoids the pitfalls of previously-used methods. We selected 12 representative GRN inference algorithms. We found that the accuracy of these methods (measured in terms of AUROC and AUPRC) was moderate, by and large, although the methods were better in recovering interactions in the synthetic networks than the Boolean models. Techniques that did not require pseudotime-ordered cells were more accurate, in general. The observation that the endpoints of many false positive edges were connected by paths of length two in the Boolean models suggested that indirect effects may be predominant in the outputs of the algorithms we tested. The predicted networks were considerably inconsistent with each other, indicating that combining GRN inference algorithms using ensembles is likely to be challenging. Based on the results, we present some recommendations to users of GRN inference algorithms, including suggestions on how to create simulated gene expression datasets for testing them. BEELINE, which is available at http://github.com/murali-group/BEELINE under an open-source license, will aid in the future development of GRN inference algorithms for single-cell transcriptomic data.
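To make the accuracy measures named in this abstract concrete, the following minimal sketch scores a ranked list of predicted edges against a ground-truth edge set. It assumes AUROC is computed as the fraction of (true edge, false edge) pairs ranked correctly and uses average precision as one common estimator of AUPRC; the function names, the edge-score layout, and the toy network are hypothetical, and BEELINE's own evaluation code may differ.

```python
def auroc(scores, truth):
    """AUROC as the probability that a true edge outscores a false edge.

    scores: dict mapping (regulator, target) -> predicted confidence.
    truth: set of (regulator, target) edges in the reference network.
    Ties between a true and a false edge count as half a win.
    """
    pos = [s for edge, s in scores.items() if edge in truth]
    neg = [s for edge, s in scores.items() if edge not in truth]
    if not pos or not neg:
        raise ValueError("need at least one true and one false candidate edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, truth):
    """Average precision over the ranked candidate edges.

    Only true edges that appear among the scored candidates are counted.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in truth:
            hits += 1
            total += hits / rank  # precision at this true edge's rank
    return total / hits if hits else 0.0


# Toy example: four candidate edges, two of which are in the reference network.
scores = {
    ("g1", "g2"): 0.9,
    ("g1", "g3"): 0.8,
    ("g2", "g3"): 0.4,
    ("g3", "g1"): 0.1,
}
truth = {("g1", "g2"), ("g2", "g3")}

roc = auroc(scores, truth)
ap = average_precision(scores, truth)
```

For real benchmarks an established implementation (for example the metrics in scikit-learn) would normally be preferred; the pairwise AUROC above is quadratic in the number of edges and is meant only to show what the metric measures.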


BioGPS and GXD: mouse gene expression data—the benefits and challenges of data integration

Mammalian Genome ◽

10.1007/s00335-012-9408-0 ◽

2012 ◽

Vol 23 (9-10) ◽

pp. 550-558 ◽

Cited By ~ 5

Author(s):

Martin Ringwald ◽

Chunlei Wu ◽

Andrew I. Su

Keyword(s):

Gene Expression ◽

Data Integration ◽

Gene Expression Data ◽

Mouse Gene ◽

Expression Data ◽

Mouse Gene Expression


Microarray Gene Expression Data Integration: An Application to Brain Tumor Grade Determination

Advances in Intelligent Systems and Computing - 9th International Conference on Practical Applications of Computational Biology and Bioinformatics ◽

10.1007/978-3-319-19776-0_14 ◽

2015 ◽

pp. 127-135 ◽

Cited By ~ 1

Author(s):

Eduardo Valente ◽

Miguel Rocha

Keyword(s):

Gene Expression ◽

Brain Tumor ◽

Data Integration ◽

Gene Expression Data ◽

Tumor Grade ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene


sciCAN: Single-cell chromatin accessibility and gene expression data integration via Cycle-consistent Adversarial Network

10.1101/2021.11.30.470677 ◽

2021 ◽

Author(s):

Yang Xu ◽

Edmon Begoli ◽

Rachel Patton McCord

Keyword(s):

Gene Expression ◽

Data Integration ◽

Single Cell ◽

Gene Expression Data ◽

Chromatin Accessibility ◽

Cellular Systems ◽

Expression Data ◽

Adversarial Network ◽

Cell Technologies ◽

Cell Data

Rapidly developing single-cell technologies bring a surge of high-dimensional data that come from different sources and represent cellular systems from different views. With these advances, integrating single-cell data across modalities has emerged as a new computational challenge that is gaining increasing attention within the community. Here, we present a novel adversarial approach, sciCAN, to integrate single-cell chromatin accessibility and gene expression data in an unsupervised manner. We benchmarked sciCAN against 3 state-of-the-art (SOTA) methods on 5 scATAC-seq/scRNA-seq datasets, and we demonstrated that our method handled data integration with a better balance of mutual transfer between modalities than the other 3 SOTA methods. We further applied sciCAN to 10X Multiome data and confirmed that the integrated representation preserves information about the hematopoietic hierarchy. Finally, we investigated CRISPR-perturbed single-cell K562 ATAC-seq and RNA-seq data to identify cells with related responses to different perturbations in these different modalities.
