Comparison of RNA-seq and microarray platforms for splice event detection using a cross-platform algorithm

Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.

Cross-platform normalization of microarray and RNA-seq data for machine learning applications

PeerJ ◽

10.7717/peerj.1621 ◽

2016 ◽

Vol 4 ◽

pp. e1621 ◽

Cited By ~ 42

Author(s):

Jeffrey A. Thompson ◽

Jie Tan ◽

Casey S. Greene

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Machine Learning Algorithms ◽

Quantile Normalization ◽

Learning Approaches ◽

Rna Seq ◽

Distribution Matching ◽

Machine Learning Applications ◽

Cross Platform ◽

R Programming

Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.
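
Two of the baseline transformations named in this abstract, the simple log2 transformation and quantile normalization, can be illustrated with a minimal Python sketch. This is not the TDM method and not the authors' R package; the function names, the pseudocount of 1, and the linear interpolation between reference quantiles are assumptions made for illustration only.

```python
import math


def log2_transform(counts, pseudocount=1.0):
    """Return log2(count + pseudocount) for each value (pseudocount is an assumption)."""
    return [math.log2(c + pseudocount) for c in counts]


def quantile_normalize_to_reference(values, reference):
    """Map each value onto the reference distribution at the same rank quantile."""
    if not values or not reference:
        raise ValueError("values and reference must be non-empty")
    ref_sorted = sorted(reference)
    n, m = len(values), len(ref_sorted)
    # Rank the target values; ties are broken by original position.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # Quantile of this rank, rescaled to an index into the reference.
        q = rank / (n - 1) if n > 1 else 0.0
        pos = q * (m - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        # Linear interpolation between neighbouring reference quantiles.
        out[idx] = ref_sorted[lo] * (1 - frac) + ref_sorted[hi] * frac
    return out


if __name__ == "__main__":
    # Hypothetical RNA-seq counts for three genes and a hypothetical
    # microarray-like reference distribution.
    rnaseq = [0, 10, 1000]
    microarray_reference = [2.0, 4.0, 6.0]
    print(log2_transform(rnaseq))
    print(quantile_normalize_to_reference(rnaseq, microarray_reference))
```

In this sketch the RNA-seq values keep their rank order but take on the value range of the reference, which is the general idea behind making one platform's data resemble another's before applying a model trained on the latter.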

Cross-platform ultradeep transcriptomic profiling of human reference RNA samples by RNA-Seq

Scientific Data ◽

10.1038/sdata.2014.20 ◽

2014 ◽

Vol 1 (1) ◽

Cited By ~ 11

Author(s):

Joshua Xu ◽

Zhenqiang Su ◽

Huixiao Hong ◽

Jean Thierry-Mieg ◽

Danielle Thierry-Mieg ◽

...

Keyword(s):

Rna Seq ◽

Transcriptomic Profiling ◽

Cross Platform

Reproducible Bioconductor workflows using browser-based interactive notebooks and containers

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocx120 ◽

2017 ◽

Vol 25 (1) ◽

pp. 4-12 ◽

Cited By ~ 15

Author(s):

Reem Almugbel ◽

Ling-Hong Hung ◽

Jiaming Hu ◽

Abeer Almutairy ◽

Nicole Ortogero ◽

...

Keyword(s):

Interactive Software ◽

Rna Seq ◽

Software Environment ◽

Bioinformatics Analyses ◽

New Methods ◽

Wide Range ◽

Cross Platform ◽

Differential Gene ◽

User Friendly ◽

Bioinformatics Workflows

Abstract. Objective: Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods: We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results: BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion: Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.

RNAdetector: a free user-friendly stand-alone and cloud-based system for RNA-Seq data analysis

BMC Bioinformatics ◽

10.1186/s12859-021-04211-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alessandro La Ferlita ◽

Salvatore Alaimo ◽

Sebastiano Di Bella ◽

Emanuele Martorana ◽

Georgios I. Laliotis ◽

...

Keyword(s):

Data Analysis ◽

Transcriptome Profiling ◽

Biological Species ◽

Third Party ◽

Rna Seq ◽

Rna Molecules ◽

Non Coding Rna ◽

Cross Platform ◽

Cloud Environments ◽

User Friendly

Abstract. Background: RNA-Seq is a well-established technology extensively used for transcriptome profiling, allowing the analysis of coding and non-coding RNA molecules. However, this technology produces a vast amount of data requiring more sophisticated computational approaches for their analysis than other traditional technologies such as Real-Time PCR or microarrays, strongly discouraging non-expert users. For this reason, dozens of pipelines have been deployed for the analysis of RNA-Seq data. Although interesting, these present several limitations and their usage requires a technical background, which may be uncommon in small research laboratories. Therefore, the application of these technologies in such contexts is still limited and causes a clear bottleneck in knowledge advancement. Results: Motivated by these considerations, we have developed RNAdetector, a new free cross-platform and user-friendly RNA-Seq data analysis software that can be used locally or in cloud environments through an easy-to-use Graphical User Interface allowing the analysis of coding and non-coding RNAs from RNA-Seq datasets of any sequenced biological species. Conclusions: RNAdetector is new software that fills an essential gap between the needs of biomedical and research labs to process RNA-Seq data and their common lack of technical background in performing such analysis, which usually relies on outsourcing such steps to third party bioinformatics facilities or using expensive commercial software.

Cross-platform normalization of microarray and RNA-seq data for machine learning applications

10.7287/peerj.preprints.1460v1 ◽

2015 ◽

Author(s):

Jeffrey A Thompson ◽

Jie Tan ◽

Casey S Greene

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Machine Learning Algorithms ◽

Quantile Normalization ◽

Learning Approaches ◽

Rna Seq ◽

Distribution Matching ◽

Machine Learning Applications ◽

Cross Platform ◽

R Programming

Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.

Cross-platform transcriptomic profiling of the response to recombinant human erythropoietin

10.21203/rs.3.rs-510750/v1 ◽

2021 ◽

Author(s):

Guan Wang ◽

Traci Kitaoka ◽

Ali Crawford ◽

Qian Mao ◽

Andrew Hesketh ◽

...

Keyword(s):

Gene Expression ◽

Recombinant Human Erythropoietin ◽

Rna Seq ◽

Healthy Individuals ◽

Human Erythropoietin ◽

Rna Biology ◽

Platform Comparison ◽

Cross Platform ◽

Differential Gene ◽

Sequencing By Synthesis

Abstract RNA-seq has matured and become an important tool for studying RNA biology. Here we compared two RNA-seq (Illumina sequencing by synthesis and MGI DNBSEQ™) and two microarray platforms (Illumina Expression BeadChip and GeneChip™ Human Transcriptome Array 2.0) in healthy individuals administered recombinant human erythropoietin for transcriptome-wide quantification of differential gene expression. The results show that total RNA sequencing combined with DNB-seq produced a multitude of genes of biological relevance and significance in response to recombinant human erythropoietin, in contrast to other platforms. Through data triangulation linking genes to functions, genes representing the processes of erythropoiesis as well as non-erythropoietic functions of erythropoietin were unveiled. This study provides a knowledge base of genes characterising the responses to recombinant human erythropoietin through cross-platform comparison and validation.

Comparing alternative pipelines for cross-platform microarray gene expression data integration with RNA-seq data in breast cancer

10.1101/059600 ◽

2016 ◽

Cited By ~ 2

Author(s):

Alina Frolova ◽

Vladyslav Bondarenko ◽

Maria Obolenska

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Data Integration ◽

Gene Expression Data ◽

Statistical Power ◽

Meta Analysis ◽

Expression Data ◽

Rna Seq ◽

Microarray Gene Expression ◽

Cross Platform

Abstract. Background: According to statistics from major public repositories, an overwhelming majority of the existing and newly uploaded data originates from microarray experiments. Unfortunately, the potential of this data to bring new insights is limited by the effects of individual study-specific biases due to the small number of biological samples. Increasing sample size by direct microarray data integration increases the statistical power to obtain a more precise estimate of gene expression in a population of individuals, resulting in lower false discovery rates. However, despite numerous recommendations for gene expression data integration, there is a lack of a systematic comparison of different processing approaches aimed at assessing microarray platform diversity and ambiguous probeset-to-gene correspondence, leading to a low number of studies applying integration. Results: Here, we investigated five different approaches to microarray data processing in comparison with RNA-seq data on breast cancer samples. We aimed to evaluate different probeset annotations as well as different procedures for choosing between probesets mapped to the same gene. We show that pipeline rankings are mostly preserved across Affymetrix and Illumina platforms. The BrainArray approach, based on updated annotation and redesigned probeset definitions, and choosing the probeset with the maximum average signal across the samples have the best correlation with RNA-seq, while averaging probeset signals and scoring the quality of probe sequence mapping to the transcripts of the targeted gene have worse correlation. Finally, randomly selecting a probeset among the probesets mapped to the same gene significantly decreases the correlation with RNA-seq. Conclusion: We show that methods that rely on actual probeset signal intensities are advantageous over methods that consider only biological characteristics of the probe sequences, and that cross-platform integration of datasets improves correlation with the RNA-seq data. We consider that the results obtained in this paper contribute to integrative analysis as a worthwhile alternative to the classical meta-analysis of multiple gene expression datasets.
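
Two of the procedures compared in this abstract for resolving several probesets mapped to the same gene, keeping the probeset with the maximum average signal across samples and averaging the probeset signals, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' pipeline; the function names, the input layout, and the probeset and gene identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_max_mean(probesets):
    """Per gene, keep the probeset with the highest mean signal across samples."""
    best = {}
    for probe_id, (gene, signals) in probesets.items():
        avg = mean(signals)
        # Replace the current choice only if this probeset has a higher mean.
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probe_id, list(signals))
    return {gene: signals for gene, (_, _, signals) in best.items()}


def collapse_average(probesets):
    """Per gene, average the signals of all its probesets sample by sample."""
    grouped = defaultdict(list)
    for gene, signals in probesets.values():
        grouped[gene].append(signals)
    # zip(*rows) iterates over samples; each column holds one value per probeset.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


if __name__ == "__main__":
    # Hypothetical input: probeset id -> (gene symbol, signal per sample).
    example = {
        "p1": ("GENE_A", [2.0, 4.0]),
        "p2": ("GENE_A", [6.0, 8.0]),
        "p3": ("GENE_B", [1.0, 3.0]),
    }
    print(collapse_max_mean(example))
    print(collapse_average(example))
```

Both functions return one signal vector per gene, so either result could then be correlated with gene-level RNA-seq values for the same samples.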

Alternative splicing analysis benchmark with DICAST

10.1101/2022.01.05.475067 ◽

2022 ◽

Author(s):

Amit M Fenn ◽

Olga Tsoy ◽

Tim Faro ◽

Fanny Roessler ◽

Alexander Dietrich ◽

...

Keyword(s):

Alternative Splicing ◽

Event Detection ◽

Whole Blood ◽

Tool Development ◽

Major Contributor ◽

Rna Seq ◽

Consensus Approach ◽

Reporting Standard ◽

Health And Disease ◽

Isoform Quantification

Alternative splicing is a major contributor to transcriptome and proteome diversity in health and disease. A plethora of tools have been developed for studying alternative splicing in RNA-seq data. Previous benchmarks focused on isoform quantification and mapping. They neglected event detection tools, which arguably provide the most detailed insights into the alternative splicing process. DICAST offers a modular and extensible framework for the analysis of alternative splicing integrating 11 splice-aware mapping and eight event detection tools. We benchmark all tools extensively on simulated as well as whole blood RNA-seq data. STAR and HISAT2 demonstrated the best balance between performance and run time. The performance of event detection tools varies widely with no tool outperforming all others. DICAST allows researchers to employ a consensus approach to consider the most successful tools jointly for robust event detection. Furthermore, we propose the first reporting standard to unify existing formats and to guide future tool development.

Peer Review #2 of "Cross-platform normalization of microarray and RNA-seq data for machine learning applications (v0.1)"

10.7287/peerj.1621v0.1/reviews/2 ◽

2016 ◽

Author(s):

CT Brown

Keyword(s):

Machine Learning ◽

Peer Review ◽

Rna Seq ◽

Machine Learning Applications ◽

Cross Platform
