Arkas: Rapid reproducible RNAseq analysis

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 586 ◽  
Author(s):  
Anthony R. Colombo ◽  
Timothy J. Triche Jr ◽  
Giridharan Ramsingh

The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments. We offer two cloud-scale RNAseq pipelines, Arkas-Quantification and Arkas-Analysis, available within Illumina's BaseSpace cloud application platform, which expedite Kallisto preparatory routines, reliably calculate differential expression, and perform gene-set enrichment of REACTOME pathways. Due to inherent efficiencies of scale, Illumina's BaseSpace computing platform offers a massively parallel distributed environment that improves data management services and data importing. Arkas-Quantification deploys Kallisto for parallel cloud computations and is conveniently integrated downstream from the BaseSpace Sequence Read Archive (SRA) import/conversion application, SRA Import. Arkas-Analysis annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata, and performs differential expression and gene-set enrichment analysis on both coding genes and transcripts. The Arkas cloud pipeline supports ENSEMBL transcriptomes and can be used downstream from SRA Import, facilitating raw sequencing import, SRA FASTQ conversion, RNA quantification and analysis steps.

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 586 ◽  
Author(s):  
Anthony R. Colombo ◽  
Timothy J. Triche Jr ◽  
Giridharan Ramsingh

The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments. We offer two cloud-scale RNAseq pipelines: Arkas-Quantification, which deploys Kallisto for parallel cloud computations, and Arkas-Analysis, which annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata and performs differential expression and gene-set enrichment analysis on both coding genes and transcripts. The biologically informative downstream gene-set analysis maintains a special focus on Reactome annotations while supporting ENSEMBL transcriptomes. The Arkas cloud quantification pipeline includes support for custom user-uploaded FASTA files, optional bias correction, and pseudoBAM output. The option to retain pseudoBAM output for structural variant detection and annotation provides a middle ground between de novo transcriptome assembly and routine quantification, while consuming a fraction of the resources used by popular fusion detection pipelines. Illumina's BaseSpace cloud computing environment, where these two applications are hosted, offers a massively parallel, distributed quantification step; investigators are better served by such cloud-based computing platforms because of inherent efficiencies of scale.
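The abstract does not show how Arkas-Quantification invokes Kallisto, so the following is only a minimal sketch of how a paired-end `kallisto quant` command with the bias-correction and pseudoBAM options might be assembled. The helper name `kallisto_quant_command` and all file names are hypothetical; the command is built as an argument list and is not executed.

```python
def kallisto_quant_command(index, out_dir, fastq_files,
                           bias=False, pseudobam=False,
                           bootstraps=0, threads=1):
    """Build (but do not run) a paired-end `kallisto quant` argument list.

    Hypothetical helper for illustration; Arkas' real wrapper is not shown
    in the source.
    """
    fastq_files = list(fastq_files)
    # Paired-end mode expects FASTQ files in pairs (read 1, read 2, ...).
    if not fastq_files or len(fastq_files) % 2 != 0:
        raise ValueError("paired-end mode needs an even, non-zero number of FASTQ files")

    cmd = ["kallisto", "quant", "-i", index, "-o", out_dir, "-t", str(threads)]
    if bootstraps > 0:
        cmd += ["-b", str(bootstraps)]   # number of bootstrap samples
    if bias:
        cmd.append("--bias")             # sequence-based bias correction
    if pseudobam:
        cmd.append("--pseudobam")        # keep pseudoalignments for later inspection
    return cmd + fastq_files


# Example with hypothetical file names:
cmd = kallisto_quant_command("transcripts.idx", "sample1_out",
                             ["sample1_1.fastq.gz", "sample1_2.fastq.gz"],
                             bias=True, pseudobam=True, bootstraps=100)
print(" ".join(cmd))
```

Returning a list rather than a shell string means the command could be passed to `subprocess.run` without shell quoting concerns.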


2019 ◽  
Author(s):  
Rani K. Powers ◽  
Anthony Sun ◽  
James C. Costello

Abstract Summary: GSEA-InContext Explorer is a Shiny app that allows users to perform two methods of gene set enrichment analysis (GSEA). The first, GSEAPreranked, applies the GSEA algorithm in which statistical significance is estimated from a null distribution of enrichment scores generated for randomly permuted gene sets. The second, GSEA-InContext, incorporates a user-defined set of background experiments to define the null distribution and calculate statistical significance. GSEA-InContext Explorer allows the user to build custom background sets from a compendium of over 5,700 curated experiments, run both GSEAPreranked and GSEA-InContext on their own uploaded experiment, and explore the results using an interactive interface. This tool will allow researchers to visualize gene sets that are commonly enriched across experiments and identify gene sets that are uniquely significant in their experiment, thus complementing current methods for interpreting gene set enrichment results. Availability and implementation: The code for GSEA-InContext Explorer is available at: https://github.com/CostelloLab/GSEA-InContext_Explorer and the interactive tool is at: http://gsea-incontext_explorer.ngrok.io
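The permutation null described for GSEAPreranked can be illustrated with a small sketch. This is not the tool's code: it assumes a standard weighted running-sum enrichment score and a one-sided empirical p-value, and the function names and toy data are hypothetical. GSEA-InContext's background-experiment null is not reproduced here.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score over a ranked gene list.

    Hits are weighted by |score|**p; misses subtract a constant step.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_hit == 0 or n_miss == 0 or norm == 0:
        return 0.0
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = sum(1 for g in ranked_genes if g in gene_set)
    null = [
        enrichment_score(ranked_genes, scores, set(rng.sample(ranked_genes, size)))
        for _ in range(n_perm)
    ]
    # Count null scores at least as extreme in the direction of the observed score.
    if observed >= 0:
        extreme = sum(1 for es in null if es >= observed)
    else:
        extreme = sum(1 for es in null if es <= observed)
    return observed, (extreme + 1) / (n_perm + 1)


# Toy ranked list: 20 genes, the gene set sits entirely at the top.
genes = [f"g{i}" for i in range(20)]
scores = [10 - i for i in range(20)]
es, p_value = permutation_p_value(genes, scores, {"g0", "g1", "g2", "g3"})
print(es, p_value)
```

The `+ 1` in the numerator and denominator keeps the empirical p-value strictly positive, a common convention for permutation tests.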


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Sebastian Canzler ◽  
Jörg Hackermüller

Abstract Background: Gaining biological insights into molecular responses to treatments or diseases from omics data can be accomplished by gene set or pathway enrichment methods. A plethora of different tools and algorithms have been developed so far. Among those, the gene set enrichment analysis (GSEA) proved to control both type I and II errors well. In recent years the call for a combined analysis of multiple omics layers became prominent, giving rise to a few multi-omics enrichment tools. Each of these has its own drawbacks and restrictions regarding its universal application. Results: Here, we present the multiGSEA package, which calculates a combined GSEA-based pathway enrichment on multiple omics layers. The package queries 8 different pathway databases and relies on the robust GSEA algorithm for a single-omics enrichment analysis. In a final step, those scores are combined to create a robust composite multi-omics pathway enrichment measure. multiGSEA supports 11 different organisms and includes a comprehensive mapping of transcripts, proteins, and metabolite IDs. Conclusions: With multiGSEA we introduce a highly versatile tool for multi-omics pathway integration that minimizes previous restrictions in terms of omics layer selection, pathway database availability, organism selection and the mapping of omics feature identifiers. multiGSEA is publicly available under the GPL-3 license at https://github.com/yigbt/multiGSEA and at Bioconductor: https://bioconductor.org/packages/multiGSEA.
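The abstract does not say how the per-layer scores are combined into a composite measure. As one illustration, assuming each omics layer yields a one-sided p-value per pathway, Stouffer's Z-score method is a common way to pool them; the function names and example values below are hypothetical and are not taken from the package.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values):
    """Combine one-sided p-values with Stouffer's unweighted Z-score method."""
    p_values = list(p_values)
    if not p_values:
        raise ValueError("need at least one p-value")
    nd = NormalDist()
    # Clamp to avoid infinite z-scores at exactly 0 or 1.
    clamped = [min(max(p, 1e-12), 1 - 1e-12) for p in p_values]
    z_sum = sum(nd.inv_cdf(1 - p) for p in clamped)
    return 1 - nd.cdf(z_sum / sqrt(len(clamped)))


def combine_layers(layer_p_values):
    """Pool p-values per pathway across whichever layers report that pathway.

    layer_p_values: {layer name: {pathway name: p-value}}
    """
    pathways = set().union(*(d.keys() for d in layer_p_values.values()))
    return {
        pw: stouffer_combine(d[pw] for d in layer_p_values.values() if pw in d)
        for pw in sorted(pathways)
    }


# Hypothetical single-omics results for two pathways.
combined = combine_layers({
    "transcriptome": {"Pathway 1": 0.01, "Pathway 2": 0.50},
    "proteome": {"Pathway 1": 0.02},
})
print(combined)
```

A pathway supported by several layers ends up with a smaller pooled p-value than any single layer gives, which is the intended effect of a composite measure.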


2018 ◽  
Author(s):  
Danyue Dong ◽  
Tian Yuan ◽  
Shijie C. Zheng ◽  
Andrew E. Teschendorff

Abstract Motivation: The biological interpretation of differentially methylated sites derived from Epigenome-Wide-Association Studies (EWAS) remains a significant challenge. Gene Set Enrichment Analysis (GSEA) is a general tool to aid biological interpretation, yet its correct and unbiased implementation in the EWAS context is difficult due to the differential probe representation of Illumina Infinium DNA methylation beadchips. Results: We present a novel GSEA method, called ebayGSEA, which ranks genes, not CpGs, according to the overall level of differential methylation, as assessed using all the probes mapping to the given gene. Applying it to simulated and real EWAS data, we show how ebayGSEA may exhibit higher sensitivity and specificity than the current state-of-the-art, whilst also avoiding differential probe representation bias. Thus, ebayGSEA will be a useful additional tool to aid the interpretation of EWAS data. Availability and implementation: ebayGSEA is available from https://github.com/aet21/ebayGSEA, and has been incorporated into the ChAMP Bioconductor package (https://www.bioconductor.org).


Author(s):  
Sebastian Canzler ◽  
Jörg Hackermüller

Abstract: Gaining biological insights into molecular responses to treatments or diseases from omics data can be accomplished by gene set or pathway enrichment methods. A plethora of different tools and algorithms have been developed so far. Among those, the gene set enrichment analysis (GSEA) proved to control both type I and II errors well. In recent years the call for a combined analysis of multiple omics layers became prominent, giving rise to a few multi-omics enrichment tools. Each of these has its own drawbacks and restrictions regarding its universal application. Here, we present the multiGSEA package, which calculates a combined GSEA-based pathway enrichment on multiple omics layers. The package queries 8 different pathway databases and relies on the robust GSEA algorithm for a single-omics enrichment analysis. In a final step, those scores are combined to create a robust composite multi-omics pathway enrichment measure. multiGSEA supports 11 different organisms and includes a comprehensive mapping of transcripts, proteins, and metabolite IDs. It is publicly available under the GPL-3 license at https://github.com/yigbt/multiGSEA and at Bioconductor: https://bioconductor.org/packages/multiGSEA.


2019 ◽  
Author(s):  
Tao Fang ◽  
Iakov Davydov ◽  
Daniel Marbach ◽  
Jitao David Zhang

Abstract Motivation: Canonical methods for gene-set enrichment analysis assume independence between gene-sets. In practice, heterogeneous gene-sets from diverse sources are frequently combined and used, resulting in gene-sets with overlapping genes. Such overlaps compromise statistical modelling and complicate interpretation of results. Results: We rephrase gene-set enrichment as a regression problem. Given some genes of interest (e.g. a list of hits from an experiment) and gene-sets (e.g. functional annotations or pathways), we aim to identify a sparse list of gene-sets for the genes of interest. In a regression framework, this amounts to identifying a minimum set of gene-sets that optimally predicts whether any gene belongs to the given genes of interest. To accommodate redundancy between gene-sets, we propose regularized regression techniques such as the elastic net. We report that regression-based results are consistent with established gene-set enrichment methods but more parsimonious and interpretable. Availability: We implement the model in gerr (gene-set enrichment with regularized regression), an R package freely available at https://github.com/TaoDFang/gerr and submitted to Bioconductor. Code and data required to reproduce the results of this study are available at https://github.com/TaoDFang/GeneModuleAnnotationPaper. Contact: Jitao David Zhang ([email protected]), Roche Pharma Research and Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd. Grenzacherstrasse 124, 4070 Basel, Switzerland.
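The regression view of enrichment can be made concrete with a toy example. The sketch below is not the gerr implementation (an R package whose loss and solver are not detailed in the abstract); it assumes a squared-error loss on a 0/1 membership response and fits an elastic net by plain coordinate descent. Function names, the penalty values `lam` and `alpha`, and the toy gene-sets are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_gene_sets(genes, genes_of_interest, gene_sets,
                          lam=0.05, alpha=0.5, n_sweeps=200):
    """Regress 'gene is of interest' (0/1) on gene-set membership indicators.

    Minimises (1/2n)*||y - b0 - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)
    by cyclic coordinate descent. Returns (intercept, {gene-set: coefficient}).
    """
    names = sorted(gene_sets)
    n = len(genes)
    y = [1.0 if g in genes_of_interest else 0.0 for g in genes]
    X = {s: [1.0 if g in gene_sets[s] else 0.0 for g in genes] for s in names}
    coef = {s: 0.0 for s in names}
    intercept = sum(y) / n
    residual = [yi - intercept for yi in y]
    for _ in range(n_sweeps):
        for s in names:
            col, old = X[s], coef[s]
            denom = sum(x * x for x in col) / n + lam * (1.0 - alpha)
            if denom == 0:
                continue
            # Correlation of this column with the residual that excludes it.
            rho = sum(x * (r + x * old) for x, r in zip(col, residual)) / n
            new = soft_threshold(rho, lam * alpha) / denom
            residual = [r - x * (new - old) for x, r in zip(col, residual)]
            coef[s] = new
        # The intercept is not penalised: re-centre the residual.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
    return intercept, coef


# Toy data: set A equals the hits, B overlaps A with one extra gene, C is unrelated.
genes = [f"g{i}" for i in range(10)]
hits = {"g0", "g1", "g2", "g3"}
sets = {"A": {"g0", "g1", "g2", "g3"},
        "B": {"g0", "g1", "g2", "g3", "g4"},
        "C": {"g5", "g6", "g7"}}
intercept, coef = elastic_net_gene_sets(genes, hits, sets)
print(coef)
```

In this toy setting the penalty concentrates weight on the gene-set that matches the hits, shrinks the overlapping redundant set, and zeroes out the unrelated one, which is the parsimony the abstract describes.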


F1000Research ◽  
2015 ◽  
Vol 4 ◽  
pp. 167 ◽  
Author(s):  
Yan Tan ◽  
Felix Wu ◽  
Pablo Tamayo ◽  
W. Nicholas Haining ◽  
Jill P. Mesirov

Summary: Gene set enrichment analysis (GSEA) approaches are widely used to identify coordinately regulated genes associated with phenotypes of interest. Here, we present Constellation Map, a tool to visualize and interpret the results when enrichment analyses yield a long list of significantly enriched gene sets. Constellation Map identifies commonalities that explain the enrichment of multiple top-scoring gene sets and maps the relationships between them. Constellation Map can help investigators take full advantage of GSEA and facilitates the biological interpretation of enrichment results. Availability: Constellation Map is freely available as a GenePattern module at http://www.genepattern.org.

