GXD’s RNA-Seq and Microarray Experiment Search: using curated metadata to reliably find mouse expression studies of interest

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Constance M Smith ◽  
James A Kadin ◽  
Richard M Baldarelli ◽  
Jonathan S Beal ◽  
Olin Blodgett ◽  
...  

Abstract The Gene Expression Database (GXD), an extensive community resource of curated expression information for the mouse, has developed an RNA-Seq and Microarray Experiment Search (http://www.informatics.jax.org/gxd/htexp_index). This tool allows users to quickly and reliably find specific experiments in ArrayExpress and the Gene Expression Omnibus (GEO) that study endogenous gene expression in wild-type and mutant mice. Standardized metadata annotations, curated by GXD, allow users to specify the anatomical structure, developmental stage, mutated gene, strain and sex of samples of interest, as well as the study type and key parameters of the experiment. These searches, powered by controlled vocabularies and ontologies, can be combined with free text searching of experiment titles and descriptions. Search result summaries include link-outs to ArrayExpress and GEO, providing easy access to the expression data itself. Links to the PubMed entries for accompanying publications are also included. More information about this tool and GXD can be found at the GXD home page (http://www.informatics.jax.org/expression.shtml). Database URL: http://www.informatics.jax.org/expression.shtml


BMC Genomics ◽  
2013 ◽  
Vol 14 (1) ◽  
pp. 778 ◽  
Author(s):  
Traver Hart ◽  
H Komori ◽  
Sarah LaMere ◽  
Katie Podshivalova ◽  
Daniel R Salomon


2019 ◽  
Author(s):  
Bastian Seelbinder ◽  
Thomas Wolf ◽  
Steffen Priebe ◽  
Sylvie McNamara ◽  
Silvia Gerber ◽  
...  

Abstract In transcriptomics, the study of the total set of RNAs transcribed by the cell, RNA sequencing (RNA-seq) has become the standard tool for analysing gene expression. The primary goal is the detection of genes whose expression changes significantly between two or more conditions, either for a single species or for two or more interacting species at the same time (dual RNA-seq, triple RNA-seq and so forth). RNA-seq analysis can be simplified because many of the data pre-processing steps can be standardised in a pipeline. In this publication we present the “GEO2RNAseq” pipeline for complete, quick and concurrent pre-processing of single, dual, and triple RNA-seq data. It covers all pre-processing steps from raw sequencing data to the analysis of differentially expressed genes, including various tables and figures to report intermediate and final results. Raw data may be provided in FASTQ format or can be downloaded automatically from the Gene Expression Omnibus repository. GEO2RNAseq strongly incorporates experimental as well as computational metadata. GEO2RNAseq is implemented in R, is lightweight, easy to install via Conda and easy to use, yet remains very flexible through modular programming and many extensions and alternative workflows. GEO2RNAseq is publicly available at https://anaconda.org/xentrics/r-geo2rnaseq and https://bitbucket.org/thomas_wolf/geo2rnaseq/overview, including source code, installation instructions, and comprehensive package documentation.



2019 ◽  
Author(s):  
Zou Yutong ◽  
Bui Thuy Tien ◽  
Kumar Selvarajoo

Abstract Here we report a bio-statistical/informatics tool, ABioTrans, developed in R for gene expression analysis. The tool allows the user to directly read RNA-Seq data files deposited in the Gene Expression Omnibus (GEO) database. Operated using any web browser application, ABioTrans provides easy options for multiple statistical distribution fitting, Pearson and Spearman rank correlations, PCA, k-means and hierarchical clustering, differential expression analysis, Shannon entropy and noise (square of coefficient of variation) analyses, as well as Gene Ontology classifications.
Availability and implementation: ABioTrans is available at https://github.com/buithuytien/ABioTrans
Operating system(s): Platform independent (web browser)
Programming language: R (RStudio)
Other requirements: Bioconductor genome wide annotation databases, R packages (shiny, LSD, fitdistrplus, actuar, entropy, moments, RUVSeq, edgeR, DESeq2, NOISeq, AnnotationDbi, ComplexHeatmap, circlize, clusterProfiler, reshape2, DT, plotly, shinycssloaders, dplyr, ggplot2). These packages are installed automatically when ABioTrans.R is executed in RStudio.
No restrictions on use by non-academics.
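Two of the statistics named in this abstract have short closed forms, so a small illustration may help. The Python sketch below is not ABioTrans code (the tool itself is written in R); the function names `noise_cv2` and `shannon_entropy`, the use of the sample variance, and the equal-width binning with four bins are all assumptions made for illustration.

```python
import math
from collections import Counter


def noise_cv2(values):
    """Noise as the squared coefficient of variation: variance / mean^2.

    Assumption: the sample variance (n - 1 denominator) is used.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("mean expression is zero; CV^2 is undefined")
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return variance / mean ** 2


def shannon_entropy(values, bins=4):
    """Shannon entropy in bits of values discretised into equal-width bins.

    Assumption: equal-width binning; the bin count is a free parameter.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values fall in one bin, so there is no uncertainty
    width = (hi - lo) / bins
    # The maximum value would land in bin index `bins`, so clamp it to the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    expression = [2.0, 4.0, 4.0, 6.0]  # hypothetical expression values for one gene
    print(noise_cv2(expression))
    print(shannon_entropy([0.0, 1.0, 2.0, 3.0]))
```

The entropy value depends on the binning choice, so results are only comparable between samples discretised in the same way.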



2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Robert Ekblom ◽  
Jon Slate ◽  
Gavin J. Horsburgh ◽  
Tim Birkhead ◽  
Terry Burke

Next-generation sequencing of transcriptomes (RNA-Seq) is being used increasingly in studies of nonmodel organisms. Here, we evaluate the effectiveness of normalising cDNA libraries prior to sequencing in a small-scale study of the zebra finch. We find that assemblies produced from normalised libraries had a larger number of contigs but used fewer reads compared to unnormalised libraries. Considerably more genes were also detected using the contigs produced from normalised cDNA, and microsatellite discovery was up to 73% more efficient in these. There was a positive correlation between the detected expression level of genes in normalised and unnormalised cDNA, and there was no difference in the number of genes identified as being differentially expressed between blood and spleen for the normalised and unnormalised libraries. We conclude that normalised cDNA libraries are preferable for many applications of RNA-Seq and that these can also be used in quantitative gene expression studies.



2017 ◽  
Author(s):  
Djordje Djordjevic ◽  
Joshua Y. S. Tang ◽  
Yun Xin Chen ◽  
Shu Lun Shannon Kwan ◽  
Raymond W. K. Ling ◽  
...  

Abstract There exist over 2.5 million publicly available gene expression samples across 101,000 data series in NCBI’s Gene Expression Omnibus (GEO) database. Because GEO’s free text metadata does not use standardised ontology terms to annotate the experimental type and sample type, this database remains difficult to harness computationally without significant manual intervention. In this work, we present an interactive R/Shiny tool called GEOracle that utilises text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples and perform differential expression. We present applications of GEOracle to discover conserved signalling pathway target genes and identify an organ specific gene regulatory network. GEOracle is effective in discovering perturbation gene targets in GEO by harnessing its free text metadata. Its effectiveness and applicability have been demonstrated by cross validation and two real-life case studies. It opens up new avenues to unlock the gene regulatory information embedded inside large biological databases such as GEO. GEOracle is available at https://github.com/VCCRI/GEOracle.



2017 ◽  
Vol 3 (4) ◽  
pp. 186
Author(s):  
Redi Aditama ◽  
Zulfikar Achmad Tanjung ◽  
Widyartini Made Sudania ◽  
Toni Liwang

RNA-seq using the Next Generation Sequencing (NGS) approach is a common technology for analyzing large-scale RNA transcript data in gene expression studies. However, an appropriate bioinformatics tool is needed to analyze the large amounts of transcriptome data produced by RNA-seq experiments. The aim of this study was to construct a system that can be easily applied to analyze RNA-seq data. An RNA-seq analysis tool, SMART-RDA, was constructed in this study. It is a computational workflow based on the Galaxy framework for turning raw RNA-seq data into gene expression information. The workflow was adapted, with some modifications, from the well-known Tuxedo Protocol for RNA-seq analysis. The expression value of each transcript was stated quantitatively as Fragments Per Kilobase of exon per Million fragments (FPKM). RNA-seq data from sterile and fertile oil palm (Pisifera) pollen, obtained from the NCBI Sequence Read Archive (SRA), were used to test this workflow on a local Galaxy server. The results showed that differential gene expression in pollen might be responsible for the sterile and fertile characteristics of oil palm Pisifera.
Keywords: FPKM; Galaxy workflow; Gene expression; RNA sequencing.
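The FPKM unit mentioned in this abstract follows a simple formula: the fragment count for a transcript is divided by the transcript length in kilobases and by the total number of mapped fragments in millions. The Python sketch below illustrates only that arithmetic; it is not part of SMART-RDA or the Tuxedo tools, and the function name `fpkm`, its parameter names and the example counts are hypothetical.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of exon per Million mapped fragments.

    fragments              -- fragments assigned to the transcript
    length_bp              -- exonic length of the transcript in base pairs
    total_mapped_fragments -- all mapped fragments in the library
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (base pairs per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical transcripts: (fragment count, exonic length in bp)
    counts = {"geneA": (500, 2000), "geneB": (500, 500)}
    library_size = 10_000_000  # hypothetical total of mapped fragments
    for name, (n_fragments, length) in counts.items():
        print(name, fpkm(n_fragments, length, library_size))
```

With equal fragment counts, the shorter transcript receives the higher FPKM, which is the length correction the unit is designed to provide.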



2017 ◽  
Author(s):  
Alexander Lachmann ◽  
Denis Torre ◽  
Alexandra B. Keenan ◽  
Kathleen M. Jagodnik ◽  
Hyojin J. Lee ◽  
...  

RNA-sequencing (RNA-seq) is currently the leading technology for genome-wide transcript quantification. While the volume of RNA-seq data is rapidly increasing, the currently publicly available RNA-seq data is provided mostly in raw form, with small portions processed non-uniformly. This is mainly because the computational demand, particularly for the alignment step, is a significant barrier for global and integrative retrospective analyses. To address this challenge, we developed all RNA-seq and ChIP-seq sample and signature search (ARCHS4), a web resource that makes the majority of previously published RNA-seq data from human and mouse freely available at the gene count level. Such uniformly processed data enables easy integration for downstream analyses. For developing the ARCHS4 resource, all available FASTQ files from RNA-seq experiments were retrieved from the Gene Expression Omnibus (GEO) and aligned using a cloud-based infrastructure. In total 137,792 samples are accessible through ARCHS4 with 72,363 mouse and 65,429 human samples. Through efficient use of cloud resources and dockerized deployment of the sequencing pipeline, the alignment cost per sample is reduced to less than one cent. ARCHS4 is updated automatically by adding newly published samples to the database as they become available. Additionally, the ARCHS4 web interface provides intuitive exploration of the processed data through querying tools, interactive visualization, and gene landing pages that provide average expression across cell lines and tissues, top co-expressed genes, and predicted biological functions and protein-protein interactions for each gene based on prior knowledge combined with co-expression. Benchmarking the quality of these predictions, co-expression correlation data created from ARCHS4 outperforms co-expression data created from other major gene expression data repositories such as GTEx and CCLE. ARCHS4 is freely accessible at: http://amp.pharm.mssm.edu/archs4


