A compendium of monocyte transcriptome datasets to foster biomedical knowledge discovery

F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 291 ◽  
Author(s):  
Darawan Rinchai ◽  
Sabri Boughorbel ◽  
Scott Presnell ◽  
Charlie Quinn ◽  
Damien Chaussabel

Systems-scale profiling approaches have become widely used in translational research settings. The resulting accumulation of large-scale datasets in public repositories represents a critical opportunity to promote insight and foster knowledge discovery. However, resources that can serve as an interface between biomedical researchers and such vast and heterogeneous dataset collections are needed in order to fulfill this potential. Recently, we have developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). This tool can be used to overlay deep molecular phenotyping data with rich contextual information about analytes, samples and studies along with ancillary clinical or immunological profiling data. In this note, we describe a curated compendium of 93 public datasets generated in the context of human monocyte immunological studies, representing a total of 4,516 transcriptome profiles. Datasets were uploaded to an instance of GXB along with study description and sample annotations. Study samples were arranged in different groups. Ranked gene lists were generated based on relevant group comparisons. This resource is publicly available online at http://monocyte.gxbsidra.org/dm3/landing.gsp.
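The abstract above mentions ranked gene lists built from group comparisons but does not say how the ranking is computed. As a minimal sketch only, assuming the ranking is by absolute log2 fold change of mean expression between two sample groups (an assumption, not the GXB method), the idea could look like this in Python. The function name `rank_genes`, the pseudocount, and the toy data are all hypothetical.

```python
import math
from statistics import mean


def rank_genes(expression, group_a, group_b, pseudocount=1.0):
    """Rank genes by absolute log2 fold change of mean expression (group_a vs group_b).

    expression: dict mapping gene -> {sample id -> expression value}.
    The fold-change criterion and the pseudocount are illustrative assumptions.
    """
    ranked = []
    for gene, values in expression.items():
        mean_a = mean(values[s] for s in group_a)
        mean_b = mean(values[s] for s in group_b)
        # The pseudocount avoids division by zero for unexpressed genes.
        log2_fc = math.log2((mean_a + pseudocount) / (mean_b + pseudocount))
        ranked.append((gene, log2_fc))
    # Largest absolute change first.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Hypothetical toy data, not taken from any dataset in the compendium.
expression = {
    "GENE_A": {"s1": 7.0, "s2": 7.0, "s3": 1.0, "s4": 1.0},
    "GENE_B": {"s1": 3.0, "s2": 3.0, "s3": 3.0, "s4": 3.0},
    "GENE_C": {"s1": 1.0, "s2": 1.0, "s3": 3.0, "s4": 3.0},
}
ranked = rank_genes(expression, ["s1", "s2"], ["s3", "s4"])
for gene, log2_fc in ranked:
    print(gene, round(log2_fc, 2))
```

A real comparison would normally add a statistical test and multiple-testing correction; the sketch only shows how a group comparison turns into an ordered list.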


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 327 ◽  
Author(s):  
Jana Blazkova ◽  
Sabri Boughorbel ◽  
Scott Presnell ◽  
Charlie Quinn ◽  
Damien Chaussabel

Compendia of large-scale datasets available in public repositories provide an opportunity to identify and fill current gaps in biomedical knowledge. But first, these data need to be readily accessible to research investigators for interpretation. Here, we make available a collection of transcriptome datasets relevant to HIV infection. A total of 2717 unique transcriptional profiles distributed among 34 datasets were identified, retrieved from the NCBI Gene Expression Omnibus (GEO), and loaded in a custom web application, the Gene Expression Browser (GXB), designed for interactive query and visualization of integrated large-scale data. Multiple sample groupings and rank lists were created to facilitate dataset query and interpretation via this interface. Web links to customized graphical views can be generated by users and subsequently inserted in manuscripts reporting novel findings, such as discovery notes. The tool also enables browsing of a single gene across projects, which can provide new perspectives on the role of a given molecule across biological systems. This curated dataset collection is available at: http://hiv.gxbsidra.org/dm3/geneBrowser/list.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 414 ◽  
Author(s):  
Mahbuba Rahman ◽  
Sabri Boughorbel ◽  
Scott Presnell ◽  
Charlie Quinn ◽  
Chiara Cugno ◽  
...  

Compendia of large-scale datasets made available in public repositories provide an opportunity to identify and fill gaps in biomedical knowledge. But first, these data need to be made readily accessible to research investigators for interpretation. Here we make available a collection of transcriptome datasets to investigate the functional programming of human hematopoietic cells in early life. Thirty-two datasets were retrieved from the NCBI Gene Expression Omnibus (GEO) and loaded into a custom web application called the Gene Expression Browser (GXB), which was designed for interactive query and visualization of integrated large-scale data. Quality control checks were performed. Multiple sample groupings and gene rank lists were created, allowing users to reveal age-related differences in transcriptome profiles and changes in the gene expression of neonatal hematopoietic cells in response to a variety of immune stimulators and modulators, as well as during cell differentiation. Available demographic, clinical, and cell phenotypic information can be overlaid with the gene expression data and used to sort samples. Web links to customized graphical views can be generated and subsequently inserted in manuscripts to report novel findings. GXB also enables browsing of a single gene across projects, thereby providing new perspectives on age- and developmental stage-specific expression of a given gene across the human hematopoietic system. This dataset collection is available at: http://developmentalimmunology.gxbsidra.org/dm3/geneBrowser/list.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 305 ◽  
Author(s):  
Alexandra K. Marr ◽  
Sabri Boughorbel ◽  
Scott Presnell ◽  
Charlie Quinn ◽  
Damien Chaussabel ◽  
...  

Compendia of large-scale datasets made available in public repositories provide a precious opportunity to discover new biomedical phenomena and to fill gaps in our current knowledge. In order to foster novel insights, it is necessary to ensure that these data are made readily accessible to research investigators in an interpretable format. Here we make a curated, public collection of transcriptome datasets relevant to human placenta biology available for further analysis and interpretation via an interactive data browsing interface. We identified and retrieved a total of 24 datasets encompassing 759 transcriptome profiles associated with the development of the human placenta and associated pathologies from the NCBI Gene Expression Omnibus (GEO) and present them in a custom web-based application designed for interactive query and visualization of integrated large-scale datasets (http://placentalendocrinology.gxbsidra.org/dm3/landing.gsp). We also performed quality control checks using relevant biological markers. Multiple sample groupings and rank lists were subsequently created to facilitate data query and interpretation. Via this interface, users can create web links to customized graphical views, which may be inserted into manuscripts for further dissemination or e-mailed to collaborators for discussion. The tool also enables users to browse a single gene across different projects, providing a mechanism for developing new perspectives on the role of a molecule of interest across multiple biological states. The dataset collection we created here is available at: http://placentalendocrinology.gxbsidra.org/dm3.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 296 ◽  
Author(s):  
Jessica Roelands ◽  
Julie Decock ◽  
Sabri Boughorbel ◽  
Darawan Rinchai ◽  
Cristina Maccalli ◽  
...  

The increased application of high-throughput approaches in translational research has expanded the number of publicly available data repositories. Gathering additional valuable information contained in the datasets represents a crucial opportunity in the biomedical field. To facilitate and stimulate utilization of these datasets, we have recently developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). In this note, we describe a curated compendium of 13 public datasets on human breast cancer, representing a total of 2142 transcriptome profiles. We classified the samples according to different immune based classification systems and integrated this information into the datasets. Annotated and harmonized datasets were uploaded to GXB. Study samples were categorized in different groups based on their immunologic tumor response profiles, intrinsic molecular subtypes and multiple clinical parameters. Ranked gene lists were generated based on relevant group comparisons. In this data note, we demonstrate the utility of GXB to evaluate the expression of a gene of interest, find differential gene expression between groups and investigate potential associations between variables with a specific focus on immunologic classification in breast cancer. This interactive resource is publicly available online at: http://breastcancer.gxbsidra.org/dm3/geneBrowser/list.


2018 ◽  
Vol 35 (14) ◽  
pp. 2489-2491 ◽  
Author(s):  
Tobias Rausch ◽  
Markus Hsi-Yang Fritz ◽  
Jan O Korbel ◽  
Vladimir Benes

Summary: Harmonizing quality control (QC) of large-scale second and third-generation sequencing datasets is key for enabling downstream computational and biological analyses. We present Alfred, an efficient and versatile command-line application that computes multi-sample QC metrics in a read-group aware manner, across a wide variety of sequencing assays and technologies. In addition to standard QC metrics such as GC bias, base composition, insert size and sequencing coverage distributions, it supports haplotype-aware and allele-specific feature counting and feature annotation. The versatility of Alfred allows for easy pipeline integration in high-throughput settings, including DNA sequencing facilities and large-scale research initiatives, enabling continuous monitoring of sequence data quality and characteristics across samples. Alfred supports haplo-tagging of BAM/CRAM files to conduct haplotype-resolved analyses in conjunction with a variety of next-generation sequencing based assays. Alfred’s companion web application enables interactive exploration of results and comparison to public datasets. Availability and implementation: Alfred is open-source and freely available at https://tobiasrausch.com/alfred/. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 7 ◽  
Author(s):  
Gabriel Muñoz ◽  
W. Daniel Kissling ◽  
E. Emiel van Loon

A considerable portion of primary biodiversity data is digitally locked inside published literature, which is often stored as PDF files. Large-scale approaches to biodiversity science could benefit from retrieving this information and making it digitally accessible and machine-readable. Nonetheless, the amount and diversity of digitally published literature pose many challenges for knowledge discovery and retrieval. Text mining has been extensively used for data discovery tasks in large quantities of documents. However, text mining approaches for knowledge discovery and retrieval have been limited in biodiversity science compared to other disciplines. Here, we present a novel, open source text mining tool, the Biodiversity Observations Miner (BOM). This web application, written in R, allows the semi-automated discovery of punctual biodiversity observations (e.g. biotic interactions, functional or behavioural traits and natural history descriptions) associated with the scientific names present inside a corpus of scientific literature. Furthermore, BOM enables users to rapidly screen large quantities of literature based on word co-occurrences that match custom biodiversity dictionaries. This tool aims to increase the digital mobilisation of primary biodiversity data and is freely accessible via GitHub or through a web server.
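To illustrate the kind of dictionary-based co-occurrence screening described above, here is a minimal Python sketch. It is not BOM's code (BOM is written in R) and does not reproduce its algorithm; it assumes, for illustration only, that a hit is a sentence containing both a known scientific name and a term from a custom dictionary. The function `screen_sentences`, the species names, the dictionary and the text are all invented.

```python
import re


def screen_sentences(text, names, dictionary):
    """Return (name, term, sentence) triples where a scientific name and a
    dictionary term occur in the same sentence (illustrative rule only)."""
    hits = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        lowered = sentence.lower()
        for name in names:
            if name.lower() not in lowered:
                continue
            for term in dictionary:
                # Whole-word match so "eaten" does not match "beaten".
                if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
                    hits.append((name, term, sentence))
    return hits


# Invented toy corpus with made-up species names.
text = (
    "Plantus exemplaris fruits are eaten by birds. "
    "The study site was located in a lowland forest. "
    "Seeds of Arborea ficta are dispersed by mammals."
)
names = ["Plantus exemplaris", "Arborea ficta"]
dictionary = ["eaten", "dispersed", "pollinated"]

hits = screen_sentences(text, names, dictionary)
for name, term, sentence in hits:
    print(f"{name} | {term} | {sentence}")
```

In practice the scientific names would themselves have to be detected in the text first; here they are passed in as a fixed list to keep the sketch short.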


2020 ◽  
Vol 19 (10) ◽  
pp. 1602-1618 ◽  
Author(s):  
Thibault Robin ◽  
Julien Mariethoz ◽  
Frédérique Lisacek

A key point in achieving accurate intact glycopeptide identification is the definition of the glycan composition file that is used to match experimental with theoretical masses by a glycoproteomics search engine. At present, these files are mainly built from searching the literature and/or querying data sources focused on posttranslational modifications. Most glycoproteomics search engines include a default composition file that is readily used when processing MS data. We introduce here GlyConnect Compozitor, a tool for visualizing and comparing glycan compositions that is associated with the GlyConnect database. It offers a web interface through which the database can be queried to bring out contextual information relative to a set of glycan compositions. The tool takes advantage of compositions being related to one another through shared monosaccharide counts and outputs interactive graphs summarizing information searched in the database. These results provide a guide for selecting or deselecting compositions in a file in order to reflect the context of a study as closely as possible. They also confirm the consistency of a set of compositions based on the content of the GlyConnect database. As part of the tool collection of the Glycomics@ExPASy initiative, Compozitor is hosted at https://glyconnect.expasy.org/compozitor/ where it can be run as a web application. It is also directly accessible from the GlyConnect database.
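The abstract states that compositions are related through shared monosaccharide counts but does not spell out the linking rule. As a minimal sketch, assuming two compositions are connected when they differ by exactly one monosaccharide residue (an illustrative assumption, not a description of Compozitor's implementation), a composition graph could be built as follows. The helper names and the example compositions are hypothetical.

```python
from itertools import combinations


def differ_by_one(a, b):
    """True if two compositions differ by exactly one monosaccharide residue."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys) == 1


def build_graph(compositions):
    """Return edges (name1, name2) between compositions one residue apart."""
    edges = []
    for (n1, c1), (n2, c2) in combinations(compositions.items(), 2):
        if differ_by_one(c1, c2):
            edges.append((n1, n2))
    return edges


# Illustrative compositions: name -> monosaccharide counts.
compositions = {
    "Hex5HexNAc2": {"Hex": 5, "HexNAc": 2},
    "Hex6HexNAc2": {"Hex": 6, "HexNAc": 2},
    "Hex5HexNAc4": {"Hex": 5, "HexNAc": 4},
    "Hex5HexNAc4NeuAc1": {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
}
edges = build_graph(compositions)
for edge in edges:
    print(edge)
```

Under this rule, a composition left with no edges stands apart from the rest of the set, which is one simple way to think about the consistency check mentioned in the abstract.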

