Comprehensive comparison of large-scale tissue expression datasets

For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface.

10.1101/010975

2014

Author(s):

Alberto Santos

Kalliopi Tsafou

Christian Stolte

Sune Frankild

Seán O'Donoghue

...

Keyword(s):

Large Scale

Comprehensive Evaluation

Tissue Expression

Expression Data

Web Interface

Comprehensive Comparison

Single User

The Right

User Friendly

Literature Curation
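The TISSUES abstract describes giving every type of evidence a comparable confidence score and then combining the datasets to improve both quality and coverage. The sketch below shows one way such an integration could work. It assumes each channel's score has already been calibrated to a probability-like value in [0, 1] and that channels are independent; the noisy-OR combination rule, the function name `combine_channels`, and the example scores are illustrative assumptions, not necessarily the scheme TISSUES uses.

```python
from typing import Dict, Tuple

# Hypothetical layout: per-channel confidence in [0, 1], keyed by (protein, tissue).
Pair = Tuple[str, str]
Scores = Dict[Pair, float]


def combine_channels(channels: Dict[str, Scores]) -> Scores:
    """Noisy-OR integration (an assumption): 1 - prod(1 - s) over reporting channels."""
    combined: Scores = {}
    for scores in channels.values():
        for pair, s in scores.items():
            if not 0.0 <= s <= 1.0:
                raise ValueError(f"score out of range for {pair}: {s}")
            # Probability that every channel seen so far is wrong about this pair.
            all_wrong = 1.0 - combined.get(pair, 0.0)
            combined[pair] = 1.0 - all_wrong * (1.0 - s)
    return combined


if __name__ == "__main__":
    # Illustrative scores only; not taken from any real dataset.
    channels = {
        "text_mining": {("INS", "pancreas"): 0.6},
        "experiments": {("INS", "pancreas"): 0.5, ("ALB", "liver"): 0.7},
    }
    for pair, score in sorted(combine_channels(channels).items()):
        print(pair, round(score, 3))
```

Under this rule a pair reported by only one channel keeps that channel's score, which is how merging sources can extend coverage, while agreement between channels raises the combined score above any single channel's.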

The Microbe Directory: An annotated, searchable inventory of microbes’ characteristics

Gates Open Research

10.12688/gatesopenres.12772.1

2018

Vol 2

pp. 3

Cited By ~ 5

Author(s):

Heba Shaaban

David A. Westfall

Rawhi Mohammad

David Danko

Daniela Bezdan

...

Keyword(s):

Biofilm Formation

Large Scale

Research Effort

Gram Stain

Web Interface

Ongoing Effort

Student Researchers

User Friendly

Optimal pH

Online Web

The Microbe Directory is a collective research effort to profile and annotate more than 7,500 unique microbial species from the MetaPhlAn2 database, including bacteria, archaea, viruses, fungi, and protozoa. By collecting and summarizing data on the characteristics of these microbes, the project provides a database that can be used downstream of large-scale metagenomic taxonomic analyses, allowing researchers to interpret and explore their taxonomic classifications and gain a deeper understanding of the microbial ecosystems they are studying. Such characteristics include, but are not limited to, optimal pH, optimal temperature, Gram stain, biofilm formation, spore formation, antimicrobial resistance, and COGEM class risk rating. The database has been manually curated by trained student researchers from Weill Cornell Medicine and CUNY Hunter College, and its analysis remains an ongoing, open-source effort to which others can contribute. Available in SQL, JSON, and CSV (Excel-compatible) formats, the Microbe Directory can be queried by a microorganism's taxonomy for the parameters listed above. In addition to the raw database, The Microbe Directory has an online counterpart (https://microbe.directory/) that provides a user-friendly interface for storage, retrieval, and analysis, into which other microbial database projects could be incorporated. The Microbe Directory was primarily designed as a resource for researchers conducting metagenomic analyses, but its online web interface should also prove useful to anyone who wishes to learn more about a particular microbe.
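Because the directory is distributed as flat files such as CSV, a query of the kind described above can be sketched with the Python standard library alone. The column names used here (`species`, `gram_stain`, `spore_forming`) and the two-row sample are hypothetical placeholders; the real export may use a different schema, so check its header row before adapting this.

```python
import csv
import io
from typing import Dict, Iterable, List

Row = Dict[str, str]


def load_directory(csv_text: str) -> List[Row]:
    """Parse CSV text into one dict per microbe, keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def query(rows: Iterable[Row], **criteria: str) -> List[Row]:
    """Return rows whose columns match every keyword criterion exactly."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Hypothetical two-row extract with assumed column names, for illustration only.
SAMPLE = (
    "species,gram_stain,spore_forming\n"
    "Bacillus subtilis,positive,yes\n"
    "Escherichia coli,negative,no\n"
)

if __name__ == "__main__":
    rows = load_directory(SAMPLE)
    for row in query(rows, gram_stain="positive", spore_forming="yes"):
        print(row["species"])
```

Reading the file from disk works the same way: pass an open file handle to `csv.DictReader` instead of wrapping a string in `io.StringIO`.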

PhenomeXcan: Mapping the genome to the phenome through the transcriptome

Science Advances

10.1126/sciadv.aba2083

2020

Vol 6 (37)

pp. eaba2083

Cited By ~ 3

Author(s):

Milton Pividori

Padma S. Rajagopal

Alvaro Barbeira

Yanyu Liang

Owen Melia

...

Keyword(s):

Complex Traits

Large Scale

Genome Wide Association Study

Association Studies

Gene List

Tissue Expression

Mendelian Inheritance

Complex Data

Causal Genes

User Friendly

Large-scale genomic and transcriptomic initiatives offer unprecedented insight into complex traits, but clinical translation remains limited by variant-level associations that lack biological context and by a lack of analytic resources. Our resource, PhenomeXcan, synthesizes 8.87 million variants from genome-wide association study summary statistics on 4091 traits with transcriptomic data from 49 tissues in Genotype-Tissue Expression v8 into a gene-based, queryable platform including 22,515 genes. We developed a novel Bayesian colocalization method, fast enrichment estimation aided colocalization analysis (fastENLOC), to prioritize likely causal gene-trait associations. We successfully replicate associations from the phenome-wide association studies (PheWAS) catalog, Online Mendelian Inheritance in Man (OMIM), and an evidence-based curated gene list. Using PhenomeXcan results, we provide examples of novel and underreported genome-to-phenome associations, complex gene-trait clusters, shared causal genes between common and rare diseases via further integration of PhenomeXcan with ClinVar, and potential therapeutic targets. PhenomeXcan (phenomexcan.org) provides broad, user-friendly access to complex data for translational researchers.
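Replication against a reference catalog is often summarized with the area under the ROC curve, which is how the preprint version of this work reports its PheWAS Catalog and OMIM results. As a sketch, assuming each gene-trait pair has a score (for example, an association statistic) and a boolean label saying whether the pair appears in the reference set, the AUC equals the probability that a randomly chosen positive pair outscores a randomly chosen negative pair (the Mann-Whitney formulation). The function and toy inputs are illustrative, not PhenomeXcan's evaluation code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: two catalog-supported pairs, two unsupported pairs.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False]))  # 0.75
```

The pairwise loop is O(P·N) and is meant for clarity; for millions of gene-trait pairs, a rank-based computation of the same statistic would be the practical choice.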

GEDI: a user-friendly toolbox for analysis of large-scale gene expression data

BMC Bioinformatics

10.1186/1471-2105-8-457

2007

Vol 8 (1)

pp. 457

Cited By ~ 8

Author(s):

André Fujita

João R Sato

Carlos E Ferreira

Mari C Sogayar

Keyword(s):

Gene Expression

Gene Expression Data

Large Scale

Expression Data

User Friendly

PhenomeXcan: Mapping the genome to the phenome through the transcriptome

10.1101/833210

2019

Cited By ~ 6

Author(s):

Milton Pividori

Padma S. Rajagopal

Alvaro Barbeira

Yanyu Liang

Owen Melia

...

Keyword(s):

Complex Traits

Large Scale

Target Genes

Genome Wide Association Study

Tissue Expression

P Value

Entire Genome

User Friendly

Biological Context

Trait Associations

Large-scale genomic and transcriptomic initiatives offer an unprecedented ability to study the biology of complex traits and to identify target genes for precision prevention or therapy. Translation to clinical contexts, however, has been slow and challenging because identified variant-level associations lack biological context. Moreover, many translational researchers lack the computational or analytic infrastructure required to fully use these resources. We integrate genome-wide association study (GWAS) summary statistics from multiple publicly available sources with data from Genotype-Tissue Expression (GTEx) v8 using PrediXcan, and provide a user-friendly platform for translational researchers based on state-of-the-art algorithms. We develop a novel Bayesian colocalization method, fastENLOC, to prioritize the most likely causal gene-trait associations. Our resource, PhenomeXcan, synthesizes 8.87 million variants from GWAS on 4,091 traits with transcriptome regulation data from 49 tissues in GTEx v8 into an innovative, gene-based resource including 22,255 genes. Across the entire genome/phenome space, we find 65,603 significant associations (Bonferroni-corrected p-value threshold of 5.5 × 10⁻¹⁰), of which 19,579 (29.8 percent) were colocalized (locus regional colocalization probability > 0.1). We successfully replicate associations from the PheWAS Catalog (AUC = 0.61) and OMIM (AUC = 0.64). We provide examples of (a) finding novel and underreported genome-to-phenome associations, (b) exploring complex gene-trait clusters within PhenomeXcan, (c) studying phenome-to-phenome relationships between common and rare diseases via further integration of PhenomeXcan with ClinVar, and (d) evaluating potential therapeutic targets. PhenomeXcan (phenomexcan.org) broadens access to complex genomic and transcriptomic data and empowers translational researchers.

One-Sentence Summary: PhenomeXcan is a gene-based resource of gene-trait associations with biological context that supports translational research.
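The significance threshold quoted in this abstract is consistent with a simple Bonferroni correction over every gene-trait pair. Assuming a family-wise error rate of 0.05 and one test per pair (4,091 traits × 22,255 genes), the arithmetic can be checked as below; the abstract does not state the exact number of tests the authors used, so treat the match as a plausibility check rather than a reconstruction of their procedure. The same snippet reproduces the reported colocalized fraction.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: one test per (trait, gene) pair, using the counts in the abstract.
N_TRAITS = 4091
N_GENES = 22255
n_tests = N_TRAITS * N_GENES  # 91,045,205 gene-trait pairs
threshold = bonferroni_threshold(0.05, n_tests)

# Share of the 65,603 significant associations that were also colocalized.
coloc_percent = 100 * 19579 / 65603

if __name__ == "__main__":
    print(f"{n_tests:,} tests -> threshold {threshold:.2e}")  # 91,045,205 tests -> threshold 5.49e-10
    print(f"colocalized: {coloc_percent:.1f}%")  # colocalized: 29.8%
```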

Computer-assisted initial diagnosis of rare diseases

PeerJ

10.7717/peerj.2211

2016

Vol 4

pp. e2211

Cited By ~ 10

Author(s):

Rui Alves

Marc Piñol

Jordi Vilaplana

Ivan Teixidó

Joaquim Cruz

...

Keyword(s):

Rare Disease

Rare Diseases

Large Scale

Initial Diagnosis

Computer Assisted

Web Interface

Genetic Origin

Data Set

User Friendly

Current Database

Introduction. Most documented rare diseases have a genetic origin. Because each of these diseases is individually rare, an initial diagnosis based on phenotypic symptoms is not always easy, as practitioners may never have encountered patients suffering from the relevant disease. It is thus important to develop tools that help clinicians make a symptom-based initial diagnosis of rare diseases. In this work we aimed to develop a computational approach to aid that initial diagnosis, to implement this approach in a user-friendly web prototype, which we call Rare Disease Discovery, and to test the performance of the prototype.

Methods. Rare Disease Discovery uses the publicly available ORPHANET data set of associations between rare diseases and their symptoms to automatically predict the most likely rare diseases based on a patient's symptoms. We apply the method to retrospectively diagnose a cohort of 187 rare disease patients with confirmed diagnoses. Subsequently, we test the precision, sensitivity, and global performance of the system under different scenarios by running large-scale Monte Carlo simulations. All settings account for situations where absent and/or unrelated symptoms are considered in the diagnosis.

Results. We find that this expert system has high diagnostic precision (≥80%) and sensitivity (≥99%), and is robust to both absent and unrelated symptoms.

Discussion. The Rare Disease Discovery prediction engine appears to provide a fast and robust method for initial assisted differential diagnosis of rare diseases. We coupled this engine with a user-friendly web interface, which can be freely accessed at http://disease-discovery.udl.cat/. The code and the most current database for the whole project can be downloaded from https://github.com/Wrrzag/DiseaseDiscovery/tree/no_classifiers.
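The Methods paragraph describes ranking rare diseases from a patient's symptoms using ORPHANET disease-symptom associations. The sketch below illustrates one simple scoring rule of that shape; it is not the Rare Disease Discovery engine. Each disease is scored by the fraction of its annotated symptoms that the patient shows, so absent symptoms lower a score without eliminating the disease, and unrelated symptoms are simply ignored. The function name, the `top_k` parameter, and any disease or symptom labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_diseases(
    observed: Set[str],
    associations: Dict[str, Set[str]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank diseases by the fraction of their annotated symptoms the patient shows."""
    ranked: List[Tuple[str, float]] = []
    for disease, symptoms in associations.items():
        if not symptoms:
            continue  # no annotated symptoms to match against
        matched = len(observed & symptoms)
        if matched == 0:
            continue  # no shared symptom, so not a candidate
        ranked.append((disease, matched / len(symptoms)))
    # Highest coverage first; break ties alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

A known weakness of pure coverage is that a disease with very few annotated symptoms can reach a high score from a single match; a practical system would likely also weight symptoms by their frequency or specificity.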

tspex: a tissue-specificity calculator for gene expression data

10.21203/rs.3.rs-51998/v1

2020

Author(s):

Antonio P. Camargo

Adrielle A. Vasconcelos

Mateus B. Fiamenghi

Gonçalo A. G. Pereira

Marcelo F. Carazzolle

Keyword(s):

Gene Expression

Gene Expression Data

Web Application

Tissue Specificity

Source Code

Command Line

Expression Data

Web Interface

User Friendly

Different Tissues

When comparing gene expression data across different tissues, it is often useful to identify tissue-specific genes or transcripts. Although several metrics exist to measure tissue specificity, no user-friendly tool that facilitates this analysis is available yet. We present tspex, a tool that allows easy computation of a comprehensive set of tissue-specificity metrics from gene expression data. tspex can be used through a web interface, the command line, or its Python API. Its package version also provides visualization functions that facilitate inspection of results. The documentation and source code of tspex are available at https://apcamargo.github.io/tspex/, and the web application can be accessed at https://tspex.lge.ibi.unicamp.br/
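One widely used tissue-specificity metric is the tau index. For expression values x_1, ..., x_n across n tissues, tau = Σ_i (1 − x_i / max_j x_j) / (n − 1), which ranges from 0 (equal expression in every tissue) to 1 (expression in a single tissue). The sketch below computes it directly; it is a stand-alone illustration, not tspex's implementation or API. It assumes non-negative expression values; tau is often computed on log-transformed expression, which would be applied before calling this function.

```python
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tau tissue-specificity index: 0 = uniform across tissues, 1 = one tissue."""
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        raise ValueError("tau is undefined when the gene is not expressed anywhere")
    # Each tissue contributes how far it falls below the peak tissue.
    return sum(1 - x / peak for x in expression) / (n - 1)


if __name__ == "__main__":
    print(tau([10.0, 0.0, 0.0, 0.0]))  # 1.0: expressed in one tissue only
    print(tau([5.0, 5.0, 5.0, 5.0]))  # 0.0: identical in every tissue
```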

Enhancing Research Through the Use of the Genotype-Tissue Expression (GTEx) Database

Biological Research For Nursing

10.1177/1099800421994186

2021

pp. 109980042199418

Author(s):

Ansley Grimes Stanfill

Xueyuan Cao

Keyword(s):

Dopamine Receptor

Large Scale

Human Subjects

Data Access

Tissue Expression

Receptor Type

Expression Data

Common Fund

Omic Data

Despite growing interest in multi-omic research, individual investigators may struggle to collect large-scale omic data, particularly from human subjects. Publicly available datasets can help to address this problem, including those sponsored by the NIH Common Fund, such as the Genotype-Tissue Expression (GTEx) database. This database contains genotype and expression data obtained from 54 non-diseased tissues in human subjects. However, these data are often underutilized, because users may find the browsing tools counterintuitive or have difficulty navigating the procedures to request controlled data access. Furthermore, awareness of these resources is limited among nurse scientists interested in incorporating such information into their programs of research. This article outlines the procedures for using the GTEx database. We then provide an exemplar of using this resource to enhance existing research by investigating expression of dopamine receptor type 2 (DRD2) across brain tissues in human subjects.
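GTEx distributes gene-level summaries, such as median expression per tissue, as tab-separated GCT files. As a sketch of the kind of lookup described for DRD2, the function below extracts one gene's values across brain tissues. It assumes a GCT version 1.2 layout (a version line, a dimensions line, then a header whose first two columns are `Name` and `Description`, the gene symbol, followed by one column per tissue) and assumes brain tissue columns begin with `Brain`. Check both assumptions against the file you actually download; the function name is hypothetical.

```python
from typing import Dict, TextIO


def brain_expression(handle: TextIO, gene_symbol: str) -> Dict[str, float]:
    """Return {tissue: value} for the brain tissues of one gene in a GCT file.

    Assumed layout: '#1.2' version line, dimensions line, then a header of
    Name, Description (gene symbol), and one column per tissue.
    """
    if not handle.readline().startswith("#1."):
        raise ValueError("expected a GCT version line such as '#1.2'")
    handle.readline()  # dimensions line ("<rows>\t<columns>"), not needed here
    header = handle.readline().rstrip("\r\n").split("\t")
    tissues = header[2:]
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > 1 and fields[1] == gene_symbol:
            values = [float(v) for v in fields[2:]]
            # Keep only columns whose names mark them as brain regions.
            return {t: v for t, v in zip(tissues, values) if t.startswith("Brain")}
    raise KeyError(f"gene symbol not found: {gene_symbol}")
```

For a gzip-compressed download, `gzip.open(path, "rt")` yields a suitable text handle. The function returns the first row whose symbol matches; if a symbol maps to several gene IDs, matching on the `Name` column instead would be safer.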

EuRBPDB: a comprehensive resource for annotation, functional and oncological investigation of eukaryotic RNA binding proteins (RBPs)

10.1101/713164

2019

Author(s):

Jian-You Liao

Bing Yang

Yu-Chan Zhang

Xiao-Juan Wang

Yushan Ye

...

Keyword(s):

Cancer Biology

Large Scale

Binding Proteins

RNA Binding

RNA Binding Proteins

Web Interface

Binding Domains

Large Protein Family

Almost All

User Friendly

RNA binding proteins (RBPs) are a large protein family that plays important roles at almost all levels of gene regulation through interactions with RNAs, and they contribute to numerous biological processes. However, a complete list of eukaryotic RBPs, including those of humans, is still unavailable. In this study, we systematically identified RBPs in 162 eukaryotic species based on both computational analysis of RNA binding domains (RBDs) and large-scale RNA binding proteomic (RBPome) data, and established a comprehensive eukaryotic RBP database, EuRBPDB (http://EuRBPDB.syshospital.org:8081). We identified a total of 311,571 RBPs with RBDs and 3,639 non-canonical RBPs without known RBDs. EuRBPDB provides detailed annotations for each RBP, including basic information and functional annotation. Moreover, we systematically investigated RBPs in the context of cancer biology based on published literature and large-scale omics data. To facilitate exploration of the clinical relevance of RBPs, we additionally designed a cancer web interface that systematically and interactively displays the biological features of RBPs in various types of cancers. EuRBPDB has a user-friendly web interface with browse, search, and data download functions. We expect that EuRBPDB will become a widely used resource and platform for the RNA biology community.
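The identification strategy above combines two evidence sources: proteins carrying a known RNA binding domain, and proteins detected in RBPome experiments that lack any known RBD (the non-canonical set). A minimal sketch of that partition follows, assuming domain annotations are already available per protein (for example, as Pfam family names). The function name, protein IDs, and small RBD family list are illustrative placeholders, not EuRBPDB's curated domain set or pipeline.

```python
from typing import Dict, Set, Tuple


def classify_rbps(
    protein_domains: Dict[str, Set[str]],
    rbd_families: Set[str],
    rbpome_hits: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Split candidate RBPs into canonical (has a known RBD) and non-canonical."""
    # Canonical: at least one annotated domain belongs to a known RNA binding family.
    canonical = {p for p, domains in protein_domains.items() if domains & rbd_families}
    # Non-canonical: experimentally detected RNA binders with no known RBD.
    non_canonical = rbpome_hits - canonical
    return canonical, non_canonical
```

For example, with `protein_domains = {"P1": {"RRM_1"}, "P2": {"KH_1"}, "P3": {"Pkinase"}}`, `rbd_families = {"RRM_1", "KH_1"}`, and RBPome hits `{"P2", "P3"}`, the canonical set is `{"P1", "P2"}` and the non-canonical set is `{"P3"}`.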
