Improving the value of public RNA-seq expression data by phenotype prediction

2017 ◽  
Author(s):  
Shannon E. Ellis ◽  
Leonardo Collado-Torres ◽  
Jeffrey T. Leek

Abstract. Background: Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions. Results: We develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements, using well-annotated genomic and phenotypic data produced by consortia like TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples we recently processed on a common pipeline as part of the recount2 project (https://jhubiostatistics.shinyapps.io/recount/). We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package (https://github.com/leekgroup/phenopredict) and the predictions for recount2 are available from the recount R package (https://bioconductor.org/packages/release/bioc/html/recount.html). Conclusion: Having leveraged massive public data sets to generate a well-phenotyped set of expression data for more than 70,000 human samples, we make expression data available for use on a scale that was not previously feasible.
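The idea of training on well-annotated samples and predicting a missing label from expression can be illustrated with a minimal sketch. This is not the phenopredict method: it assumes a simple nearest-centroid classifier, and the expression values and the two marker-gene columns are made up for illustration.

```python
from statistics import mean

def fit_centroids(samples, labels):
    """Average the expression vectors of each labeled class."""
    by_label = {}
    for vec, lab in zip(samples, labels):
        by_label.setdefault(lab, []).append(vec)
    return {lab: [mean(col) for col in zip(*vecs)]
            for lab, vecs in by_label.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

# Hypothetical log-expression values for two made-up marker genes (columns).
# Rows stand in for well-annotated training samples.
train = [[8.1, 0.2], [7.9, 0.1], [0.3, 6.5], [0.2, 7.0]]
train_labels = ["female", "female", "male", "male"]

centroids = fit_centroids(train, train_labels)

# Samples whose sex annotation is missing.
unlabeled = [[7.5, 0.4], [0.1, 6.8]]
predicted = [predict(centroids, v) for v in unlabeled]
print(predicted)
```

A real predictor would need held-out evaluation on annotated samples before its labels are trusted for downstream analysis.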

Author(s):  
Carlos Alberto Oliveira de Biagi ◽  
Ricardo Perecin Nociti ◽  
Breno Osvaldo Funicheli ◽  
Patrícia de Cássia Ruy ◽  
João Paulo Bianchi Ximenez ◽  
...  

Abstract. Summary: Finding meaningful gene-gene associations and the main transcription factors (TFs) in co-expression networks is one of the most important challenges in gene expression data mining. CeTF is an R package that integrates the Partial Correlation with Information Theory (PCIT) and Regulatory Impact Factors (RIF) algorithms applied to gene expression data from microarray, RNA-seq, or single-cell RNA-seq platforms. This approach allows identifying the transcription factors most likely to regulate a given network in different biological systems, for example the regulation of gene pathways in tumor stromal cells and tumor cells of the same tumor. The pipeline can be easily integrated into high-throughput analyses. Availability: CeTF is available as an R package on Bioconductor (https://bioconductor.org/packages/CeTF) and GitHub (https://github.com/cbiagii/CeTF), and as a Docker image (https://hub.docker.com/r/biagii/cetf). More information on how to use the package can be found in Supplemental File 1.


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 216.2-217
Author(s):  
D. Hartl ◽  
M. Keller ◽  
A. Klenk ◽  
M. Murphy ◽  
M. Martinic ◽  
...  

Background: To explore the full therapeutic spectrum of a drug it is crucial to consider its potential effectiveness in all diseases. Serendipitous clinical observations have often shown approved drugs and those in development to be efficacious in indications different from those originally tested. Traditional approaches to match a drug candidate with possible indications are mostly based on matching drug mechanistic knowledge with disease pathophysiology. Proof-of-concept trials or elaborate pre-clinical studies in animal models do not allow for a broad assessment due to high costs and slow progress. Gene expression changes in patients or animal models represent a good proxy to comprehensively assess both disease and drug effects. Furthermore, this data type can be integrated with a plethora of publicly available data. Objectives: Generation of a novel in silico framework to support the selection and expansion of potential indications which associate with a compound or approved drug. The framework was exemplified by the clinical compound cenerimod, a potent, selective, and orally active sphingosine-1-phosphate receptor 1 modulator (Piali et al., 2017). Methods: A total of ~13,000 public patient gene expression datasets from ~140 diseases were evaluated against cenerimod gene expression data generated in mouse disease models. To improve comparability of studies across platforms and species, computer algorithms (neural networks) were trained and employed to reduce noise within the data sets and improve signal. The predicted response to cenerimod for individual patients was contrasted against clinical patient characteristics. Results: The neural network algorithm efficiently reduced experimental noise and improved sensitivity in the gene expression data. The results predicted cenerimod to be efficacious in several autoimmune diseases, foremost SLE. Additionally, focused analysis on individual patients rather than disease cohorts revealed potential determinants predictive of maximal clinical response, with the highest predicted clinical response for cenerimod in patients with severe inflammatory endotype and/or high SLE Disease Activity Index (SLEDAI). Conclusion: Combining preclinical compound data with the wealth of public disease gene expression data provides great potential to support indication selection. The novel in silico framework identified SLE as a prime potential indication for cenerimod and supported the cenerimod phase 2b clinical trial in patients with SLE (CARE study, NCT03742037).
References: [1] Piali, L., Birker-Robaczewska, M., Lescop, C., Froidevaux, S., Schmitz, N., Morrison, K., … Nayler, O. (2017). Cenerimod, a novel selective S1P1 receptor modulator with unique signaling properties. Pharmacology Research & Perspectives, 5(6), 1–12. https://doi.org/10.1002/prp2.370
Disclosure of Interests: Dominik Hartl Shareholder of: Idorsia shares, Employee of: Idorsia employee, Marcel Keller Shareholder of: Idorsia options/shares, Employee of: Idorsia employee, Axel Klenk Shareholder of: Idorsia option/shares, Employee of: Idorsia employee, Mark Murphy Shareholder of: Idorsia shares and stock options, Employee of: Idorsia employee, Marianne Martinic Shareholder of: Idorsia options/shares, Employee of: Idorsia employee, Gabin Pierlot Shareholder of: Idorsia options/shares, Employee of: Idorsia employee, Peter Groenen Shareholder of: Idorsia options/shares, Employee of: Idorsia employee, Daniel Strasser Shareholder of: Idorsia options/shares, Employee of: Idorsia employee


2021 ◽  
Author(s):  
Quanshun Mei ◽  
Chuanke Fu ◽  
Jieling Li ◽  
Shuhong Zhao ◽  
Tao Xiang

Abstract. Summary: Genetic analysis is a systematic and complex procedure in animal and plant breeding. With the fast development of high-throughput genotyping techniques and algorithms, animal and plant breeding has entered a genomic era. However, there is a lack of software that can perform comprehensive genetic analyses in routine animal and plant breeding programs. To make the whole genetic analysis in animal and plant breeding straightforward, we developed a powerful, robust and fast R package that includes genomic data format conversion, genomic data quality control and genotype imputation, breed composition analysis, pedigree tracing, analysis and visualization, pedigree-based and genomic-based relationship matrix construction, and genomic evaluation. In addition, to simplify the application of this package, we also developed a Shiny toolkit for users. Availability and implementation: blupADC is developed primarily in R with core functions written in C++. The development version is maintained at https://github.com/TXiang-lab/blupADC. Supplementary information: Supplementary data are available online.


Author(s):  
Massimo Andreatta ◽  
Santiago J. Carmona

Abstract. Computational tools for the integration of single-cell transcriptomics data are designed to correct batch effects between technical replicates or different technologies applied to the same population of cells. However, they have inherent limitations when applied to heterogeneous sets of data with moderate overlap in cell states or sub-types. STACAS is a package for the identification of integration anchors in the Seurat environment, optimized for the integration of datasets that share only a subset of cell types. We demonstrate that by i) correcting batch effects while preserving relevant biological variability across datasets, ii) filtering aberrant integration anchors with a quantitative distance measure, and iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. We anticipate that the algorithm will be a useful tool for the construction of comprehensive single-cell atlases by integration of the growing amount of single-cell data becoming available in public repositories. Code availability: R package: https://github.com/carmonalab/STACAS; Docker image: https://hub.docker.com/repository/docker/mandrea1/stacas_demo


2020 ◽  
Author(s):  
Theodore G. Drivas ◽  
Anastasia Lucas ◽  
Marylyn D. Ritchie

Summary: Genomic studies increasingly integrate expression quantitative trait loci (eQTL) information into their analysis pipelines, but few tools exist for the visualization of colocalization between eQTL and GWAS results. To address this issue, we developed the intuitive R package eQTpLot, which takes as input GWAS and eQTL summary statistics to generate a series of plots visualizing colocalization, correlation, and enrichment between eQTL and GWAS signals for a given gene-trait pair. We believe eQTpLot will prove a useful tool for investigators seeking a convenient and customizable visualization of genomic data colocalization. Availability and Implementation: The eQTpLot R package and tutorial are available at https://github.com/RitchieLab/[email protected]


2020 ◽  
Vol 96 (8) ◽  
Author(s):  
D R Finn ◽  
J Yu ◽  
Z E Ilhan ◽  
V M C Fernandes ◽  
C R Penton ◽  
...  

ABSTRACT Niche is a fundamental concept in ecology. It integrates the sum of biotic and abiotic environmental requirements that determines a taxon's distribution. Microbiologists currently lack quantitative approaches to address niche-related hypotheses. We tested four approaches for the quantification of niche breadth and overlap of taxa in amplicon sequencing datasets, with the goal of determining generalists, specialists and environmental-dependent distributions of community members. We applied these indices to in silico training datasets first, and then to real human gut and desert biological soil crust (biocrust) case studies, assessing the agreement of the indices with previous findings. Implementation of each approach successfully identified a priori conditions within in silico training data, and we found that by including a limit of quantification based on species rank, one could identify taxa falsely classified as specialists because of their low, sparse counts. Analysis of the human gut study offered quantitative support for Bacilli, Gammaproteobacteria and Fusobacteria specialists enriched after bariatric surgery. We could quantitatively characterise differential niche distributions of cyanobacterial taxa with respect to precipitation gradients in biocrusts. We conclude that these approaches, made publicly available as an R package (MicroNiche), represent useful tools to assess microbial environment-taxon and taxon-taxon relationships in a quantitative manner.
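The abstract does not name the four indices it tested. As a minimal sketch of what a niche-breadth index looks like, the block below assumes Levins' classic measure, B = 1 / Σ p_i², where p_i is the proportion of a taxon's total counts found in environment i, together with the standardized form (B - 1) / (n - 1). The count vectors are hypothetical and the function names are illustrative, not the MicroNiche API.

```python
def levins_breadth(counts):
    """Levins' niche breadth B = 1 / sum(p_i^2) over environments."""
    total = sum(counts)
    if total == 0:
        raise ValueError("taxon has no counts in any environment")
    proportions = [c / total for c in counts]
    return 1.0 / sum(p * p for p in proportions)

def standardized_breadth(counts):
    """Rescale B to [0, 1]: 0 = found in one environment, 1 = perfectly even."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two environments")
    return (levins_breadth(counts) - 1.0) / (n - 1)

# Hypothetical counts of two taxa across four environments.
generalist = [25, 25, 25, 25]   # evenly distributed
specialist = [100, 0, 0, 0]     # confined to one environment

print(standardized_breadth(generalist))  # 1.0
print(standardized_breadth(specialist))  # 0.0
```

A taxon with very few reads can score as a specialist purely through sparse sampling, which is the situation the limit of quantification described in the abstract is meant to catch.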


2021 ◽  
Vol 17 (8) ◽  
pp. e1009263
Author(s):  
Elva María Novoa-del-Toro ◽  
Efrén Mezura-Montes ◽  
Matthieu Vignes ◽  
Morgane Térézol ◽  
Frédérique Magdinier ◽  
...  

The identification of subnetworks of interest, or active modules, by integrating biological networks with molecular profiles is a key resource to inform on the processes perturbed in different cellular conditions. We here propose MOGAMUN, a Multi-Objective Genetic Algorithm to identify active modules in MUltiplex biological Networks. MOGAMUN optimizes both the density of interactions and the scores of the nodes (e.g., their differential expression). We compare MOGAMUN with state-of-the-art methods, representative of different algorithms dedicated to the identification of active modules in single networks. MOGAMUN identifies dense and high-scoring modules that are also easier to interpret. In addition, to our knowledge, MOGAMUN is the first method able to use multiplex networks. Multiplex networks are composed of different layers of physical and functional relationships between genes and proteins. Each layer is associated with its own meaning, topology, and biases; the multiplex framework allows exploiting this diversity of biological networks. We applied MOGAMUN to identify cellular processes perturbed in Facio-Scapulo-Humeral muscular Dystrophy, by integrating RNA-seq expression data with a multiplex biological network. We identified different active modules of interest, thereby providing new angles for investigating the pathomechanisms of this disease. Availability: MOGAMUN is available at https://github.com/elvanov/MOGAMUN and as a Bioconductor package at https://bioconductor.org/packages/release/bioc/html/MOGAMUN.html.
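Optimizing two objectives at once, here interaction density and node score, means there is usually no single best module but a set of non-dominated trade-offs. The sketch below shows only that Pareto-dominance idea. It is not MOGAMUN's implementation, and the candidate modules and their (density, mean node score) values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one.

    Both objectives are maximized.
    """
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: objectives
        for name, objectives in candidates.items()
        if not any(dominates(other, objectives)
                   for other_name, other in candidates.items()
                   if other_name != name)
    }

# Hypothetical candidate modules: (density of interactions, mean node score).
modules = {
    "A": (0.9, 1.2),  # dense, moderate score
    "B": (0.6, 2.5),  # sparser, high score
    "C": (0.5, 1.0),  # worse than A on both objectives
    "D": (0.9, 1.0),  # ties A on density, loses on score
}

front = pareto_front(modules)
print(sorted(front))  # ['A', 'B']
```

Modules A and B both survive because each is better than the other on one objective; choosing between them is left to the analyst.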


2021 ◽  
Author(s):  
Kai Kang ◽  
Caizhi David Huang ◽  
Yuanyuan Li ◽  
David M. Umbach ◽  
Leping Li

Abstract. Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and with a new function to aid interpretation of deconvolution outcomes. The R package should be of interest to the broader R community. Result: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating CDSeq-estimated cell types using publicly available single-cell RNA sequencing (scRNA-seq) data (single-cell data from 20 major organs are included in the R package). This function allows users to readily interpret and visualize the CDSeq-estimated cell types. We carried out additional validations of the CDSeqR software with in silico and in vitro mixtures and with real experimental data including RNA-seq data from The Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) project. Conclusions: The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. However, bulk-level analyses neglect tissue heterogeneity and hinder investigation in a cell-type-specific fashion.
The CDSeqR package can be viewed as providing in silico single-cell dissection of bulk measurements. It enables researchers to gain cell-type-specific information from bulk RNA-seq data.


2018 ◽  
Author(s):  
Jakob Russel ◽  
Jonathan Thorsen ◽  
Asker D. Brejnrod ◽  
Hans Bisgaard ◽  
Søren J. Sørensen ◽  
...  

Abstract. DAtest is an R package for directly comparing different statistical methods for differential abundance and expression analysis on a dataset of interest; be it data from RNA-seq, proteomics, metabolomics or a microbial marker-gene survey. A myriad of statistical methods exists for conducting these analyses, and with this tool we give the analyst an empirical foundation for choosing a method suitable for a specific dataset. The package supports categorical and quantitative variables, paired/block experimental designs, and the inclusion of covariates. It is freely available at GitHub: https://github.com/Russel88/DAtest along with detailed instructions.


2020 ◽  
Author(s):  
Bernd Jagla ◽  
Vincent Rouilly ◽  
Michel Puceat ◽  
Milena Hasan

ABSTRACT Motivation: Single-cell RNA-sequencing (scRNAseq) experiments are becoming a standard tool for bench-scientists to explore the cellular diversity present in all tissues. On one hand, the data produced by scRNAseq is technically complex, with analytical workflows that are still very much an active field of bioinformatics research, and on the other hand, a wealth of biological background knowledge is often needed to guide the investigation. Therefore, there is an increasing need to develop applications geared towards bench-scientists to help them abstract the technical challenges of the analysis, so that they can focus on the science at play. It is also expected that such applications should support closer collaboration between bioinformaticians and bench-scientists by providing reproducible science tools. Results: We present SCHNAPPs, a computer program designed to enable bench-scientists to autonomously explore and interpret single-cell RNA-seq expression data and associated annotations. The Shiny-based application allows selecting genes and cells of interest, performing quality control, normalization, clustering, and differential expression analyses, applying standard workflows from the Seurat (Stuart et al., 2019) or Scran (Lun et al., 2016) packages, and producing most of the common visualizations. An R Markdown report can be generated that tracks the modifications and selected visualizations, facilitating communication and reproducibility between bench-scientist and bioinformatician. The modular design of the tool allows bioinformaticians to easily integrate new visualizations and analyses. We still recommend that a data analysis specialist oversees the analysis and interpretation. Availability: The SCHNAPPs application, Docker file, and documentation are available on GitHub: https://c3bi-pasteur-fr.github.io/UTechSCB-SCHNAPPs; example contributions are available at the following GitHub site: https://github.com/baj12/SCHNAPPsContributions.

