debCAM: a bioconductor R package for fully unsupervised deconvolution of complex tissues

2020 ◽  
Vol 36 (12) ◽  
pp. 3927-3929 ◽  
Author(s):  
Lulu Chen ◽  
Chiung-Ting Wu ◽  
Niya Wang ◽  
David M Herrington ◽  
Robert Clarke ◽  
...  

Abstract Summary We develop a fully unsupervised deconvolution method to dissect complex tissues into molecularly distinctive tissue or cell subtypes based on bulk expression profiles. We implement an R package, deconvolution by Convex Analysis of Mixtures (debCAM), that can automatically detect tissue/cell-specific markers, determine the number of constituent subtypes, calculate subtype proportions in individual samples and estimate tissue/cell-specific expression profiles. We demonstrate the performance and biomedical utility of debCAM on gene expression, methylation, proteomics and imaging data. With enhanced data preprocessing and prior knowledge incorporation, the debCAM software tool will allow biologists to perform a more comprehensive and unbiased characterization of tissue remodeling in many biomedical contexts. Availability and implementation http://bioconductor.org/packages/debCAM. Supplementary information Supplementary data are available at Bioinformatics online.
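
The abstract does not spell out the underlying model, so the following is only a minimal sketch of the linear mixing idea that deconvolution methods of this kind commonly build on: each bulk sample is a proportion-weighted sum of subtype profiles. It assumes the marker genes are already known (debCAM is described as detecting them automatically), and all function names and numbers are hypothetical. The unknown per-subtype scale is resolved with the sum-to-one constraint on the proportions.

```python
def mix(proportions, profiles):
    """Linear mixing model: bulk[i][g] = sum_k proportions[i][k] * profiles[k][g]."""
    n_genes = len(profiles[0])
    return [
        [sum(a * s[g] for a, s in zip(row, profiles)) for g in range(n_genes)]
        for row in proportions
    ]


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def proportions_from_markers(bulk, markers):
    """Estimate subtype proportions from marker genes (assumed known here).

    markers[k] lists genes expressed only in subtype k, so the summed marker
    signal is m[i][k] = proportion[i][k] * c_k for an unknown scale c_k.
    Sum-to-one gives sum_k m[i][k] * w_k = 1 with w_k = 1 / c_k, which is
    solved by least squares through the normal equations.
    """
    m = [[sum(sample[g] for g in genes) for genes in markers] for sample in bulk]
    k = len(markers)
    gram = [[sum(row[a] * row[b] for row in m) for b in range(k)] for a in range(k)]
    rhs = [sum(row[a] for row in m) for a in range(k)]
    w = solve(gram, rhs)
    return [[row[j] * w[j] for j in range(k)] for row in m]


# Hypothetical toy data: 2 subtypes, 4 genes; gene 0 and gene 1 are markers.
profiles = [[10.0, 0.0, 5.0, 3.0], [0.0, 4.0, 5.0, 7.0]]
true_props = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
bulk = mix(true_props, profiles)
est = proportions_from_markers(bulk, markers=[[0], [1]])
```

On noise-free toy data the estimated proportions match the simulated ones; real data would also need marker detection, noise handling and non-negativity checks, none of which this sketch attempts.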

Author(s):  
Xiangfu Zhong ◽  
Albert Pla ◽  
Simon Rayner

Abstract Motivation The existence of complex subpopulations of miRNA isoforms, or isomiRs, is well established. While many tools exist for investigating isomiR populations, they differ in how they characterize an isomiR, making it difficult to compare results across different tools. Thus, there is a need for a more comprehensive and systematic standard for defining isomiRs. Such a standard would allow investigation of isomiR population structure in progressively more refined sub-populations, permitting the identification of more subtle changes between conditions and leading to an improved understanding of the processes that generate these differences. Results We developed Jasmine, a software tool that incorporates a hierarchical framework for characterizing isomiR populations. Jasmine is a Java application that can process raw read data in fastq/fasta format, or mapped reads in SAM format, to produce a detailed characterization of isomiR populations. Thus, Jasmine can reveal structure not apparent in a standard miRNA-Seq analysis pipeline. Availability and implementation Jasmine is implemented in Java and R and is freely available on Bitbucket at https://bitbucket.org/bipous/jasmine/src/master/. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (9) ◽  
pp. 2862-2871
Author(s):  
Chiung-Ting Wu ◽  
Yizhi Wang ◽  
Yinxue Wang ◽  
Timothy Ebbels ◽  
Ibrahim Karaman ◽  
...  

Abstract Motivation Liquid chromatography–mass spectrometry (LC-MS) is a standard method for proteomics and metabolomics analysis of biological samples. Unfortunately, it suffers from various changes in the retention times (RT) of the same compound in different samples, and these must be subsequently corrected (aligned) during data processing. Classic alignment methods such as in the popular XCMS package often assume a single time-warping function for each sample. Thus, the potentially varying RT drift for compounds with different masses in a sample is neglected in these methods. Moreover, the systematic change in RT drift across run order is often not considered by alignment algorithms. Therefore, these methods cannot effectively correct all misalignments. For a large-scale experiment involving many samples, the existence of misalignment becomes inevitable and concerning. Results Here, we describe an integrated reference-free profile alignment method, neighbor-wise compound-specific Graphical Time Warping (ncGTW), that can detect misaligned features and align profiles by leveraging expected RT drift structures and compound-specific warping functions. Specifically, ncGTW uses individualized warping functions for different compounds and assigns constraint edges on warping functions of neighboring samples. Validated with both realistic synthetic data and internal quality control samples, ncGTW applied to two large-scale metabolomics LC-MS datasets identifies many misaligned features and successfully realigns them. These features would otherwise be discarded or uncorrected using existing methods. The ncGTW software tool is developed currently as a plug-in to detect and realign misaligned features present in standard XCMS output. Availability and implementation An R package of ncGTW is freely available at Bioconductor and https://github.com/ChiungTingWu/ncGTW. A detailed user’s manual and a vignette are provided within the package. 
Supplementary information Supplementary data are available at Bioinformatics online.
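
As background for the warping functions mentioned above, here is a minimal sketch of classic dynamic time warping (DTW) between two intensity profiles. This is the textbook pairwise algorithm, not ncGTW itself: the graphical formulation, compound-specific warping and neighbor constraints described in the abstract are not reproduced, and the profiles are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two intensity profiles."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat b[j-1] against a new point of a
                cost[i][j - 1],      # repeat a[i-1] against a new point of b
                cost[i - 1][j - 1],  # advance both profiles
            )
    return cost[n][m]


reference = [0, 1, 5, 1, 0, 0]
drifted = [0, 0, 1, 5, 1, 0]  # same peak, eluting one scan later

# Point-by-point comparison penalizes the retention time shift...
pointwise = sum(abs(x - y) for x, y in zip(reference, drifted))
# ...while the warped comparison absorbs it.
warped = dtw_distance(reference, drifted)
```

The contrast between `pointwise` and `warped` shows why alignment matters before features from different samples are compared.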


2016 ◽  
Author(s):  
Musaddeque Ahmed ◽  
Richard C. Sallari ◽  
Haiyang Guo ◽  
Jason H. Moore ◽  
Housheng Hansen He ◽  
...  

Abstract Summary Genetic predispositions to diseases populate the noncoding regions of the human genome. Delineating their functional basis can inform on the mechanisms contributing to disease development. However, this remains a challenge due to the poor characterization of the noncoding genome. Variant Set Enrichment (VSE) is a fast method to calculate the enrichment of a set of disease-associated variants across functionally annotated genomic regions, consequently highlighting the mechanisms important in the etiology of the disease studied. Availability and Implementation VSE is implemented as an R package and can easily be implemented in any system with R. See supplementary information for [email protected]; [email protected]
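
To make the notion of enrichment across annotated regions concrete, the sketch below compares the observed overlap of a variant set with a set of regions against a null distribution. The abstract does not describe how VSE builds its null, so the uniformly random positions used here are a simplifying assumption, and the function names, coordinates and region list are hypothetical.

```python
import random
import statistics


def count_overlaps(positions, regions):
    """Number of positions falling inside any (start, end) region, inclusive."""
    return sum(any(start <= p <= end for start, end in regions) for p in positions)


def enrichment_z(variants, regions, genome_length, n_null=1000, seed=0):
    """Z-score of the observed overlap against uniformly random variant sets.

    Assumption for illustration only: null variants are drawn uniformly along
    a single linear genome, which ignores linkage and local variant density.
    """
    rng = random.Random(seed)
    observed = count_overlaps(variants, regions)
    null = [
        count_overlaps([rng.randrange(genome_length) for _ in variants], regions)
        for _ in range(n_null)
    ]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


regions = [(100, 200), (500, 600)]         # hypothetical annotated regions
variants = [110, 150, 190, 550, 580, 900]  # hypothetical associated variants
observed = count_overlaps(variants, regions)
z = enrichment_z(variants, regions, genome_length=1000)
```

A large positive `z` indicates that the variant set overlaps the annotation more often than random sets of the same size.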


Genome ◽  
2018 ◽  
Vol 61 (5) ◽  
pp. 337-347 ◽  
Author(s):  
Tuanhui Ren ◽  
Zhuanjian Li ◽  
Yu Zhou ◽  
Xuelian Liu ◽  
Ruili Han ◽  
...  

Chicken muscle quality is one of the most important factors determining the economic value of poultry, and muscle development and growth are affected by genetics, environment, and nutrition. However, little is known about the molecular regulatory mechanisms of long non-coding RNAs (lncRNAs) in chicken skeletal muscle development. Our study aimed to better understand muscle development in chickens and thereby improve meat quality. In this study, Ribo-Zero RNA-Seq was used to investigate differences in the expression profiles of muscle development related genes and associated pathways between Gushi (GS) and Arbor Acres (AA) chickens. We identified two lncRNAs with muscle tissue-specific expression. In addition, the target genes of these lncRNAs were significantly enriched in certain biological processes and molecular functions, as demonstrated by Gene Ontology (GO) analysis, and these target genes participate in five signaling pathways, as revealed by an analysis of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Taken together, these data suggest that different lncRNAs might be involved in regulating chicken muscle development and growth and provide new insight into the molecular mechanisms of lncRNAs.


2018 ◽  
Author(s):  
Hong-Dong Li ◽  
Yunpei Xu ◽  
Xiaoshu Zhu ◽  
Quan Liu ◽  
Gilbert S. Omenn ◽  
...  

Abstract Motivation Clustering analysis is essential for understanding complex biological data. In widely used methods such as hierarchical clustering (HC) and consensus clustering (CC), expression profiles of all genes are often used to assess similarity between samples for clustering. These methods output sample clusters, but are not able to provide information about which gene sets (functions) contribute most to the clustering. The interpretability of their results is therefore limited. We hypothesized that integrating prior knowledge of annotated biological processes would not only achieve satisfying clustering performance but also, more importantly, enable potential biological interpretation of clusters. Results Here we report ClusterMine, a novel approach that identifies clusters by assessing functional similarity between samples through integrating known annotated gene sets, e.g., in Gene Ontology. In addition to outputting cluster membership of each sample as conventional approaches do, it outputs gene sets that are most likely to contribute to the clustering, a feature facilitating biological interpretation. Using three cancer datasets, two single cell RNA-sequencing based cell differentiation datasets, one cell cycle dataset and two datasets of cells of different tissue origins, we found that ClusterMine achieved similar or better clustering performance and that top-scored gene sets prioritized by ClusterMine are biologically relevant. Implementation and availability ClusterMine is implemented as an R package and is freely available at: www.genemine.org/[email protected] Supplementary Information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (9) ◽  
pp. 2938-2940
Author(s):  
Olivia Angelin-Bonnet ◽  
Patrick J Biggs ◽  
Samantha Baldwin ◽  
Susan Thomson ◽  
Matthieu Vignes

Abstract Summary We present sismonr, an R package for an integral generation and simulation of in silico biological systems. The package generates gene regulatory networks, which include protein-coding and non-coding genes along with different transcriptional and post-transcriptional regulations. The effect of genetic mutations on the system behaviour is accounted for via the simulation of genetically different in silico individuals. The ploidy of the system is not restricted to the usual haploid or diploid situations but can be defined by the user to higher ploidies. A choice of stochastic simulation algorithms allows us to simulate the expression profiles of the genes in the in silico system. We illustrate the use of sismonr by simulating the anthocyanin biosynthesis regulation pathway for three genetically distinct in silico plants. Availability and implementation The sismonr package is implemented in R and Julia and is publicly available on the CRAN repository (https://CRAN.R-project.org/package=sismonr). A detailed tutorial is available from GitHub at https://oliviaab.github.io/sismonr/. Supplementary information Supplementary data are available at Bioinformatics online.
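
The abstract mentions a choice of stochastic simulation algorithms without naming them here. As an illustration of what such an algorithm does, the sketch below runs the classic Gillespie direct method on a single-gene birth-death model. It is a generic textbook example with hypothetical rate constants and does not use or mirror the sismonr interface.

```python
import random


def gillespie_birth_death(k_syn, k_deg, t_end, x0=0, seed=1):
    """Direct-method SSA for: 0 -> mRNA (rate k_syn), mRNA -> 0 (rate k_deg * x)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        birth, death = k_syn, k_deg * x  # reaction propensities
        total = birth + death
        if total == 0:
            break
        t += rng.expovariate(total)      # waiting time to the next reaction
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        if rng.random() * total < birth:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


# Hypothetical rates; the stationary mean of this model is k_syn / k_deg.
times, counts = gillespie_birth_death(k_syn=10.0, k_deg=1.0, t_end=20.0)
```

Each run with a different seed gives a different trajectory, which is the kind of expression profile variability a stochastic simulator is meant to capture.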


2019 ◽  
Vol 35 (19) ◽  
pp. 3842-3845 ◽  
Author(s):  
Guangsheng Pei ◽  
Yulin Dai ◽  
Zhongming Zhao ◽  
Peilin Jia

Abstract Motivation Diseases and traits are under dynamic tissue-specific regulation. However, heterogeneous tissues are often collected in biomedical studies, which reduces the power to identify disease-associated variants and gene expression profiles. Results We present deTS, an R package, to conduct tissue-specific enrichment analysis with two built-in reference panels. Statistical methods are developed and implemented for detecting tissue-specific genes and for enrichment tests of different forms of query data. Our applications using multi-trait genome-wide association studies data and cancer expression data showed that deTS could effectively identify the most relevant tissues for each query trait or sample, providing insights for future studies. Availability and implementation https://github.com/bsml320/deTS and CRAN https://cran.r-project.org/web/packages/deTS/ Supplementary information Supplementary data are available at Bioinformatics online.
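
The abstract does not state which statistical tests are used, so the following is a sketch of one common choice for gene-list enrichment, the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). Whether deTS uses exactly this test is an assumption, and the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_universe, n_tissue, n_query, n_overlap):
    """P(overlap >= n_overlap) when n_query genes are drawn without replacement
    from a universe containing n_tissue tissue-specific genes."""
    total = comb(n_universe, n_query)
    upper = min(n_tissue, n_query)
    favourable = sum(
        comb(n_tissue, k) * comb(n_universe - n_tissue, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / total


# Hypothetical toy case: 10 genes in total, 5 tissue-specific, 5 in the query,
# and all 5 query genes are tissue-specific.
p_all = hypergeom_upper_tail(10, 5, 5, 5)
# Requiring an overlap of at least 0 is always satisfied.
p_any = hypergeom_upper_tail(10, 5, 5, 0)
```

Repeating the test for each tissue in a reference panel and ranking tissues by p-value (with multiple-testing correction) gives a simple tissue-specific enrichment profile for a query gene list.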


Author(s):  
Francesco Napolitano ◽  
Diego Carrella ◽  
Xin Gao ◽  
Diego di Bernardo

Abstract Summary Pathway-based expression profiles allow for high-level interpretation of transcriptomic data and systematic comparison of dysregulated cellular programs. We have previously demonstrated the efficacy of pathway-based approaches with two different applications: the drug set enrichment analysis and the Gene2drug analysis. Here, we present a software tool that makes it easy to convert gene-based profiles to pathway-based profiles and analyze them within the popular R framework. We also provide pre-computed profiles derived from the original Connectivity Map and its next generation release, i.e. the LINCS database. Availability and implementation The tool is implemented as the R/Bioconductor package gep2pep and can be freely downloaded from https://bioconductor.org/packages/gep2pep. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
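
The conversion from a gene-based profile to a pathway-based profile can be illustrated with a very small sketch. The scoring rule below, a rescaled mean rank, is chosen only for simplicity and is an assumption; the abstract does not say which statistic gep2pep computes, and the gene and pathway names are hypothetical.

```python
def pathway_profile(gene_ranks, pathways):
    """Map a ranked gene profile to one score per pathway.

    gene_ranks maps gene -> rank (1 = most up-regulated). The score moves
    toward +1 when a pathway's genes sit near the top of the ranking and
    toward -1 when they sit near the bottom.
    """
    n = len(gene_ranks)
    profile = {}
    for name, genes in pathways.items():
        ranks = [gene_ranks[g] for g in genes if g in gene_ranks]
        if not ranks:
            continue  # pathway has no genes measured in this profile
        mean_rank = sum(ranks) / len(ranks)
        profile[name] = 1.0 - 2.0 * (mean_rank - 1) / (n - 1)
    return profile


# Hypothetical gene-based profile and pathway collection.
gene_ranks = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
pathways = {"up": ["a", "b"], "down": ["e", "f"], "mixed": ["a", "f"]}
profile = pathway_profile(gene_ranks, pathways)
```

Pathway-level scores like these can then be compared across conditions in the same way gene-level profiles are.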


2020 ◽  
Vol 36 (11) ◽  
pp. 3618-3619 ◽  
Author(s):  
Pere Ràfols ◽  
Bram Heijs ◽  
Esteban del Castillo ◽  
Oscar Yanes ◽  
Liam A McDonnell ◽  
...  

Abstract Summary Mass spectrometry imaging (MSI) can reveal biochemical information directly from a tissue section. MSI generates a large quantity of complex spectral data, which is still challenging to translate into relevant biochemical information. Here, we present rMSIproc, an open-source R package that implements a full data processing workflow for MSI experiments performed using TOF or FT-based mass spectrometers. The package provides a novel strategy for spectral alignment and recalibration, which allows multiple datasets to be processed simultaneously. This enables confident statistical analysis of multiple datasets from one or several experiments. rMSIproc is designed to work with files larger than the computer memory capacity, and the algorithms are implemented using a multi-threading strategy. rMSIproc is a powerful tool able to take full advantage of modern computer systems to realize the full potential of MSI. Availability and implementation rMSIproc is freely available at https://github.com/prafols/rMSIproc. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
David Gerard ◽  
Luís Felipe Ventorim Ferrão

Abstract Motivation Empirical Bayes techniques to genotype polyploid organisms usually either (i) assume technical artifacts are known a priori or (ii) estimate technical artifacts simultaneously with the prior genotype distribution. Case (i) is unappealing as it places the onus on the researcher to estimate these artifacts, or to ensure that there are no systematic biases in the data. However, as we demonstrate with a few empirical examples, case (ii) makes choosing the class of prior genotype distributions extremely important. Choosing a class that is either too flexible or too restrictive results in poor genotyping performance. Results We propose two classes of prior genotype distributions that are of intermediate levels of flexibility: the class of proportional normal distributions and the class of unimodal distributions. We provide a complete characterization of and optimization details for the class of unimodal distributions. We demonstrate, using both simulated and real data, that using these classes results in superior genotyping performance. Availability and implementation Genotyping methods that use these priors are implemented in the updog R package available on the Comprehensive R Archive Network: https://cran.r-project.org/package=updog. All code needed to reproduce the results of this paper is available on GitHub: https://github.com/dcgerard/reproduce_prior_sims. Supplementary information Supplementary data are available at Bioinformatics online.
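
To show where the prior genotype distribution enters, here is a minimal sketch of a posterior over allele dosage from read counts. It assumes a plain binomial read model with a fixed sequencing error rate and a prior supplied by the caller; it does not estimate the prior or any technical artifacts from data, and it is not the updog model or interface. All names and numbers are hypothetical.

```python
from math import comb


def genotype_posterior(alt_reads, total_reads, ploidy, prior, seq_error=0.01):
    """Posterior over dosage 0..ploidy given alternative-allele read counts.

    Assumptions for illustration: reads are binomial, and a read is flipped
    to the other allele with probability seq_error.
    """
    unnormalized = []
    for dosage in range(ploidy + 1):
        frac = dosage / ploidy
        p_alt = frac * (1 - seq_error) + (1 - frac) * seq_error
        likelihood = (
            comb(total_reads, alt_reads)
            * p_alt ** alt_reads
            * (1 - p_alt) ** (total_reads - alt_reads)
        )
        unnormalized.append(prior[dosage] * likelihood)
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


# Hypothetical tetraploid individual: 50 of 100 reads carry the alternative allele.
uniform_prior = [0.2] * 5
posterior = genotype_posterior(50, 100, ploidy=4, prior=uniform_prior)
best_dosage = max(range(5), key=lambda d: posterior[d])
```

When read counts fall between the expected fractions of two dosages, the likelihoods are close and the prior largely decides the call, which is why the choice of prior class matters.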

