MoSBi: Automated signature mining for molecular stratification and subtyping

2021 ◽  
Author(s):  
Tim Daniel Rose ◽  
Thibault Bechtler ◽  
Octavia-Andreea Ciora ◽  
Kim Anh Lilian Le ◽  
Florian Molnar ◽  
...  

Improving access to growing amounts of biomedical data opens new opportunities for advanced patient stratification and disease subtyping. This requires computational tools that produce robust results across highly heterogeneous molecular data. Unsupervised machine learning methods can discover de novo patterns in such data. Biclustering is especially well suited because it simultaneously identifies sample groups and their corresponding feature sets across heterogeneous omics data. However, the performance of available biclustering algorithms depends heavily on individual parameterization and varies with the application. Here, we developed MoSBi (Molecular Signature identification using Biclustering), an automated multi-algorithm ensemble approach that integrates results using an error model-supported similarity network. We evaluated the performance of MoSBi on transcriptomics, proteomics, and metabolomics data, as well as on synthetic datasets covering various data properties. Profiting from multi-algorithm integration, MoSBi identified robust group- and disease-specific signatures across all scenarios, overcoming the specificities of single algorithms. Furthermore, we developed a scalable network-based visualization of bicluster communities that supports biological hypothesis generation. MoSBi is available as an R package and web service to make automated biclustering analysis accessible for molecular sample stratification.
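The idea of integrating biclusters from several algorithms through a similarity network can be illustrated with a small sketch. This is not MoSBi's implementation: the Jaccard index over matrix cells, the fixed `threshold`, and the use of connected components as "communities" are simplifying assumptions made here for illustration, and the error model mentioned in the abstract is not reproduced. All function names are hypothetical.

```python
from itertools import combinations


def bicluster_cells(rows, cols):
    """Return the set of (row, column) matrix cells covered by a bicluster."""
    return {(r, c) for r in rows for c in cols}


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bicluster_communities(biclusters, threshold=0.5):
    """Group biclusters whose cell overlap reaches a similarity threshold.

    biclusters: list of (rows, cols) pairs, e.g. pooled from several algorithms.
    Returns a list of sorted index lists, one per connected component of the
    similarity network (an assumed stand-in for community detection).
    """
    cells = [bicluster_cells(rows, cols) for rows, cols in biclusters]

    # Build the similarity network: an edge links two sufficiently similar biclusters.
    neighbours = {i: set() for i in range(len(cells))}
    for i, j in combinations(range(len(cells)), 2):
        if jaccard(cells[i], cells[j]) >= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)

    # Collect connected components with an iterative depth-first search.
    seen = set()
    communities = []
    for start in range(len(cells)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        communities.append(sorted(component))
    return communities
```

In this sketch, two biclusters that share most of their samples and features fall into the same community, while an unrelated bicluster stays on its own. A real analysis would replace the fixed threshold with a principled cut-off.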

Animals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 904
Author(s):  
Saif ur Rehman ◽  
Faiz-ul Hassan ◽  
Xier Luo ◽  
Zhipeng Li ◽  
Qingyou Liu

The buffalo was domesticated around 3000–6000 years ago and has substantial economic significance as a meat, dairy, and draught animal. The buffalo has lagged behind in the development of a well-annotated, de novo assembled reference genome. Exploring the genetic architecture of a species is essential for understanding its biology and managing its genetic variability, which is ultimately used for selective breeding and genomic selection. Morphological and molecular data have revealed that the swamp buffalo population has strong geographical genomic diversity with low gene flow but strong phenotypic consistency, while the river buffalo population has higher phenotypic diversity with a weak phylogeographic structure. The availability of a recent high-quality reference genome and genotyping marker panels has invigorated many genome-based studies on evolutionary history, genetic diversity, functional elements, and performance traits. Combining this growing molecular knowledge with selective breeding should pave the way for genetic improvement in the climatic resilience, disease resistance, and production performance of water buffalo populations globally.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Javier Fernández-López ◽  
M. Teresa Telleria ◽  
Margarita Dueñas ◽  
Mara Laguna-Castro ◽  
Klaus Schliep ◽  
...  

Abstract The use of different sources of evidence has been recommended for species delimitation analyses aimed at solving taxonomic issues. In this study, we use a maximum likelihood framework to combine morphological and molecular traits to study the case of Xylodon australis (Hymenochaetales, Basidiomycota) using the locate.yeti function from the phytools R package. Xylodon australis has been considered a single species distributed across Australia, New Zealand and Patagonia. Multi-locus phylogenetic analyses were conducted to unmask the actual diversity under X. australis as well as its relationships to related species. To assess the taxonomic position of each clade, the locate.yeti function was used to place the X. australis type material, for which no molecular data were available, in a molecular phylogeny using continuous morphological traits. Two different species were distinguished under the X. australis name, one from Australia–New Zealand and another from Patagonia. In addition, a close relationship with Xylodon lenis, a species from Southeast Asia, was confirmed for the Patagonian clade. We discuss the implications of our results for the biogeographical history of this genus and evaluate the potential of this method for use with historical collections for which molecular data are not available.


2020 ◽  
Author(s):  
Maxim Ivanov ◽  
Albin Sandelin ◽  
Sebastian Marquardt

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline requires prior knowledge only of the reference genomic DNA sequence, not of the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


2019 ◽  
Author(s):  
Cheynna Crowley ◽  
Yuchen Yang ◽  
Yunjiang Qiu ◽  
Benxia Hu ◽  
Armen Abnousi ◽  
...  

Abstract Hi-C experiments have been widely adopted to study chromatin spatial organization, which plays an essential role in genome function. We have recently identified frequently interacting regions (FIREs) and found that they are closely associated with cell-type-specific gene regulation. However, computational tools for detecting FIREs from Hi-C data are still lacking. In this work, we present FIREcaller, a stand-alone, user-friendly R package for detecting FIREs from Hi-C data. FIREcaller takes raw Hi-C contact matrices as input, performs within-sample and cross-sample normalization, and outputs continuous FIRE scores, dichotomous FIREs, and super-FIREs. Applying FIREcaller to Hi-C data from various human tissues, we demonstrate that the FIREs and super-FIREs identified in a tissue-specific manner are closely related to gene regulation, are enriched for enhancer-promoter (E-P) interactions, tend to overlap with regions exhibiting epigenomic signatures of cis-regulatory roles, and aid the interpretation of GWAS variants. The FIREcaller package is implemented in R and freely available at https://yunliweb.its.unc.edu/FIREcaller.

Highlights
– Frequently Interacting Regions (FIREs) can be used to identify tissue- and cell-type-specific cis-regulatory regions.
– An R package, FIREcaller, has been developed to identify FIREs and cluster FIREs into super-FIREs.
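The step from a contact matrix to continuous scores and dichotomous calls can be illustrated with a toy sketch. This is not the FIREcaller algorithm: the bin-distance window, the plain z-score used as a score, and the cut-off are assumptions chosen for illustration, and the within-sample and cross-sample normalization described in the abstract is not reproduced. The function names are hypothetical.

```python
from statistics import mean, pstdev


def local_contact_totals(matrix, max_bin_distance):
    """Sum each bin's contacts with neighbouring bins inside a distance window.

    matrix: square list of lists of contact counts (one row per genomic bin).
    Bins near the matrix edge have fewer neighbours; this toy version does not
    correct for that.
    """
    n = len(matrix)
    totals = []
    for i in range(n):
        lo = max(0, i - max_bin_distance)
        hi = min(n - 1, i + max_bin_distance)
        totals.append(sum(matrix[i][j] for j in range(lo, hi + 1) if j != i))
    return totals


def fire_scores(matrix, max_bin_distance=2):
    """Return a z-score per bin as an assumed stand-in for a continuous score."""
    totals = local_contact_totals(matrix, max_bin_distance)
    mu = mean(totals)
    sigma = pstdev(totals)
    if sigma == 0:
        return [0.0] * len(totals)
    return [(t - mu) / sigma for t in totals]


def call_fires(scores, z_cutoff=1.0):
    """Dichotomize continuous scores: True where the score exceeds the cut-off."""
    return [s > z_cutoff for s in scores]
```

For a small symmetric matrix in which one bin has markedly more local contacts than its neighbours, only that bin is flagged. Runs of adjacent flagged bins could then be merged, which is loosely analogous to grouping FIREs into larger units.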


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10364
Author(s):  
Natalia I. Abramson ◽  
Fedor N. Golenishchev ◽  
Semen Yu. Bodrov ◽  
Olga V. Bondareva ◽  
Evgeny A. Genelt-Yanovskiy ◽  
...  

In this article, we present the nearly complete mitochondrial genome of the Subalpine Kashmir vole Hyperacrius fertilis (Arvicolinae, Cricetidae, Rodentia), assembled using data from Illumina next-generation sequencing (NGS) of the DNA from a century-old museum specimen. The de novo assembly comprised 16,341 bp and included all mitogenome protein-coding genes as well as 12S and 16S RNAs, tRNAs and the D-loop. Using the alignment of protein-coding genes of 14 previously published Arvicolini tribe mitogenomes, seven Clethrionomyini mitogenomes, and Ondatra and Dicrostonyx as outgroups, we conducted phylogenetic reconstructions based on a dataset of 13 protein-coding genes (PCGs) under maximum likelihood and Bayesian inference. Phylogenetic analyses robustly supported the position of this species within the tribe Arvicolini. Among the Arvicolini, Hyperacrius represents one of the early-diverged lineages. This result altered the conventional view of the phylogenetic relatedness between Hyperacrius and Alticola and prompted a revision of the morphological characters underlying the former assumption. The morphological analysis performed here confirmed the molecular data and provided additional evidence for the taxonomic transfer of the genus Hyperacrius from the tribe Clethrionomyini to the tribe Arvicolini.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Xi Chen ◽  
Jinghua Gu ◽  
Andrew F. Neuwald ◽  
Leena Hilakivi-Clarke ◽  
Robert Clarke ◽  
...  
Keyword(s):  
De Novo ◽  

An amendment to this paper has been published and can be accessed via a link at the top of the paper.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Xi Chen ◽  
Jinghua Gu ◽  
Andrew F. Neuwald ◽  
Leena Hilakivi-Clarke ◽  
Robert Clarke ◽  
...  

Abstract Genome-wide transcription factor (TF) binding signal analyses reveal co-localization of TF binding sites, based on which cis-regulatory modules (CRMs) can be inferred. CRMs play a key role in understanding the cooperation of multiple TFs under specific conditions. However, the functions of CRMs and their effects on nearby gene transcription are highly dynamic and context-specific and therefore are challenging to characterize. BICORN (Bayesian Inference of COoperative Regulatory Network) builds a hierarchical Bayesian model and infers context-specific CRMs based on TF-gene binding events and gene expression data for a particular cell type. BICORN automatically searches for a list of candidate CRMs based on the input TF bindings at regulatory regions associated with genes of interest. Applying Gibbs sampling, BICORN iteratively estimates model parameters of CRMs, TF activities, and corresponding regulation on gene transcription, which it models as a sparse network of functional CRMs regulating target genes. The BICORN package is implemented in R (version 3.4 or later) and is publicly available on the CRAN server at https://cran.r-project.org/web/packages/BICORN/index.html.
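The candidate search described above can be pictured with a short sketch. It rests on an assumption made here for illustration: each distinct combination of TFs bound at a gene's regulatory regions is treated as one candidate CRM, and combinations seen at too few genes are dropped. It does not reproduce the BICORN package or its Gibbs sampler, and the function name and `min_genes` parameter are hypothetical.

```python
from collections import Counter


def candidate_crms(binding, min_genes=2):
    """List TF combinations that recur across genes as candidate CRMs.

    binding: dict mapping a gene name to the set of TFs bound at its
             regulatory regions (an empty set means no binding observed).
    Returns (frozenset_of_TFs, gene_count) pairs, most frequent first; ties
    are ordered by the sorted TF names so the result is deterministic.
    """
    # Count how many genes share exactly the same set of bound TFs.
    counts = Counter(frozenset(tfs) for tfs in binding.values() if tfs)

    # Keep only combinations supported by enough genes.
    kept = [(crm, n) for crm, n in counts.items() if n >= min_genes]
    return sorted(kept, key=lambda item: (-item[1], sorted(item[0])))
```

In a full model, each retained candidate would then become a unit whose activity and regulatory effect are estimated from expression data. Here the output is only the filtered list of combinations.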


2020 ◽  
Vol 36 (11) ◽  
pp. 3516-3521 ◽  
Author(s):  
Lixiang Zhang ◽  
Lin Lin ◽  
Jia Li

Abstract Motivation: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community. Results: We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying it to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating the uncertainty of clusters than state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods. Availability and implementation: The method is implemented in an R package called OTclust, available on CRAN. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 5 (6) ◽  
pp. e363 ◽  
Author(s):  
Giulia Barcia ◽  
Nicole Chemaly ◽  
Mathieu Kuchenbuch ◽  
Monika Eisermann ◽  
Stéphanie Gobin-Limballe ◽  
...  

Objective: To report new sporadic cases and 1 family with epilepsy of infancy with migrating focal seizures (EIMFS) due to KCNT1 gain-of-function and to assess the efficacy of therapies, including quinidine. Methods: We reviewed the clinical, EEG, and molecular data of 17 new patients with EIMFS and KCNT1 mutations, in collaboration with the network of the French reference center for rare epilepsies. Results: The mean seizure onset age was 1 month (range: 1 hour to 4 months), and all children had focal motor seizures with autonomic signs and a migrating ictal pattern on EEG. Three children also had infantile spasms and hypsarrhythmia. The identified KCNT1 variants clustered as “hot spots” on the C-terminal domain, and all mutations occurred de novo except the p.R398Q mutation inherited from the father with nocturnal frontal lobe epilepsy, present in 2 paternal uncles, one being asymptomatic and the other with a single tonic-clonic seizure. In 1 patient with EIMFS, we identified the p.R1106Q mutation associated with Brugada syndrome and saw no abnormality in cardiac rhythm. Quinidine was well tolerated when administered to 2- and 4-year-old patients but did not reduce seizure frequency. Conclusions: The majority of the KCNT1 mutations appear to cluster in hot spots essential for the channel activity. The same mutation can be linked to a spectrum of conditions ranging from EIMFS to asymptomatic carrier status, even in the same family. None of the antiepileptic therapies displayed clinical efficacy, including quinidine in 2 patients.


Author(s):  
Kexin Huang ◽  
Cao Xiao ◽  
Lucas M Glass ◽  
Jimeng Sun

Abstract Motivation: Drug–target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming because of the need for experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI prediction. However, the following challenges remain open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI and thus produce results that are less accurate and difficult to explain, and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. Results: We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via (i) a knowledge-inspired sub-structural pattern mining algorithm and interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it improves DTI prediction performance compared with state-of-the-art baselines. Availability and implementation: The model scripts are available at https://github.com/kexinhuang12345/moltrans. Supplementary information: Supplementary data are available at Bioinformatics online.

