Predicting promoters in phage genomes using PhagePromoter

2019 ◽  
Vol 35 (24) ◽  
pp. 5301-5302 ◽  
Author(s):  
Marta Sampaio ◽  
Miguel Rocha ◽  
Hugo Oliveira ◽  
Oscar Dias

Abstract Summary The growing interest in phages as antibacterial agents has led to an increase in the number of sequenced phage genomes, increasing the need for intuitive bioinformatics tools for performing genome annotation. The identification of phage promoters is indeed the most difficult step of this process. Due to the lack of online tools for phage promoter prediction, we developed PhagePromoter, a tool for locating promoters in phage genomes, using machine learning methods. This is the first online tool for predicting promoters that uses phage promoter data and the first to identify both host and phage promoters with different motifs. Availability and implementation This tool was integrated in the Galaxy framework and it is available online at: https://bit.ly/2Dfebfv. Supplementary information Supplementary data are available at Bioinformatics online.

2019 ◽  
Vol 35 (14) ◽  
pp. i31-i40 ◽  
Author(s):  
Erfan Sayyari ◽  
Ban Kawas ◽  
Siavash Mirarab

Abstract Motivation Learning associations of traits with the microbial composition of a set of samples is a fundamental goal in microbiome studies. Recently, machine learning methods have been explored for this goal, with some promise. However, in comparison to other fields, microbiome data are high-dimensional and not abundant, leading to a high-dimensional, low-sample-size, under-determined system. Moreover, microbiome data are often unbalanced and biased. Given such training data, machine learning methods often fail to perform a classification task with sufficient accuracy. Lack of signal is especially problematic when classes are represented in an unbalanced way in the training data, with some classes under-represented. The presence of inter-correlations among subsets of observations further compounds these issues. As a result, machine learning methods have had only limited success in predicting many traits from the microbiome. Data augmentation consists of building synthetic samples and adding them to the training data, and it is a technique that has proved helpful for many machine learning tasks. Results In this paper, we propose a new data augmentation technique for classifying phenotypes based on the microbiome. Our algorithm, called TADA, uses available data and a statistical generative model to create new samples augmenting existing ones, addressing issues of low sample size. In generating new samples, TADA takes into account phylogenetic relationships between microbial species. On two real datasets, we show that adding these synthetic samples to the training set improves the accuracy of downstream classification, especially when the training data have an unbalanced representation of classes. Availability and implementation TADA is available at https://github.com/tada-alg/TADA. Supplementary information Supplementary data are available at Bioinformatics online.
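The general idea of augmenting an unbalanced training set with synthetic samples can be illustrated with a minimal sketch. This is not TADA's generative model: it ignores phylogeny entirely and simply redraws read counts from the proportions of an existing sample. The function names `resample_counts` and `balance_by_augmentation` are hypothetical and introduced only for illustration.

```python
import random
from collections import Counter


def resample_counts(counts, rng):
    """Draw a synthetic count vector with the same total as `counts`,
    sampling each read from the taxon proportions of the template.
    Assumes the template has at least one non-zero count."""
    total = sum(counts)
    draws = rng.choices(range(len(counts)), weights=counts, k=total)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in range(len(counts))]


def balance_by_augmentation(samples, labels, seed=0):
    """Append synthetic samples until every class is as large as the
    largest class. Original samples are kept unchanged at the front."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(members) for members in by_class.values())

    out_samples, out_labels = list(samples), list(labels)
    for label, members in by_class.items():
        for _ in range(target - len(members)):
            template = rng.choice(members)  # pick a real sample to imitate
            out_samples.append(resample_counts(template, rng))
            out_labels.append(label)
    return out_samples, out_labels
```

A classifier would then be trained on the returned lists instead of the original ones. A phylogeny-aware method such as the one described in the abstract would replace `resample_counts` with a model that shares signal between related taxa.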


2020 ◽  
Vol 36 (17) ◽  
pp. 4590-4598
Author(s):  
Robert Page ◽  
Ruriko Yoshida ◽  
Leon Zhang

Abstract Motivation Due to new technology for efficiently generating genome data, machine learning methods are urgently needed to analyze large sets of gene trees over the space of phylogenetic trees. However, the space of phylogenetic trees is not Euclidean, so ordinary machine learning methods cannot be directly applied. In 2019, Yoshida et al. introduced the notion of tropical principal component analysis (PCA), a statistical method for visualization and dimensionality reduction using a tropical polytope with a fixed number of vertices that minimizes the sum of tropical distances between each data point and its tropical projection. However, their work focused on the tropical projective space rather than the space of phylogenetic trees. We focus here on tropical PCA for dimension reduction and visualization over the space of phylogenetic trees. Results Our main results are 2-fold: (i) theoretical interpretations of the tropical principal components over the space of phylogenetic trees, namely, the existence of a tropical cell decomposition into regions of fixed tree topology; and (ii) the development of a stochastic optimization method to estimate tropical PCs over the space of phylogenetic trees using a Markov Chain Monte Carlo approach. This method performs well in simulation studies, and it is applied to three empirical datasets: Apicomplexa and African coelacanth genomes as well as sequences of hemagglutinin for influenza from New York. Availability and implementation Dataset: http://polytopes.net/Data.tar.gz. Code: http://polytopes.net/tropica_MCMC_codes.tar.gz. Supplementary information Supplementary data are available at Bioinformatics online.
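The objective mentioned above, a sum of tropical distances between data points and their projections, rests on the tropical metric, commonly defined as d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). The sketch below only evaluates that metric and the resulting objective for projections that are supplied by the caller. It does not compute tropical projections or run the authors' MCMC method, and the function names are hypothetical.

```python
from typing import Sequence


def tropical_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Tropical metric: the spread (max minus min) of the coordinate-wise
    differences. Adding a constant to every coordinate of u or v leaves
    the value unchanged."""
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def sum_of_tropical_distances(points, projections) -> float:
    """Objective value: total tropical distance between each data point
    and its (precomputed) projection."""
    if len(points) != len(projections):
        raise ValueError("need exactly one projection per point")
    return sum(tropical_distance(p, q) for p, q in zip(points, projections))
```

For example, `tropical_distance([0, 1, 3], [0, 2, 1])` has coordinate differences 0, -1 and 2, so the distance is 2 - (-1) = 3. Shifting the first vector to `[5, 6, 8]` gives the same value, which is why the metric is well defined on the tropical projective space.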


2021 ◽  
Author(s):  
Loïc Meunier ◽  
Denis Baurain ◽  
Luc Cornet

Abstract Summary To support small and large-scale genome annotation projects, we present AMAW (Automated MAKER2 Annotation Wrapper), a program devised to annotate non-model unicellular eukaryotic genomes by automating the acquisition of evidence data (transcripts and proteins) and facilitating the use of MAKER2, a widely adopted software suite for the annotation of eukaryotic genomes. Moreover, AMAW exists as a Singularity container recipe easy to deploy on a grid computer, thereby overcoming the tricky installation of MAKER2. Availability AMAW is released both as a Singularity container recipe and a standalone Perl script (https://bitbucket.org/phylogeno/amaw/). [email protected] or [email protected]. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (10) ◽  
pp. 3257-3259 ◽  
Author(s):  
Haodong Xu ◽  
Ruifeng Hu ◽  
Peilin Jia ◽  
Zhongming Zhao

Abstract Motivation DNA N6-methyladenine (6mA) has recently been identified as an essential epigenetic modification that plays roles in a variety of cellular processes. The abnormal status of DNA 6mA modification has been reported in cancer and other diseases. The annotation of 6mA marks in the genome is the first crucial step in exploring the underlying molecular mechanisms, including its regulatory roles. Results We present a novel online tool for DNA 6mA site prediction, 6mA-Finder, which incorporates seven types of sequence-derived information and three physicochemical-based features through a recursive feature elimination strategy. Our multiple cross-validations indicate the promising accuracy and robustness of our model. 6mA-Finder outperforms its peer tools in general and species-specific 6mA site prediction, suggesting it can provide a useful resource for further experimental investigation of DNA 6mA modification. Availability and implementation https://bioinfo.uth.edu/6mA_Finder. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2229-2236 ◽  
Author(s):  
Fatima Zohra Smaili ◽  
Xin Gao ◽  
Robert Hoehndorf

Abstract Motivation Over the past years, significant resources have been invested into formalizing biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns and encode domain background knowledge. The domain knowledge of biomedical ontologies may also have the potential to provide background knowledge for machine learning and predictive modelling. Results We use ontology-based machine learning methods to evaluate the contribution of formal axioms and ontology meta-data to the prediction of protein–protein interactions and gene–disease associations. We find that the background knowledge provided by the Gene Ontology and other ontologies significantly improves the performance of ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute to improving data analysis in a context-specific manner. Our results have implications for the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are more frequently being applied. Our findings motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies. Availability and implementation https://github.com/bio-ontology-research-group/tsoe. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (19) ◽  
pp. 3812-3814
Author(s):  
Mohamad Koohi-Moghadam ◽  
Mitesh J Borad ◽  
Nhan L Tran ◽  
Kristin R Swanson ◽  
Lisa A Boardman ◽  
...  

Abstract Summary We present MetaMarker, a pipeline for discovering metagenomic biomarkers from whole-metagenome sequencing samples. Unlike existing methods, MetaMarker is based on a de novo approach that does not require mapping raw reads to a reference database. We applied MetaMarker to whole-metagenome sequencing of colorectal cancer (CRC) stool samples from France to discover CRC-specific metagenomic biomarkers. We showed the robustness of the discovered biomarkers by validating them in independent samples from Hong Kong, Austria, Germany and Denmark. We further demonstrated that these biomarkers could be used to build a machine learning classifier for CRC prediction. Availability and implementation MetaMarker is freely available at https://bitbucket.org/mkoohim/metamarker under the GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
M.A. Basyrov ◽  
A.V. Akinshin ◽  
I.R. Makhmutov ◽  
Yu.D. Kantemirov ◽  
...  
