scholarly journals MetaMarker: a pipeline for de novo discovery of novel metagenomic biomarkers

2019 ◽  
Vol 35 (19) ◽  
pp. 3812-3814
Author(s):  
Mohamad Koohi-Moghadam ◽  
Mitesh J Borad ◽  
Nhan L Tran ◽  
Kristin R Swanson ◽  
Lisa A Boardman ◽  
...  

Abstract Summary We present MetaMarker, a pipeline for discovering metagenomic biomarkers from whole-metagenome sequencing samples. Different from existing methods, MetaMarker is based on a de novo approach that does not require mapping raw reads to a reference database. We applied MetaMarker on whole-metagenome sequencing of colorectal cancer (CRC) stool samples from France to discover CRC specific metagenomic biomarkers. We showed robustness of the discovered biomarkers by validating in independent samples from Hong Kong, Austria, Germany and Denmark. We further demonstrated these biomarkers could be used to build a machine learning classifier for CRC prediction. Availability and implementation MetaMarker is freely available at https://bitbucket.org/mkoohim/metamarker under GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.

2018 ◽  
Vol 35 (15) ◽  
pp. 2654-2656 ◽  
Author(s):  
Guoli Ji ◽  
Wenbin Ye ◽  
Yaru Su ◽  
Moliang Chen ◽  
Guangzao Huang ◽  
...  

Abstract Summary Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity, however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify much more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Yuansheng Liu ◽  
Xiaocai Zhang ◽  
Quan Zou ◽  
Xiangxiang Zeng

Abstract Summary Removing duplicate and near-duplicate reads, generated by high-throughput sequencing technologies, is able to reduce computational resources in downstream applications. Here we develop minirmd, a de novo tool to remove duplicate reads via multiple rounds of clustering using different length of minimizer. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on reverse-complementary strand. Availability and implementation https://github.com/yuansliu/minirmd. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Harry J Whitwell ◽  
Peter DiMaggio

Abstract Motivation Database searching of isotopically labelled PTMs can be problematic and we frequently find that only one, or neither in a heavy/light pair are assigned. In such cases, having a pair of MS/MS spectra that differ due to an isotopic label can assist in identifying the relevant m/z values that support the correct peptide annotation or can be used for de novo sequencing. Results We have developed an online application that identifies matching peaks and peaks differing by the appropriate mass shift (difference between heavy and light PTM) between two MS/MS spectra. Furthermore, the application predicts, from the exact-match peaks, the mass of their complementary ions and highlights these as high confidence matches between the two spectra. The result is a tool to visually compare two spectra, and downloadable peaks lists that can be used to support de novo sequencing. Availability and implementation HiLight-PTM is released using shinyapps.io by RStudio, and can be accessed from any internet browser at https://harrywhitwell.shinyapps.io/hilight-ptm/. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Xin Li ◽  
Haiyan Hu ◽  
Xiaoman Li

Abstract Motivation It is essential to study bacterial strains in environmental samples. Existing methods and tools often depend on known strains or known variations, cannot work on individual samples, not reliable, or not easy to use, etc. It is thus important to develop more user-friendly tools that can identify bacterial strains more accurately. Results We developed a new tool called mixtureS that can de novo identify bacterial strains from shotgun reads of a clonal or metagenomic sample, without prior knowledge about the strains and their variations. Tested on 243 simulated datasets and 195 experimental datasets, mixtureS reliably identified the strains, their numbers and their abundance. Compared with three tools, mixtureS showed better performance in almost all simulated datasets and the vast majority of experimental datasets. Availability and implementation The source code and tool mixtureS is available at http://www.cs.ucf.edu/˜xiaoman/mixtureS/. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Gaëtan Benoit ◽  
Mahendra Mariadassou ◽  
Stéphane Robin ◽  
Sophie Schbath ◽  
Pierre Peterlongo ◽  
...  

Abstract Motivation De novo comparative metagenomics is one of the most straightforward ways to analyze large sets of metagenomic data. Latest methods use the fraction of shared k-mers to estimate genomic similarity between read sets. However, those methods, while extremely efficient, are still limited by computational needs for practical usage outside of large computing facilities. Results We present SimkaMin, a quick comparative metagenomics tool with low disk and memory footprints, thanks to an efficient data subsampling scheme used to estimate Bray-Curtis and Jaccard dissimilarities. One billion metagenomic reads can be analyzed in <3 min, with tiny memory (1.09 GB) and disk (≈0.3 GB) requirements and without altering the quality of the downstream comparative analyses, making of SimkaMin a tool perfectly tailored for very large-scale metagenomic projects. Availability and implementation https://github.com/GATB/simka. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (15) ◽  
pp. 2663-2664 ◽  
Author(s):  
Carlos de Lannoy ◽  
Judith Risse ◽  
Dick de Ridder

Abstract Summary Nanopore sequencing is a novel development in nucleic acid analysis. As such, nanopore-sequencing hardware and software are updated frequently and extensively, which quickly renders peer-reviewed publications on analysis pipeline benchmarking efforts outdated. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report, which can immediately be published on Github/Gitlab. Availability and implementation poreTally is available on Github at https://github.com/ cvdelannoy/poreTally, under an MIT license. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (24) ◽  
pp. 5301-5302 ◽  
Author(s):  
Marta Sampaio ◽  
Miguel Rocha ◽  
Hugo Oliveira ◽  
Oscar Dias

Abstract Summary The growing interest in phages as antibacterial agents has led to an increase in the number of sequenced phage genomes, increasing the need for intuitive bioinformatics tools for performing genome annotation. The identification of phage promoters is indeed the most difficult step of this process. Due to the lack of online tools for phage promoter prediction, we developed PhagePromoter, a tool for locating promoters in phage genomes, using machine learning methods. This is the first online tool for predicting promoters that uses phage promoter data and the first to identify both host and phage promoters with different motifs. Availability and implementation This tool was integrated in the Galaxy framework and it is available online at: https://bit.ly/2Dfebfv. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (15) ◽  
pp. 4269-4275 ◽  
Author(s):  
Haidong Yan ◽  
Aureliano Bombarely ◽  
Song Li

Abstract Motivation Transposable elements (TEs) classification is an essential step to decode their roles in genome evolution. With a large number of genomes from non-model species becoming available, accurate and efficient TE classification has emerged as a new challenge in genomic sequence analysis. Results We developed a novel tool, DeepTE, which classifies unknown TEs using convolutional neural networks (CNNs). DeepTE transferred sequences into input vectors based on k-mer counts. A tree structured classification process was used where eight models were trained to classify TEs into super families and orders. DeepTE also detected domains inside TEs to correct false classification. An additional model was trained to distinguish between non-TEs and TEs in plants. Given unclassified TEs of different species, DeepTE can classify TEs into seven orders, which include 15, 24 and 16 super families in plants, metazoans and fungi, respectively. In several benchmarking tests, DeepTE outperformed other existing tools for TE classification. In conclusion, DeepTE successfully leverages CNN for TE classification, and can be used to precisely classify TEs in newly sequenced eukaryotic genomes. Availability and implementation DeepTE is accessible at https://github.com/LiLabAtVT/DeepTE. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (21) ◽  
pp. 4281-4289 ◽  
Author(s):  
Jasmijn A Baaijens ◽  
Alexander Schönhuth

Abstract Motivation Haplotype-aware genome assembly plays an important role in genetics, medicine and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have been greatly underexploited in terms of haplotig computation so far, which reflects that methodology for reference independent haplotig computation has not yet reached maturity. Results We present POLYploid genome fitTEr (POLYTE) as a new approach to de novo generation of haplotigs for diploid and polyploid genomes of known ploidy. Our method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence. As a consequence, POLYTE outperforms state-of-the-art approaches in various relevant aspects, where advantages become particularly distinct in polyploid settings. Availability and implementation POLYTE is freely available as part of the HaploConduct package at https://github.com/HaploConduct/HaploConduct, implemented in Python and C++. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (16) ◽  
pp. 4527-4529
Author(s):  
Ales Saska ◽  
David Tichy ◽  
Robert Moore ◽  
Achilles Rasquinha ◽  
Caner Akdas ◽  
...  

Abstract Summary Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation of biologically based networks and their use. ccNetViz is an open-source, high speed and lightweight JavaScript library for visualization of large and complex networks. It implements customization and analytical features for easy network interpretation. These features include edge and node animations, which illustrate the flow of information through a network as well as node statistics. Properties can be defined a priori or dynamically imported from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document