Metagenomics Strain Resolution on Assembly Graphs

2020 ◽  
Author(s):  
Christopher Quince ◽  
Sergey Nurk ◽  
Sebastien Raguideau ◽  
Robert James ◽  
Orkun S. Soyer ◽  
...  

Abstract We introduce a novel bioinformatics pipeline, STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo, when multiple metagenome samples from the same community are available. STRONG performs coassembly, followed by binning into metagenome assembled genomes (MAGs), but uniquely it stores the coassembly graph prior to simplification of variants. This enables the subgraphs for individual single-copy core genes (SCGs) in each MAG to be extracted. It can then thread back reads from the samples to compute per sample coverages for the unitigs in these graphs. These graphs and their unitig coverages are then used in a Bayesian algorithm, BayesPaths, that determines the number of strains present, their sequences or haplotypes on the SCGs and their abundances in each of the samples. Our approach both avoids the ambiguities of read mapping and allows more of the information on co-occurrence of variants in reads to be utilised than if variants were treated independently, whilst at the same time exploiting the correlation of variants across samples that occurs when they are linked in the same strain. We compare STRONG to the current state of the art on synthetic communities and demonstrate that we can recover more strains, more accurately, and with a realistic estimate of uncertainty deriving from the variational Bayesian algorithm employed for the strain resolution. On a real anaerobic digestor time series we obtained strain-resolved SCGs for over 300 MAGs that for abundant community members match those observed from long Nanopore reads.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Christopher Quince ◽  
Sergey Nurk ◽  
Sebastien Raguideau ◽  
Robert James ◽  
Orkun S. Soyer ◽  
...  

Abstract We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo, from multiple metagenome samples. STRONG performs coassembly, and binning into metagenome assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs and their unitig per-sample coverages, for individual single-copy core genes (SCGs) in each MAG, to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and abundances. STRONG is validated using synthetic communities and for a real anaerobic digestor time series generates haplotypes that match those observed from long Nanopore reads.


2021 ◽  
Vol 2021 (4) ◽  
Author(s):  
David S. Berman ◽  
Kwangeon Kim ◽  
Kanghoon Lee

Abstract We construct the classical double copy formalism for M-theory. This extends the current state of the art by including the three form potential of eleven dimensional supergravity along with the metric. The key for this extension is to construct a Kerr-Schild type Ansatz for exceptional field theory. This Kerr-Schild Ansatz then allows us to find the solutions of charged objects such as the membrane from a set of single copy fields. The exceptional field theory formalism then automatically produces the IIB Kerr-Schild ansatz allowing the construction of the single copy for the fields of IIB supergravity (with manifest SL(2) symmetry).


2018 ◽  
Author(s):  
Avantika Lal ◽  
Keli Liu ◽  
Robert Tibshirani ◽  
Arend Sidow ◽  
Daniele Ramazzotti

Abstract Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates DNA replication error as a background, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using standard metrics. We then apply SparseSignatures to whole genome sequences of 147 tumors from pancreatic cancer, discovering 8 signatures in addition to the background.


2021 ◽  
Author(s):  
Vijini Mallawaarachchi ◽  
Yu Lin

Abstract Metagenomics binning has allowed us to study and characterize various genetic material of different species and gain insights into microbial communities. While existing binning tools bin metagenomics de novo assemblies, they do not make use of the assembly graphs that produce such assemblies. Here we propose MetaCoAG, a tool that utilizes assembly graphs with the composition and coverage information to bin metagenomic contigs. MetaCoAG uses single-copy marker genes to estimate the number of initial bins, assigns contigs into bins iteratively and adjusts the number of bins dynamically throughout the binning process. Experimental results on simulated and real datasets demonstrate that MetaCoAG significantly outperforms state-of-the-art binning tools, producing more high-quality bins than the second-best tool, with an average median F1-score of 88.40%. To the best of our knowledge, MetaCoAG is the first stand-alone binning tool to make direct use of the assembly graph information. MetaCoAG is available at https://github.com/Vini2/MetaCoAG.


Author(s):  
Felix Stiehler ◽  
Marvin Steinborn ◽  
Stephan Scholz ◽  
Daniela Dey ◽  
Andreas P M Weber ◽  
...  

Abstract Motivation Current state-of-the-art tools for the de novo annotation of genes in eukaryotic genomes have to be specifically fitted for each species and still often produce annotations that can be improved much further. The fundamental algorithmic architecture for these tools has remained largely unchanged for about two decades, limiting learning capabilities. Here, we set out to improve the cross-species annotation of genes from DNA sequence alone with the help of deep learning. The goal is to eliminate the dependency on a closely related gene model while also improving the predictive quality in general with a fundamentally new architecture. Results We present Helixer, a framework for the development and usage of a cross-species deep learning model that improves significantly on performance and generalizability when compared to more traditional methods. We evaluate our approach by building a single vertebrate model for the base-wise annotation of 186 animal genomes and a separate land plant model for 51 plant genomes. Our predictions are shown to be much less sensitive to the length of the genome than those of a current state-of-the-art tool. We also present two novel post-processing techniques that each worked to further strengthen our annotations and show in-depth results of an RNA-Seq based comparison of our predictions. Our method does not yet produce comprehensive gene models but rather outputs base pair wise probabilities. Availability The source code of this work is available at https://github.com/weberlab-hhu/Helixer under the GNU General Public License v3.0. The trained models are available at https://doi.org/10.5281/zenodo.3974409 Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 17 (6) ◽  
pp. e1009119
Author(s):  
Avantika Lal ◽  
Keli Liu ◽  
Robert Tibshirani ◽  
Arend Sidow ◽  
Daniele Ramazzotti

Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates a user-specified background signature, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using a variety of standard metrics. We then apply SparseSignatures to whole genome sequences of pancreatic and breast tumors, discovering well-differentiated signatures that are linked to known mutagenic mechanisms and are strongly associated with patient clinical features.


2017 ◽  
Author(s):  
Michelle A Jusino ◽  
Mark T Banik ◽  
Jonathan M Palmer ◽  
Amy K Wray ◽  
Lei Xiao ◽  
...  

DNA analysis of predator feces using high-throughput amplicon sequencing (HTS) enhances our understanding of predator-prey interactions. However, conclusions drawn from this technique are constrained by biases that occur in multiple steps of the HTS workflow. To better characterize insectivorous animal diets, we used DNA from a diverse set of arthropods to assess PCR biases of commonly used and novel primer pairs for the mitochondrial gene, cytochrome oxidase C subunit 1 (CO1). We compared diversity recovered from HTS of bat guano samples using a commonly used primer pair “ZBJ” to results using the novel primer pair “ANML”. To parameterize our bioinformatics pipeline, we created an arthropod mock community consisting of single-copy (cloned) CO1 sequences. To examine biases associated with both PCR and HTS, mock community members were combined in equimolar amounts both pre- and post-PCR. We validated our system using guano from bats fed known diets and using composite samples of morphologically identified insects collected in pitfall traps. In PCR tests, the ANML primer pair amplified 58 of 59 arthropod taxa (98%) whereas ZBJ amplified 24 of 59 taxa (41%). Furthermore, in an HTS comparison of field-collected samples, the ANML primers detected nearly four-fold more arthropod taxa than the ZBJ primers. The additional arthropods detected include medically and economically relevant insect groups such as mosquitoes. Results revealed biases at both the PCR and sequencing levels, demonstrating the pitfalls associated with using HTS read numbers as proxies for abundance. The use of an arthropod mock community allowed for improved bioinformatics pipeline parameterization.


2018 ◽  
Author(s):  
Samantha C Pendleton

Abstract Context Our insight into DNA is controlled through a process called sequencing. Until recently, it was only possible to sequence DNA into short strings called “reads”. Nanopore is a new sequencing technology that produces significantly longer reads. Using nanopore sequencing, a single molecule of DNA can be sequenced without the need for time-consuming PCR amplification (polymerase chain reaction is a technique used in molecular biology to amplify a single copy or a few copies of a segment of DNA across several orders of magnitude). Aims Metagenomics is the study of genetic material recovered from environmental samples. A research team from IBERS (Institute of Biological, Environmental & Rural Sciences) at Aberystwyth University has sampled metagenomes from a coal mine in South Wales using the Nanopore MinION and given initial taxonomic (classification of organisms) summaries of the contents of the microbial community. Methods Using various new software aimed at metagenomic data, we are interested in discovering how well current bioinformatics software works with the data set. We will analyse how well this new state-of-the-art software works with the new long-read data and try out some recent developments for such analysis. Results Most of the software we used worked very well: we gained an understanding of the ACGT count and quality of the data. However, some bioinformatics software does not seem to work with nanopore data. Furthermore, we can conclude that low quality nanopore data may actually be quite average.


1995 ◽  
Vol 38 (5) ◽  
pp. 1126-1142 ◽  
Author(s):  
Jeffrey W. Gilger

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals it is suggested that: (a) more behavioral genetic work of all types should be done, including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests, etc.).

