Shotgun metagenomics of soil invertebrate communities reflects taxonomy, biomass and reference genome properties

2021 ◽  
Author(s):  
Alexandra Schmidt ◽  
Clément Schneider ◽  
Peter Decker ◽  
Karin Hohberg ◽  
Jörg Römbke ◽  
...  

Metagenomics - shotgun sequencing of all DNA fragments from a community DNA extract - is routinely used to describe the composition, structure and function of microorganism communities. Advances in DNA sequencing and the availability of genome databases increasingly allow the use of shotgun metagenomics on eukaryotic communities. Metagenomics offers major advances in the recovery of biomass relationships in a sample, in comparison to taxonomic marker-gene-based approaches (metabarcoding). However, little is known about the factors that influence metagenomics data from eukaryotic communities, such as differences among organism groups and the properties of reference genomes and genome assemblies. We evaluated how shotgun metagenomics records composition and biomass in artificial soil invertebrate communities. We generated mock communities of controlled biomass ratios from 28 species from all major soil mesofauna groups: mites, springtails, nematodes, tardigrades and potworms. We shotgun-sequenced these communities and taxonomically assigned them with a database of over 270 soil invertebrate genomes. We recovered 90% of the species, and observed relatively high false positive detection rates. We found strong differences in reads assigned to different taxa, with some groups (e.g. springtails) consistently attracting more hits than others (e.g. enchytraeids). Original biomass could be predicted from read counts after considering these taxon-specific differences. Species with larger genomes and with more complete assemblies consistently attracted more reads than species with smaller genomes. The GC content of the genome assemblies had no effect on the biomass-read relationships. The results show considerable differences in taxon recovery and taxon specificity of biomass recovery from metagenomic sequence data. The properties of reference genomes and genome assemblies also influence biomass recovery, and they should be considered in metagenomic studies of eukaryotes. We provide a roadmap for investigating factors that influence metagenomics-based eukaryotic community reconstructions. Understanding these factors is timely, as the accessibility of DNA sequencing and the momentum of reference genome projects point to a future where the taxonomic assignment of DNA from any community sample becomes a reality.


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Lars Snipen ◽  
Inga-Leena Angell ◽  
Torbjørn Rognes ◽  
Knut Rudi

Abstract. Background: Studies of shifts in microbial community composition have many applications. For studies at species or subspecies levels, 16S amplicon sequencing lacks resolution and is often replaced by full shotgun sequencing. Due to higher costs, this restricts the number of samples sequenced. As an alternative to full shotgun sequencing, we have investigated the use of Reduced Metagenome Sequencing (RMS) to estimate the composition of a microbial community. This involves the use of double-digested restriction-associated DNA sequencing, which means only a smaller fraction of the genomes is sequenced. The read sets obtained by this approach have properties different from both amplicon and shotgun data, and analysis pipelines for both either cannot be used at all or do not explore the full potential of RMS data. Results: We suggest a procedure for analyzing such data, based on fragment clustering and the use of a constrained ordinary least squares de-convolution for estimating the relative abundance of all community members. Mock community datasets show the potential to clearly separate strains even when the 16S is 100% identical and genome-wide differences are < 0.02, indicating that RMS has a very high resolution. From a simulation study, we compare RMS to shotgun sequencing and show that we get improved abundance estimates when the community has many very closely related genomes. From a real dataset of infant guts, we show that RMS is capable of detecting a strain diversity gradient for Escherichia coli across time. Conclusion: We find that RMS is a good alternative to either metabarcoding or shotgun sequencing when it comes to resolving microbial communities at the strain level. Like shotgun metagenomics, it requires a good database of reference genomes and is well suited for studies of the human gut or other communities where many reference genomes exist. A data analysis pipeline is offered as an R package at https://github.com/larssnip/microRMS.
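
The constrained least squares de-convolution mentioned in this abstract can be illustrated with a small toy example. The sketch below is not the microRMS implementation. It assumes a hypothetical matrix `A` whose entry `A[i][j]` is the copy number of fragment cluster `i` in reference genome `j`, and a vector `y` of observed (normalized) read counts per cluster. It then looks for non-negative abundances `x` with `A x ≈ y` using projected gradient descent, a generic solver chosen here only because it needs no external libraries.

```python
def deconvolve(A, y, iters=2000):
    """Estimate non-negative relative abundances x such that A x ~= y.

    A: list of rows, A[i][j] = copies of fragment cluster i in genome j (assumed input).
    y: observed signal per fragment cluster (assumed input).
    """
    n_rows, n_cols = len(A), len(A[0])
    # 1 / ||A||_F^2 is a conservative step size that keeps gradient descent stable.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [1.0 / n_cols] * n_cols
    for _ in range(iters):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n_cols)) - y[i] for i in range(n_rows)]
        # Gradient of 0.5 * ||A x - y||^2 is A^T r
        g = [sum(A[i][j] * r[i] for i in range(n_rows)) for j in range(n_cols)]
        # Gradient step, then projection onto the constraint x >= 0
        x = [max(0.0, x[j] - step * g[j]) for j in range(n_cols)]
    total = sum(x)
    # Rescale so the estimates sum to 1 (relative abundances)
    return [v / total for v in x] if total > 0 else x


# Toy data: 4 fragment clusters, 2 genomes mixed at 75% / 25%.
A = [[1, 0], [1, 1], [0, 1], [1, 1]]
y = [0.75, 1.0, 0.25, 1.0]
print(deconvolve(A, y))  # close to [0.75, 0.25]
```

Clusters shared by both genomes (rows 2 and 4) carry no strain-specific signal on their own; the estimate is driven by the clusters that differ between genomes, which is why closely related strains remain separable as long as some fragments distinguish them.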


2019 ◽  
Vol 35 (17) ◽  
pp. 2932-2940 ◽  
Author(s):  
Subrata Saha ◽  
Jethro Johnson ◽  
Soumitra Pal ◽  
George M Weinstock ◽  
Sanguthevar Rajasekaran

Abstract. Motivation: Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life, largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory, which are often not available to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using the information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false-positive rates due to trading off the value of k against the computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences. Results: Microbiome researchers are generally interested in two objectives of a taxonomic classifier: (i) to detect prevalence, i.e. the taxa present in a sample, and (ii) to estimate their relative abundances. MSC is primarily designed to detect prevalence, and experimental results show that MSC is indeed a more effective and efficient algorithm compared to the other state-of-the-art algorithms in terms of accuracy, memory and runtime. Moreover, MSC outputs an approximate estimate of the abundances. Availability and implementation: The implementations are freely available for non-commercial purposes. They can be downloaded from https://drive.google.com/open?id=1XirkAamkQ3ltWvI1W1igYQFusp9DHtVl.
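
The unique k-mer idea described in this abstract can be sketched in a few lines. This is a simplified illustration of the alignment-free approach only, not the MSC algorithm itself (which aligns reads to model sequences built from unique k-mers). The function names, the toy sequences and the majority-vote rule are all assumptions made for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmer_index(references, k):
    """Map each k-mer that occurs in exactly one reference to that reference's name."""
    owner = {}
    shared = set()
    for name, seq in references.items():
        for km in kmers(seq, k):
            if km in shared:
                continue
            if km in owner and owner[km] != name:
                # Seen in another reference already: no longer discriminating.
                del owner[km]
                shared.add(km)
            else:
                owner[km] = name
    return owner


def classify(read, index, k):
    """Assign a read to the reference owning most of its unique k-mers, or None."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy references sharing the prefix ACGTACG.
references = {"taxonA": "ACGTACGTTTGACCA", "taxonB": "ACGTACGGGCATTCA"}
index = unique_kmer_index(references, k=5)
print(classify("TTTGACC", index, k=5))  # taxonA
print(classify("ACGTACG", index, k=5))  # None: all k-mers are shared
```

A small k makes more k-mers shared between references and leaves fewer discriminating ones, while a large k increases the size of the index; this is the trade-off between k and computational resources that the abstract refers to.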


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Gokhan Yavas ◽  
Huixiao Hong ◽  
Wenming Xiao

Abstract. Background: Accurate de novo genome assembly has become a reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. Results: To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, the dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. Conclusions: dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. dnAQET can help researchers to identify the most suitable assembly tools and to select high quality assemblies generated.


2006 ◽  
Vol 42 ◽  
pp. S150-S156 ◽  
Author(s):  
M.A. Callaham ◽  
D.D. Richter ◽  
D.C. Coleman ◽  
M. Hofmockel

Ecography ◽  
2017 ◽  
Vol 41 (7) ◽  
pp. 1135-1146 ◽  
Author(s):  
Connor R. Fitzpatrick ◽  
Anna V. Mikhailitchenko ◽  
Daniel N. Anstett ◽  
Marc T. J. Johnson

Author(s):  
Arang Rhie ◽  
Brian P. Walenz ◽  
Sergey Koren ◽  
Adam M. Phillippy

Abstract. Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, can be generated for evaluation. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation. Availability of data and material: Project name: Merqury; Project home page: https://github.com/marbl/merqury, https://github.com/marbl/meryl; Archived version: https://github.com/marbl/merqury/releases/tag/v1.0; Operating system(s): Platform independent; Programming language: C++, Java, Perl; Other requirements: gcc 4.8 or higher, java 1.6 or higher; License: Public domain (see https://github.com/marbl/merqury/blob/master/README.license); Any restrictions to use by non-academics: No restrictions applied
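
The k-mer set comparison that this abstract describes can be illustrated with plain Python sets. The sketch below is a strongly simplified assumption of the general idea, not Merqury's method: the real tool works on k-mer databases built with meryl and handles details such as sequencing errors in the reads, none of which is modeled here. The function names and toy sequences are hypothetical.

```python
def kmer_set(seqs, k):
    """Collect all length-k substrings from a list of sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def kmer_support(assembly, reads, k):
    """Compare assembly k-mers with read k-mers.

    Returns (supported, completeness):
      supported    = fraction of assembly k-mers also seen in the reads
                     (low values hint at assembly errors)
      completeness = fraction of read k-mers present in the assembly
                     (low values hint at missing sequence)
    """
    asm = kmer_set(assembly, k)
    rd = kmer_set(reads, k)
    shared = asm & rd
    supported = len(shared) / len(asm) if asm else 0.0
    completeness = len(shared) / len(rd) if rd else 0.0
    return supported, completeness


# Toy data: the reads extend two bases past the end of the assembled contig.
assembly = ["ACGTTGCA"]
reads = ["ACGTTG", "GTTGCATT"]
supported, completeness = kmer_support(assembly, reads, k=4)
print(supported, completeness)
```

In this toy case every assembly k-mer is backed by the reads, while two read k-mers (`GCAT`, `CATT`) are absent from the assembly, so the completeness value falls below 1.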


2016 ◽  
Author(s):  
Jia-Xing Yue ◽  
Jing Li ◽  
Louise Aigrain ◽  
Johan Hallin ◽  
Karl Persson ◽  
...  

Abstract. Structural rearrangements have long been recognized as an important source of genetic variation with implications in phenotypic diversity and disease, yet their evolutionary dynamics are difficult to characterize with short-read sequencing. Here, we report long-read sequencing for 12 strains representing major subpopulations of the partially domesticated yeast Saccharomyces cerevisiae and its wild relative Saccharomyces paradoxus. Complete genome assemblies and annotations generate population-level reference genomes and allow for the first explicit definition of chromosome partitioning into cores, subtelomeres and chromosome-ends. High-resolution view of structural dynamics uncovers that, in chromosomal cores, S. paradoxus exhibits higher accumulation rate of balanced structural rearrangements (inversions, translocations and transpositions) whereas S. cerevisiae accumulates unbalanced rearrangements (large insertions, deletions and duplications) more rapidly. In subtelomeres, recurrent interchromosomal reshuffling was found in both species, with higher rate in S. cerevisiae. Such striking contrasts between wild and domesticated yeasts reveal the influence of human activities on structural genome evolution.

