Evaluating de novo assembly and binning strategies for time-series drinking water metagenomes.

DOI: 10.1101/2021.07.11.451960, 2021

Author(s): Solize Vosloo, Linxuan Huo, Christopher L Anderson, Maria Sevillano Rivera, Zihan Dai, ...

Keyword(s): Time Series, Drinking Water, Microbial Communities, De Novo, Sequence Data, Data Mapping, High Quality, Systematic Assessment, Medium Quality, Crucial Part

Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time-series drinking water metagenomes that were collected over a period of 6 months. The goal of this study was to identify the combination of assembly and binning approaches that yields both high-quality and numerous metagenome-assembled genomes (MAGs) representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes co-assembly strategies had the best performance, as they resulted in larger and less fragmented assemblies with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes co-assembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would otherwise have collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes co-assembly strategies may be required to maximize the recovery of good-quality MAGs, which more accurately capture the microbial diversity of drinking water samples.
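
The abstract refers to medium-quality MAGs without stating the cut-offs behind that label. As a rough illustration only, the sketch below classifies a MAG from its estimated completeness and contamination, assuming the commonly used MIMAG-style thresholds (high: completeness above 90% and contamination below 5%; medium: completeness of at least 50% and contamination below 10%). The function name `mag_quality` is hypothetical, the thresholds are an assumption rather than values taken from the study, and the additional rRNA/tRNA criteria for high-quality drafts are ignored.

```python
def mag_quality(completeness: float, contamination: float) -> str:
    """Return a quality tier for a MAG.

    Both arguments are percentages (0-100). The cut-offs are assumed
    MIMAG-style thresholds, not values reported in the abstract, and the
    rRNA/tRNA requirements for high-quality drafts are not checked here.
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Hypothetical (completeness, contamination) estimates for three bins.
    for comp, cont in [(95.0, 2.0), (60.0, 8.0), (40.0, 1.0)]:
        print(comp, cont, mag_quality(comp, cont))
```

Completeness and contamination estimates of this kind are typically produced by a separate marker-gene tool; the sketch only shows how such estimates map onto quality tiers.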

MetaCoAG: Binning Metagenomic Contigs via Composition, Coverage and Assembly Graphs

DOI: 10.1101/2021.09.10.459728, 2021

Author(s): Vijini Mallawaarachchi, Yu Lin

Keyword(s): Microbial Communities, De Novo, State Of The Art, Genetic Material, Single Copy, Experimental Results, Marker Genes, High Quality, Second Best, Direct Use

Metagenomic binning has allowed us to study and characterize the genetic material of different species and gain insights into microbial communities. While existing binning tools bin metagenomic de novo assemblies, they do not make use of the assembly graphs that produce such assemblies. Here we propose MetaCoAG, a tool that utilizes assembly graphs together with composition and coverage information to bin metagenomic contigs. MetaCoAG uses single-copy marker genes to estimate the number of initial bins, assigns contigs to bins iteratively, and adjusts the number of bins dynamically throughout the binning process. Experimental results on simulated and real datasets demonstrate that MetaCoAG significantly outperforms state-of-the-art binning tools, producing more high-quality bins than the second-best tool, with an average median F1-score of 88.40%. To the best of our knowledge, MetaCoAG is the first stand-alone binning tool to make direct use of the assembly graph information. MetaCoAG is available at https://github.com/Vini2/MetaCoAG.
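
For readers unfamiliar with the F1-score quoted above, it is the harmonic mean of precision and recall. The sketch below shows that calculation only; how precision and recall are measured per bin is not described in the abstract, and the function name `f1_score` is a hypothetical helper, not part of MetaCoAG.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the range 0-1).

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical precision/recall values for a single bin.
    print(f1_score(0.9, 0.8))
```

Because it is a harmonic mean, the score is pulled toward the lower of the two inputs, so a bin cannot score well on purity alone or completeness alone.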

Evaluating de Novo Assembly and Binning Strategies for Time Series Drinking Water Metagenomes

Microbiology Spectrum, DOI: 10.1128/spectrum.01434-21, 2021

Author(s): Solize Vosloo, Linxuan Huo, Christopher L. Anderson, Zihan Dai, Maria Sevillano, ...

Keyword(s): Public Health, Time Series, Drinking Water, De Novo Assembly, De Novo, Water Infrastructure, Quality Of Water, Diverse Groups

Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences about the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics.

Complete Genome Sequence of the Multidrug-Resistant Pseudomonas aeruginosa Endemic Houston-1 Strain, Isolated from a Pediatric Patient with Cystic Fibrosis and Assembled Using Oxford Nanopore and Illumina Sequencing

Microbiology Resource Announcements, DOI: 10.1128/mra.00903-19, 2019, Vol 8 (43), Cited By ~ 1

Author(s): Jennifer K. Spinler, Sabeen Raza, Jessica K. Runge, Ruth Ann Luna

Keyword(s): Cystic Fibrosis, Pseudomonas Aeruginosa, De Novo, Sequence Data, Care Center, Multidrug Resistant, High Quality, Content Type, Representative Sequence, Oxford Nanopore

Hybrid de novo assembly of Illumina/Nanopore sequence data produced complete circular sequences of the chromosome and a plasmid for the multidrug-resistant Pseudomonas aeruginosa Houston-1 strain. This provides a high-quality representative sequence for a lineage endemic to a pediatric cystic fibrosis care center at Texas Children’s Hospital.

Complete Genome Sequence of Clostridioides difficile Ribotype 255 Strain Mta-79, Assembled Using Oxford Nanopore and Illumina Sequencing

Microbiology Resource Announcements, DOI: 10.1128/mra.00935-19, 2019, Vol 8 (42), Cited By ~ 1

Author(s): Jennifer K. Spinler, Anne J. Gonzales-Luna, Sabeen Raza, Jessica K. Runge, Ruth Ann Luna, ...

Keyword(s): Elderly Patient, Complete Genome Sequence, Complete Genome, De Novo, Sequence Data, High Quality, Content Type, Representative Sequence, Clostridioides Difficile, Oxford Nanopore

Hybrid de novo assembly of Illumina/Nanopore sequence data produced a complete circular sequence of the chromosome for a Clostridioides difficile ribotype 255 (RT255) isolate from an elderly patient with recurrent C. difficile infection (CDI). This provides a high-quality representative sequence for the RT255 lineage.

ANALYSIS OF ABDOMINAL TYPHOID INCIDENCE IN THE SOUTH REGION OF THE KYRGYZ REPUBLIC

Natural resources of the Earth and environmental protection, DOI: 10.26787/nydha-2713-203x-2020-1-10-11-12-43-48, 2020, pp. 43-48

Author(s): Zakirova J.S., Nadirbekova R.A., Zholdoshev S.T.

Keyword(s): Drinking Water, Typhoid Fever, Risk Groups, The Body, Kyrgyz Republic, Radiation Factor, High Quality, Two Generations, Immunological Deficiency

The article analyzes the long-term morbidity and spread of typhoid fever in the southern regions of the Kyrgyz Republic. The Jalal-Abad region remains a permanent epidemic focus: against a background of limited access to high-quality drinking water, the population has also been exposed to a radiation factor for more than two generations. We confirmed this by the spread, among the inhabitants of Mailuu-Suu, of nosological forms of immunological deficiency syndrome, which serves as a predictor of risk groups for infectious diseases, including typhoid fever.

Development of a time-series shotgun metagenomics database for monitoring microbial communities at the Pacific coast of Japan

Scientific Reports, DOI: 10.1038/s41598-021-91615-3, 2021, Vol 11 (1)

Author(s): Kazutoshi Yoshitake, Gaku Kimura, Tomoko Sakami, Tsuyoshi Watanabe, Yukiko Taniuchi, ...

Keyword(s): Time Series, Microbial Communities, Pacific Coast, Three Dimensional, Amplicon Sequencing, Metagenomic Data, Shotgun Metagenomics, Functional Features, Pacific Coast Of Japan, Marine Microbial Communities

Although numerous amplicon sequencing-based metagenome studies have been conducted to date to characterize marine microbial communities, relatively few have employed full metagenome shotgun sequencing to obtain a broader picture of the functional features of these communities. Moreover, most of these studies performed only sporadic sampling, which is insufficient to understand an ecosystem comprehensively. In this study, we regularly conducted seawater sampling along the northeastern Pacific coast of Japan between March 2012 and May 2016. We collected 213 seawater samples and prepared size-based fractions to generate 454 subsets of samples for shotgun metagenome sequencing and analysis. We also determined the sequences of 16S rRNA (n = 111) and 18S rRNA (n = 47) gene amplicons from smaller sample subsets. We thereafter developed the Ocean Monitoring Database for time-series metagenomic data (http://marine-meta.healthscience.sci.waseda.ac.jp/omd/), which provides a three-dimensional bird’s-eye view of the data. This database includes results of digital DNA chip analysis, a novel method for estimating ocean characteristics such as water temperature from metagenomic data. Furthermore, we developed a novel classification method that includes more information about viruses than that acquired using BLAST. We further report the discovery of a large number of previously overlooked (TAG)n repeat sequences in the genomes of marine microbes. We predict that the availability of this time-series database will lead to major discoveries in marine microbiome research.

STRONG: metagenomics strain resolution on assembly graphs

Genome Biology, DOI: 10.1186/s13059-021-02419-7, 2021, Vol 22 (1)

Author(s): Christopher Quince, Sergey Nurk, Sebastien Raguideau, Robert James, Orkun S. Soyer, ...

Keyword(s): Time Series, De Novo, Single Copy, Bayesian Algorithm, Anaerobic Digestor, Core Genes

We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo from multiple metagenome samples. STRONG performs coassembly and binning into metagenome-assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs and their per-sample unitig coverages, for individual single-copy core genes (SCGs) in each MAG, to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and their abundances. STRONG is validated using synthetic communities and, for a real anaerobic digestor time series, generates haplotypes that match those observed from long Nanopore reads.

Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome

Microbiome, DOI: 10.1186/s40168-020-00981-z, 2021, Vol 9 (1)

Author(s): Hannes Petruschke, Christian Schori, Sebastian Canzler, Sarah Riesbeck, Anja Poehlein, ...

Keyword(s): Microbial Communities, Intestinal Microbiota, De Novo, Bacterial Species, Intestinal Microbiome, Single Strain, Small Proteins, Human Intestinal Microbiota, Wide Range

Background: The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with fewer than 100 amino acids (sProteins), some of which may contribute to shaping the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities.

Results: We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches, including two earlier optimized sProtein enrichment strategies, were applied to specifically increase the chances of novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community, indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx.

Conclusions: We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins in the functionality of bacterial communities such as those of the human intestinal tract.

A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals chromosomal rearrangements in rainbow trout

G3 Genes|Genomes|Genetics, DOI: 10.1093/g3journal/jkab052, 2021

Author(s): Guangtu Gao, Susana Magadan, Geoffrey C Waldbieser, Ramey C Youngblood, Paul A Wheeler, ...

Keyword(s): Rainbow Trout, Chromosome Number, Genome Assembly, De Novo Assembly, De Novo, Sequence Data, Structural Variations, High Coverage, Haploid Chromosome Number, Long Reads

There is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line de novo genome assembly from high-coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). The assembly is composed of 938 scaffolds with an N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.
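
The N50 quoted for the assembly is a standard contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The sketch below shows that calculation on a toy list of lengths; the function name `n50` and the example values are illustrative assumptions and are not taken from the Arlee assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest, and the length at which
    the running total first reaches half of the overall length is returned.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length
    return ordered[-1]  # not reached for positive lengths


if __name__ == "__main__":
    # Toy lengths (arbitrary units), not real scaffold sizes.
    print(n50([2, 3, 4, 5, 6]))
```

A higher N50 for the same total length indicates that more of the assembly sits in long sequences, which is why it is commonly reported alongside scaffold counts and gap counts.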

De Novo SNP Discovery and Genotyping of Iranian Pimpinella Species Using ddRAD Sequencing

Agronomy, DOI: 10.3390/agronomy11071342, 2021, Vol 11 (7), pp. 1342

Author(s): Shaghayegh Mehravi, Gholam Ali Ranjbar, Ghader Mirzaghaderi, Anita Alice Severn-Ellis, Armin Scheben, ...

Keyword(s): De Novo, Genetic Relationships, Nucleotide Polymorphisms, High Quality, Genomic Resources, High Quality Snps, The Family, Double Digestion, Flanking Sequences, Downstream Analysis

The species of Pimpinella, one of the largest genera of the family Apiaceae, are traditionally cultivated for medicinal purposes. In this study, high-throughput double digest restriction-site associated DNA sequencing technology (ddRAD-seq) was used to identify single nucleotide polymorphisms (SNPs) in eight Pimpinella species from Iran. After double digestion with the enzymes HpyCH4IV and HinfI, a total of 334,702,966 paired-end reads were de novo assembled into 1,270,791 loci with an average of 28.8 reads per locus. After stringent filtering, 2440 high-quality SNPs were identified for downstream analysis. Analysis of genetic relationships and population structure, based on these retained SNPs, indicated the presence of three major groups. Gene ontology and pathway analyses were performed by comparing SNP-associated flanking sequences with a public non-redundant database. Due to the lack of genomic resources in this genus, our present study is the first report to provide high-quality SNPs in Pimpinella based on a de novo analysis pipeline using ddRAD-seq. These data will enhance the molecular knowledge of the genus Pimpinella and will provide an important source of information for breeders and the research community to enhance breeding programs and support the management of Pimpinella genomic resources.
