Harnessing the strategy of metagenomics for exploring the intestinal microecology of sable (Martes zibellina), the national first-level protected animal

Abstract Sable (Martes zibellina), a member of family Mustelidae, order Carnivora, is primarily distributed in the cold northern zone of Eurasia. The purpose of this study was to explore the intestinal flora of the sable by metagenomic library-based techniques. Libraries were sequenced on an Illumina HiSeq 4000 instrument. The effective sequencing data of each sample was above 6,000 M, and the ratio of clean reads to raw reads was over 98%. The total ORF length was approximately 603,031, equivalent to 347.36 Mbp. We investigated gene functions with the KEGG database and identified 7,140 KEGG ortholog (KO) groups comprising 129,788 genes across all of the samples. We selected a subset of genes with the highest abundances to construct cluster heat maps. From the results of the KEGG metabolic pathway annotations, we acquired information on gene functions, as represented by the categories of metabolism, environmental information processing, genetic information processing, cellular processes and organismal systems. We then investigated gene function with the CAZy database and identified functional carbohydrate hydrolases corresponding to genes in the intestinal microorganisms of sable. This finding is consistent with the fact that the sable is adapted to cold environments and requires a large amount of energy to maintain its metabolic activity. We also investigated gene functions with the eggNOG database; the main functions of genes included gene duplication, recombination and repair, transport and metabolism of amino acids, and transport and metabolism of carbohydrates. In this study, we attempted to identify the complex structure of the microbial population of sable based on metagenomic sequencing methods, which use whole metagenomic data, and to map the obtained sequences to known genes or pathways in existing databases, such as CAZy, KEGG, and eggNOG. 
We then explored the genetic composition and functional diversity of the microbial community based on the mapped functional categories.
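The KO tally reported above (distinct KO groups versus the genes assigned to them) can be illustrated with a minimal sketch. The gene names, KO identifiers, and the `toy_annotations` table are hypothetical placeholders, and ranking KO groups by gene count is only a stand-in for the abundance-based selection the abstract mentions; this is not the authors' pipeline.

```python
from collections import Counter


def count_genes_per_ko(gene_to_ko):
    """Count how many genes map to each KO group; genes without a KO are skipped."""
    return Counter(ko for ko in gene_to_ko.values() if ko is not None)


def top_ko_groups(ko_counts, n):
    """Return the n KO groups with the most genes, ties broken by KO identifier."""
    ranked = sorted(ko_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical annotation table: gene id -> KO id (None means no KEGG hit).
toy_annotations = {
    "gene_1": "K00001",
    "gene_2": "K00001",
    "gene_3": "K02000",
    "gene_4": None,
    "gene_5": "K02000",
    "gene_6": "K00001",
}

ko_counts = count_genes_per_ko(toy_annotations)
n_ko_groups = len(ko_counts)                 # number of distinct KO groups
n_annotated_genes = sum(ko_counts.values())  # number of genes with a KO assignment
print(n_ko_groups, n_annotated_genes, top_ko_groups(ko_counts, 2))
```

In the study's terms, `n_ko_groups` would correspond to the 7,140 KO groups and `n_annotated_genes` to the 129,788 genes, under the assumption that each gene carries at most one KO assignment.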


Harnessing the strategy of metagenomics for exploring the intestinal microecology of sable (Martes zibellina), the national first-level protected animal

AMB Express ◽

10.1186/s13568-020-01103-6 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Jiakuo Yan ◽

Xiaoyang Wu ◽

Jun Chen ◽

Yao Chen ◽

Honghai Zhang

Keyword(s):

Information Processing ◽

Complex Structure ◽

Intestinal Flora ◽

Metagenomic Library ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Illumina Hiseq ◽

Martes Zibellina ◽

Gene Functions


Harnessing the strategy of metagenomics for exploring the intestinal microecology of sable (Martes zibellina), the national first-level protected animal

10.21203/rs.3.rs-28506/v2 ◽

2020 ◽

Author(s):

Jiakuo Yan ◽

Xiaoyang Wu ◽

Jun Chen ◽

Yao Chen ◽

Honghai Zhang

Keyword(s):

Information Processing ◽

Intestinal Flora ◽

Metagenomic Library ◽

Original Data ◽

Unique Function ◽

Metagenomic Data ◽

Data Mapping ◽

Metagenomic Sequencing ◽

Illumina Hiseq ◽

Martes Zibellina

Abstract Sable (Martes zibellina), which belongs to the order Carnivora, family Mustelidae and genus Martes, is mainly distributed in the cold northern zone of Eurasia. The purpose of this study is to explore the intestinal flora of the sable using a metagenomic library-based technique; libraries were sequenced on an Illumina HiSeq 4000 instrument. The effective data volume of each sample is above 6000 M, and the ratio of effective data (clean data) to original data (raw data) is over 98%. According to the statistical analysis, the total ORF length is about 603,031, which is 347.36 Mbp. We compared the unique gene functions against the KEGG database and obtained 7140 genes (KO); the KO total across all samples is 129,788. We selected the higher-abundance genes to draw cluster heat maps and, according to the KEGG metabolic pathway annotations, obtained the gene functions, including metabolism, environmental information processing, genetic information processing, cellular processes and organismal systems. We compared the unique gene functions against the CAZy database and found that the functional carbohydrate hydrolases have corresponding genes in the intestinal microorganisms of the sable. This is closely related to the fact that the sable is adapted to cold environments and requires a large amount of energy to maintain its metabolic activity. We compared the unique gene functions against the eggNOG database; the main functions of genes included gene duplication, recombination and repair, transport and metabolism of amino acids, and transport and metabolism of carbohydrates, among others. In this study, we intended to identify the complex microbial population structure of sables based on a metagenomic sequencing method that uses the whole metagenomic data, mapping the sequences to known genes or pathways in existing databases, such as CAZy, KEGG, or eggNOG, and then exploring the genetic composition and functional diversity of the microbial community based on the mapped functional categories.


Harnessing the Strategy of Metagenomics for Exploring the Intestinal Microecology of Sable (Martes Zibellina), the National First-Level Protected Animal

10.21203/rs.3.rs-28506/v1 ◽

2020 ◽

Author(s):

Jiakuo Yan ◽

Xiaoyang Wu ◽

Jun Chen ◽

Yao Chen ◽

Honghai Zhang

Keyword(s):

Information Processing ◽

Intestinal Flora ◽

Metagenomic Library ◽

Original Data ◽

Unique Function ◽

Illumina Hiseq ◽

Martes Zibellina ◽

Cold Environments ◽

Data Volume ◽

Genetic Information Processing


Evaluation of the CosmosID Bioinformatics Platform for Prosthetic Joint-Associated Sonicate Fluid Shotgun Metagenomic Data Analysis

Journal of Clinical Microbiology ◽

10.1128/jcm.01182-18 ◽

2018 ◽

Vol 57 (2) ◽

Cited By ~ 8

Author(s):

Qun Yan ◽

Yu Mi Wi ◽

Matthew J. Thoendel ◽

Yash S. Raval ◽

Kerryl E. Greenwood-Quaintance ◽

...

Keyword(s):

Antibiotic Resistance ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Antibacterial Resistance ◽

Sequencing Data ◽

Bacterial Detection ◽

Shotgun Metagenomic Sequencing ◽

Prosthetic Joint ◽

Validation Set ◽

Fluid Culture

ABSTRACT We previously demonstrated that shotgun metagenomic sequencing can detect bacteria in sonicate fluid, providing a diagnosis of prosthetic joint infection (PJI). A limitation of the approach that we used is that data analysis was time-consuming and specialized bioinformatics expertise was required, both of which are barriers to routine clinical use. Fortunately, automated commercial analytic platforms that can interpret shotgun metagenomic data are emerging. In this study, we evaluated the CosmosID bioinformatics platform using shotgun metagenomic sequencing data derived from 408 sonicate fluid samples from our prior study with the goal of evaluating the platform vis-à-vis bacterial detection and antibiotic resistance gene detection for predicting staphylococcal antibacterial susceptibility. Samples were divided into a derivation set and a validation set, each consisting of 204 samples; results from the derivation set were used to establish cutoffs, which were then tested in the validation set for identifying pathogens and predicting staphylococcal antibacterial resistance. Metagenomic analysis detected bacteria in 94.8% (109/115) of sonicate fluid culture-positive PJIs and 37.8% (37/98) of sonicate fluid culture-negative PJIs. Metagenomic analysis showed sensitivities ranging from 65.7 to 85.0% for predicting staphylococcal antibacterial resistance. In conclusion, the CosmosID platform has the potential to provide fast, reliable bacterial detection and identification from metagenomic shotgun sequencing data derived from sonicate fluid for the diagnosis of PJI. Strategies for metagenomic detection of antibiotic resistance genes for predicting staphylococcal antibacterial resistance need further development.
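The detection rates quoted above follow directly from the reported counts. As a quick worked check (the function name `detection_rate` is ours, not part of the CosmosID platform):

```python
def detection_rate(detected, total):
    """Percentage of cases with bacteria detected, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * detected / total, 1)


# Counts taken from the abstract.
culture_positive_rate = detection_rate(109, 115)  # culture-positive PJIs
culture_negative_rate = detection_rate(37, 98)    # culture-negative PJIs

# The derivation and validation sets together make up the 408 samples.
total_samples = 204 + 204

print(culture_positive_rate, culture_negative_rate, total_samples)
```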


Towards end-to-end disease prediction from raw metagenomic data

10.1101/2020.10.29.360297 ◽

2020 ◽

Author(s):

Maxence Queyrel ◽

Edi Prifti ◽

Jean-Daniel Zucker

Keyword(s):

DNA Sequences ◽

Real Life ◽

Multiple Instance Learning ◽

Disease Classification ◽

Metagenomic Data ◽

Numerical Representation ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

End To End ◽

Bioinformatics Workflows

Abstract Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and are stored as fastq files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time-consuming and rely on a large number of parameters that often introduce variability and affect the estimation of the microbiome elements. Recent studies have demonstrated that training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning each genome a weight that reflects its influence on the prediction. 
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reached very high performance, comparable with state-of-the-art methods applied directly to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
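Step (i) of the approach, generating a k-mer vocabulary, can be sketched as follows. This is a minimal illustration assuming overlapping k-mers with stride 1 and integer ids assigned in order of first appearance; the function names and toy reads are hypothetical and are not taken from metagenome2vec, and the embedding learning itself is not shown.

```python
def kmer_tokens(read, k):
    """Split a DNA read into overlapping k-mers (stride 1); reads shorter than k give []."""
    if k <= 0:
        raise ValueError("k must be positive")
    read = read.upper()
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocabulary(reads, k):
    """Map each distinct k-mer to an integer id, in order of first appearance."""
    vocab = {}
    for read in reads:
        for token in kmer_tokens(read, k):
            vocab.setdefault(token, len(vocab))
    return vocab


# Hypothetical toy reads; real input would be reads parsed from fastq files.
toy_reads = ["ACGTAC", "GTACGT"]
vocab = build_vocabulary(toy_reads, k=3)

# Each read becomes a sequence of token ids, the form an embedding layer would consume.
encoded_reads = [[vocab[t] for t in kmer_tokens(r, 3)] for r in toy_reads]
print(vocab, encoded_reads)
```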


BiomeSeq: A Tool for the Characterization of Animal Microbiomes from Metagenomic Data

10.21203/rs.3.rs-842545/v1 ◽

2021 ◽

Author(s):

Kelly A. Mulholland ◽

Calvin L. Keeler

Keyword(s):

Relative Abundance ◽

Performance Metrics ◽

Complete Characterization ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Microbial Composition ◽

Additional Species ◽

User Friendly

Abstract Background: The complete characterization of a microbiome is critical in elucidating the complex ecology of the microbial composition within healthy and diseased animals. Many microbiome studies characterize only the bacterial component, for which there are several well-developed sequencing methods, bioinformatics tools and databases available. The lack of comprehensive bioinformatics workflows and databases has limited efforts to characterize the other components existing in a microbiome. BiomeSeq is a tool for the analysis of the complete animal microbiome using metagenomic sequencing data. With its comprehensive workflow and customizable parameters and microbial databases, BiomeSeq can rapidly quantify the viral, fungal, bacteriophage and bacterial components of a sample and produce informative tables for analysis. Results: Simulated datasets were constructed, which contained known abundances of microbial sequences, and several performance metrics were analyzed, including correlation of predicted abundance with known abundance, root mean square error and processing speed. BiomeSeq demonstrated high precision (average of 99.52%) and sensitivity (average of 93.01%). BiomeSeq was employed in detecting and quantifying the respiratory microbiome of a commercial poultry broiler flock throughout its grow-out cycle from hatching to processing and successfully processed 780 million reads. For each microbial species detected, BiomeSeq calculated the normalized abundance, percent relative abundance, and coverage as well as the diversity for each sample. The processing speed of each step in the pipeline, precision and accuracy were calculated to examine BiomeSeq’s performance using in silico sequencing datasets. When compared to bacterial results generated by the commonly used 16S rRNA sequencing method, BiomeSeq detected the same most abundant bacteria, including Gallibacterium, Corynebacterium and Staphylococcus, as well as several additional species. 
Conclusions: BiomeSeq provides for the detection and quantification of the microbiome from next-generation metagenomic sequencing data. This tool is implemented in a user-friendly container that requires one command and generates a table containing taxonomical information for each microbe detected. It also determines normalized abundance, percent relative abundance, genome coverage and sample diversity calculations for each sample.
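Two of the quantities named above, percent relative abundance and precision/sensitivity, have standard definitions that can be sketched briefly. The read counts and the true/false positive numbers below are hypothetical, and the abstract does not specify how BiomeSeq computes its normalized abundance, so that step is not attempted here.

```python
def percent_relative_abundance(read_counts):
    """Convert per-taxon read counts into percentages of the sample total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


def precision_and_sensitivity(tp, fp, fn):
    """Precision = TP / (TP + FP); sensitivity (recall) = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical read counts for the three genera named in the abstract.
toy_counts = {"Gallibacterium": 50, "Corynebacterium": 30, "Staphylococcus": 20}
abundance = percent_relative_abundance(toy_counts)

# Hypothetical confusion counts from a simulated dataset with known composition.
precision, sensitivity = precision_and_sensitivity(tp=90, fp=10, fn=30)
print(abundance, precision, sensitivity)
```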


Fast functional annotation of metagenomic shotgun data by DNA alignment to a microbial gene catalog

10.1101/120402 ◽

2017 ◽

Author(s):

Stuart M. Brown ◽

Yuhan Hao ◽

Hao Chen ◽

Bobby P. Laungani ◽

Thahmina A. Ali ◽

...

Keyword(s):

Functional Annotation ◽

Sequence Data ◽

Human Microbiome ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Alternative Analysis ◽

Metagenomic Sequence ◽

Shotgun Metagenomics ◽

Gene Functions ◽

DNA Alignment

Abstract Background: Metagenomic shotgun sequencing is becoming increasingly popular to study microbes associated with the human body and in environmental samples. A key goal of shotgun metagenomic sequencing is to identify gene functions and metabolic pathways that differ between samples or conditions. However, current methods to identify function in the large number of reads in a high-throughput sequence data file rely on the computationally intensive and low-stringency approach of mapping each read to a generic database of proteins or reference microbial genomes. Results: We have developed an alternative analysis approach for shotgun metagenomic sequence data utilizing Bowtie2 DNA-DNA alignment of the reads to a database of well-annotated genes compiled from human microbiome data. This method is rapid, and provides high-stringency matches (>90% DNA sequence identity) of shotgun metagenomics reads to genes with annotated functions. We demonstrate the use of this method with synthetic data, Human Microbiome Project shotgun metagenomic data sets, and data from a study of liver disease. Differentially abundant KEGG gene functions can be detected in these experiments. Conclusions: Functional annotation of metagenomic shotgun sequence reads can be accomplished by rapid DNA-DNA matching to a custom database of microbial sequences using the Bowtie2 sequence alignment tool. This method can be used for a variety of microbiome studies and allows functional analysis which is otherwise computationally demanding. This rapid annotation method is freely available as a Galaxy workflow within a Docker image.
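The high-stringency filter described above (keeping only read-to-gene matches above 90% DNA identity, then tallying annotated functions) can be sketched in a few lines. This is an illustration under stated assumptions: the alignments are given as already-aligned string pairs rather than parsed from Bowtie2 output, gaps are counted as mismatches, and the sequences and function identifiers are hypothetical.

```python
from collections import Counter


def percent_identity(aligned_read, aligned_ref):
    """Percent of alignment columns where read and reference carry the same base."""
    if not aligned_read or len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(aligned_read, aligned_ref) if a == b and a != "-")
    return 100.0 * matches / len(aligned_read)


def count_function_hits(alignments, min_identity=90.0):
    """Tally function ids for alignments whose identity exceeds min_identity."""
    counts = Counter()
    for aligned_read, aligned_ref, function_id in alignments:
        if percent_identity(aligned_read, aligned_ref) > min_identity:
            counts[function_id] += 1
    return counts


# Hypothetical 20-base reference fragment and three aligned reads.
ref = "ACGT" * 5
read_exact = ref                                            # 100% identity
read_one_mismatch = ref[:-1] + "A"                          # 95% identity
read_three_mismatches = "T" + ref[1:9] + "A" + ref[10:19] + "A"  # 85% identity

toy_alignments = [
    (read_exact, ref, "K00001"),
    (read_one_mismatch, ref, "K00001"),
    (read_three_mismatches, ref, "K02000"),
]
function_counts = dict(count_function_hits(toy_alignments))
print(function_counts)
```

Per-sample tallies of this kind are the sort of input a differential-abundance comparison between conditions would start from.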


Conserved bacterial genomes from two geographically distinct peritidal stromatolite formations shed light on potential functional guilds

10.1101/818625 ◽

2019 ◽

Author(s):

Samantha C. Waterworth ◽

Eric W. Isemonger ◽

Evan R. Rees ◽

Rosemary A. Dorrington ◽

Jason C. Kwan

Keyword(s):

Microbial Mats ◽

Bacterial Species ◽

Species Conservation ◽

Cumulative Effect ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Space Forms ◽

Nitrogenous Compounds ◽

Shark Bay

SUMMARY Stromatolites are complex microbial mats that form lithified layers, and ancient forms are the oldest evidence of life on Earth, dating back over 3.4 billion years. Modern stromatolites are relatively rare but may provide clues about the function and evolution of their ancient counterparts. In this study, we focus on peritidal stromatolites occurring at Cape Recife and Schoenmakerskop on the southeastern South African coastline. Using assembled shotgun metagenomic data we obtained 183 genomic bins, of which the most dominant taxa were from the Cyanobacteriia class (Cyanobacteria phylum), with lower but notable abundances of bacteria classified as Alphaproteobacteria, Gammaproteobacteria and Bacteroidia. We identified functional gene sets in bacterial species conserved across two geographically distinct stromatolite formations, which may promote carbonate precipitation through the reduction of nitrogenous compounds and possible production of calcium ions. We propose that an abundance of extracellular alkaline phosphatases may lead to the formation of phosphatic deposits within these stromatolites. We conclude that the cumulative effect of several conserved bacterial species drives accretion in these two stromatolite formations. ORIGINALITY-SIGNIFICANCE Peritidal stromatolites are unique among stromatolite formations as they grow at the dynamic interface of calcium carbonate-rich groundwater and coastal marine waters. The peritidal space forms a relatively unstable environment and the factors that influence the growth of these peritidal structures are not well understood. To our knowledge, this is the first comparative study that assesses species conservation within the microbial communities of two geographically distinct peritidal stromatolite formations. We assessed the potential functional roles of these communities using genomic bins clustered from metagenomic sequencing data. 
We identified several conserved bacterial species across the two sites and hypothesize that their genetic functional potential may be important in the formation of peritidal stromatolites. We contrasted these findings against a well-studied site in Shark Bay, Australia, and show that, unlike these hypersaline formations, archaea do not play a major role in peritidal stromatolite formation. Furthermore, bacterial nitrogen and phosphate metabolisms of conserved species may be driving factors behind lithification in peritidal stromatolites.


Comprehensive discovery of CRISPR-targeted terminally redundant sequences in the human gut metagenome: Viruses, plasmids, and more

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009428 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1009428

Author(s):

Ryota Sugimoto ◽

Luca Nishimura ◽

Phuong Thanh Nguyen ◽

Jumpei Ito ◽

Nicholas F. Parrish ◽

...

Keyword(s):

De Novo ◽

Sequence Similarity ◽

Metagenomic Data ◽

Marker Genes ◽

Biological Entity ◽

Metagenomic Sequencing ◽

Sequencing Data ◽

Human Gut ◽

Protein Coding ◽

Viral Sequences

Viruses are the most numerous biological entity, existing in all environments and infecting all cellular organisms. Compared with cellular life, the evolution and origin of viruses are poorly understood; viruses are enormously diverse, and most lack sequence similarity to cellular genes. To uncover viral sequences without relying on either reference viral sequences from databases or marker genes that characterize specific viral taxa, we developed an analysis pipeline for virus inference based on clustered regularly interspaced short palindromic repeats (CRISPR). CRISPR is a prokaryotic nucleic acid restriction system that stores the memory of previous exposure. Our protocol can infer CRISPR-targeted sequences, including viruses, plasmids, and previously uncharacterized elements, and predict their hosts using unassembled short-read metagenomic sequencing data. By analyzing human gut metagenomic data, we extracted 11,391 terminally redundant CRISPR-targeted sequences, which are likely complete circular genomes. The sequences included 2,154 tailed-phage genomes, together with 257 complete crAssphage genomes, 11 genomes larger than 200 kilobases, 766 genomes of Microviridae species, 56 genomes of Inoviridae species, and 95 previously uncharacterized circular small genomes that have no reliably predicted protein-coding gene. We predicted the host(s) of approximately 70% of the discovered genomes at the taxonomic level of phylum by linking protospacers to taxonomically assigned CRISPR direct repeats. These results demonstrate that our protocol is efficient for de novo inference of CRISPR-targeted sequences and their host prediction.
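The idea of terminal redundancy mentioned above, a sequence whose start repeats at its end and which is therefore likely a complete circular genome, can be illustrated with a minimal sketch. The exact-match criterion, the `min_overlap` threshold, the cap at half the sequence length, and the toy contig are all assumptions for illustration; the abstract does not describe how the authors' pipeline detects redundancy.

```python
def terminal_redundancy_length(seq, min_overlap=4):
    """Length of the longest prefix that exactly equals the suffix, or 0 if none.

    Only overlaps between min_overlap and half the sequence length are considered.
    """
    seq = seq.upper()
    for length in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:length] == seq[-length:]:
            return length
    return 0


def trim_to_circular_unit(seq, min_overlap=4):
    """Drop the repeated end so the sequence represents one copy of the circle."""
    overlap = terminal_redundancy_length(seq, min_overlap)
    return seq[:-overlap] if overlap else seq


# Hypothetical contig: an 8-base terminal repeat flanking a 9-base unique region.
repeat = "ATGCCGTA"
toy_contig = repeat + "GGATTCAGC" + repeat

overlap_length = terminal_redundancy_length(toy_contig)
circular_unit = trim_to_circular_unit(toy_contig)
print(overlap_length, circular_unit)
```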


Characteristics of Intestinal Flora in Pregnant Women with Mild Thalassemia Revealed by Metagenomics

Jundishapur Journal of Microbiology ◽

10.5812/jjm.119925 ◽

2021 ◽

Vol 14 (11) ◽

Author(s):

Yong-Zhi Lun ◽

Wei Qiu ◽

Wenqi Zhao ◽

Hua Lin ◽

Mintao Zhong ◽

...

Keyword(s):

Pregnant Women ◽

Pathogenic Bacteria ◽

Sequence Data ◽

Intestinal Flora ◽

Control Group ◽

Metagenomic Sequencing ◽

Sequencing Technology ◽

Illumina Hiseq ◽

Multiple Comparison Test ◽

Wilcoxon Rank Sum Test

Background: At present, there is no report that the intestinal flora of pregnant women with mild thalassemia is different from that of healthy pregnant women. Objectives: This study compared the composition and changes of the intestinal flora of pregnant women with mild thalassemia to those of healthy pregnant women using metagenomic sequencing technology and evaluated the potential microecological risk for pregnant women and the fetus. Methods: The present study was carried out on 14 pregnant women with mild thalassemia and similar backgrounds in the Affiliated Hospital of Putian University, Fujian, China. In the same period, 6 healthy pregnant women were selected as the control group. The genomic deoxyribonucleic acid was extracted from the stool samples of pregnant women. Illumina HiSeq sequencing technology was adopted after library preparation. Prodigal software (ver 2.6.3), Salmon software (ver 1.6.0), and Kraken software (ver 2) were used to analyze the sequence data. Moreover, analysis of variance and Duncan’s multiple-comparison test or Wilcoxon rank-sum test were used as statistical methods. Results: The characteristics of the intestinal flora of pregnant women with mild thalassemia differed significantly from those of healthy pregnant women, showing an increase in some conditionally pathogenic bacteria (e.g., Prevotella stercorea rose and Escherichia coli) and a decrease in some probiotic bacteria, which might affect pregnant women and cause physiological function damage to their offspring by changing metabolic pathways; however, further validation is needed. Conclusions: The diversity and composition of intestinal flora in pregnant women with mild thalassemia vary significantly from those in healthy pregnant women, especially at the genus and species levels, representing more profound alterations in intestinal microecology.
