Adapting Macroecology to Microbiology: Using Occupancy Modeling To Assess Functional Profiles across Metagenomes

mSystems ◽  
2021 ◽  
Author(s):  
Angus S. Hilts ◽  
Manjot S. Hunjan ◽  
Laura A. Hug

Metagenomics is maturing rapidly as a field but is hampered by a lack of available statistical tools. A primary area of uncertainty is around missing genes or functions from a metagenomic data set.

2014 ◽  
Vol 104 (10) ◽  
pp. 1125-1129 ◽  
Author(s):  
A. H. Stobbe ◽  
W. L. Schneider ◽  
P. R. Hoyt ◽  
U. Melcher

Next generation sequencing (NGS) is not used commonly in diagnostics, in part due to the large amount of time and computational power needed to identify the taxonomic origin of each sequence in a NGS data set. By using the unassembled NGS data sets as the target for searches, pathogen-specific sequences, termed e-probes, could be used as queries to enable detection of specific viruses or organisms in plant sample metagenomes. This method, designated e-probe diagnostic nucleic acid assay, first tested with mock sequence databases, was tested with NGS data sets generated from plants infected with a DNA (Bean golden yellow mosaic virus, BGYMV) or an RNA (Plum pox virus, PPV) virus. In addition, the ability to detect and differentiate among strains of a single virus species, PPV, was examined by using probe sets that were specific to strains. The use of probe sets for multiple viruses determined that one sample was dually infected with BGYMV and Bean golden mosaic virus.
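As a rough illustration of the query-against-reads idea described above, the following minimal sketch counts how many unassembled reads contain each probe. It assumes exact substring matching on either strand as a stand-in for whatever similarity search the assay actually uses; the function name `count_probe_hits`, the probe names, and all sequences are hypothetical.

```python
def count_probe_hits(reads, probes):
    """Count, per probe, how many reads contain it on either strand.

    Assumption: exact substring matching stands in for a real
    similarity search; the abstract does not specify the search tool.
    """
    complement = str.maketrans("ACGT", "TGCA")
    hits = {name: 0 for name in probes}
    for read in reads:
        read = read.upper()
        for name, probe in probes.items():
            probe = probe.upper()
            # Reverse complement, so reads from the opposite strand also match.
            revcomp = probe.translate(complement)[::-1]
            if probe in read or revcomp in read:
                hits[name] += 1
    return hits


# Hypothetical toy data: three short reads and two probes.
reads = ["TTGACCGTAAGG", "CCTTACGGTCAA", "AAAAAAAAAAAA"]
probes = {"probe_1": "GACCGTAA", "probe_2": "GGGGCCCC"}
hits = count_probe_hits(reads, probes)
```

In this sketch the reads, not a reference database, are the search target, which is the inversion the abstract describes.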


mSystems ◽  
2021 ◽  
Vol 6 (3) ◽  
Author(s):  
Christian Milani ◽  
Gabriele Andrea Lugli ◽  
Federico Fontana ◽  
Leonardo Mancabelli ◽  
Giulia Alessandri ◽  
...  

We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets.


2020 ◽  
Author(s):  
Syed Shujaat Ali Zaidi ◽  
Masood Ur Rehman Kayani ◽  
Xuegong Zhang ◽  
Imran Haider Shamsi

Abstract Background: Efficient regulation of bacterial genes against the environmental stimulus results in unique operonic organizations. Lack of complete reference and functional information makes metagenomic operon prediction challenging and therefore opens new perspectives on the interpretation of the host-microbe interactions. Methods: Here we present MetaRon (pipeline for the prediction of Metagenomic operons), an open-source pipeline explicitly designed for the metagenomic shotgun sequencing data. It recreates the operonic structure without functional information. MetaRon identifies closely packed co-directional gene clusters with a promoter upstream and downstream of the first and last gene, respectively. Promoter prediction marks the transcriptional unit boundary (TUB) of closely packed co-directional gene clusters. Results: Escherichia coli (E. coli) K-12 MG1655 presents a gold standard for operon prediction. Therefore, MetaRon was initially implemented on two simulated Illumina datasets: (1) the E. coli MG1655 genome and (2) a mixture of E. coli MG1655, Mycobacterium tuberculosis H37Rv and Bacillus subtilis str. 168 genomes. Operons were predicted in the single genome and mixture of genomes with a sensitivity of 97.8% and 93.7%, respectively. In the next phase, operons predicted from E. coli c20 draft genome isolated from chicken gut metagenome achieved a sensitivity of 94.1%. Lastly, the application of MetaRon on 145 paired-end gut metagenome samples identified 1,232,407 unique operons. Conclusion: MetaRon removes two notable limitations of existing methods: (1) dependency on functional information, and (2) the burden of managing enormous metagenomic data. The current study showed the idea of using operons as a subset to represent the whole metagenome in terms of secondary metabolites and demonstrated its effectiveness in explaining the occurrence of a disease condition.
This will significantly reduce the hefty whole-metagenome data to a small, more precise data set. Furthermore, metabolic pathways from the operonic sequences were identified in association with the occurrence of type 2 diabetes (T2D). Presumably, this is the first organized effort to predict metagenomic operons and perform a detailed analysis in association with a disease, in this case T2D. The application of MetaRon to metagenome data at diverse scales will be beneficial for understanding gene regulation and therapeutic metagenomics.
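The clustering step described in the Methods can be sketched as follows. This is only an illustration under stated assumptions, not the MetaRon implementation: the `Gene` record, the function `cluster_codirectional`, and the 50 bp `max_gap` threshold are all hypothetical, and the promoter-prediction step that marks transcriptional unit boundaries is omitted.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def cluster_codirectional(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap.

    Assumption: max_gap (50 bp) is an illustrative threshold only.
    Singleton genes are not reported as clusters.
    """
    clusters: List[List[str]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: g.start):
        same_strand = bool(current) and gene.strand == current[-1].strand
        if same_strand and gene.start - current[-1].end <= max_gap:
            current.append(gene)
        else:
            # Close the running cluster and start a new one at this gene.
            if len(current) > 1:
                clusters.append([g.name for g in current])
            current = [gene]
    if len(current) > 1:
        clusters.append([g.name for g in current])
    return clusters


# Hypothetical coordinates on a single contig.
genes = [
    Gene("a", 100, 400, "+"),
    Gene("b", 420, 900, "+"),
    Gene("c", 930, 1500, "+"),
    Gene("d", 1520, 2000, "-"),
    Gene("e", 2600, 3000, "-"),
]
clusters = cluster_codirectional(genes)
```

Here genes a, b, and c form one candidate cluster; d switches strand and e lies too far downstream, so neither joins a cluster.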


2020 ◽  
Vol 94 (11) ◽  
Author(s):  
Shengzhong Xu ◽  
Liang Zhou ◽  
Xiaosha Liang ◽  
Yifan Zhou ◽  
Hao Chen ◽  
...  

ABSTRACT Virophages are small parasitic double-stranded DNA (dsDNA) viruses of giant dsDNA viruses infecting unicellular eukaryotes. Except for a few isolated virophages characterized by parasitization mechanisms, features of virophages discovered in metagenomic data sets remain largely unknown. Here, the complete genomes of seven virophages (26.6 to 31.5 kbp) and four large DNA viruses (190.4 to 392.5 kbp) that coexist in the freshwater lake Dishui Lake, Shanghai, China, have been identified based on environmental metagenomic investigation. Both genomic and phylogenetic analyses indicate that Dishui Lake virophages (DSLVs) are closely related to each other and to other lake virophages, and Dishui Lake large DNA viruses are affiliated with the micro-green alga-infecting Prasinovirus of the Phycodnaviridae (named Dishui Lake phycodnaviruses [DSLPVs]) and protist (protozoan and alga)-infecting Mimiviridae (named Dishui Lake large alga virus [DSLLAV]). The DSLVs possess more genes with closer homology to that of large alga viruses than to that of giant protozoan viruses. Furthermore, the DSLVs are strongly associated with large green alga viruses, including DSLPV4 and DSLLAV1, based on codon usage as well as oligonucleotide frequency and correlation analyses. Surprisingly, a nonhomologous CRISPR-Cas like system is found in DSLLAV1, which appears to protect DSLLAV1 from the parasitization of DSLV5 and DSLV8. These results suggest that novel cell-virus-virophage (CVv) tripartite infection systems of green algae, large green alga virus (Phycodnaviridae- and Mimiviridae-related), and virophage exist in Dishui Lake, which will contribute to further deep investigations of the evolutionary interaction of virophages and large alga viruses as well as of the essential roles that the CVv plays in the ecology of algae. IMPORTANCE Virophages are small parasitizing viruses of large/giant viruses. 
To our knowledge, the few isolated virophages all parasitize giant protozoan viruses (Mimiviridae) for propagation and form a tripartite infection system with hosts, here named the cell-virus-virophage (CVv) system. However, the CVv system remains largely unknown in environmental metagenomic data sets. In this study, we systematically investigated the metagenomic data set from the freshwater lake Dishui Lake, Shanghai, China. Consequently, four novel large alga viruses and seven virophages were discovered to coexist in Dishui Lake. Surprisingly, a novel CVv tripartite infection system comprising green algae, large green alga viruses (Phycodnaviridae- and Mimiviridae-related), and virophages was identified based on genetic link, genomic signature, and CRISPR system analyses. Meanwhile, a nonhomologous CRISPR-like system was found in Dishui Lake large alga viruses, which appears to protect the virus host from the infection of Dishui Lake virophages (DSLVs). These findings are critical to give insight into the potential significance of CVv in global evolution and ecology.


2020 ◽  
Vol 34 (6) ◽  
pp. 988-998 ◽  
Author(s):  
Joanna Sosnowska ◽  
Peter Kuppens ◽  
Filip De Fruyt ◽  
Joeri Hofmans

In this paper, we demonstrate how an integrative approach to personality—one that combines within–person and between–person differences—can be achieved by drawing on the principles of dynamic systems theory. The dynamic systems perspective has the potential to reconcile both the stable and dynamic aspect of personality, it allows including different levels of analysis (i.e. traits and states), and it can account for regulatory mechanisms, as well as dynamic interactions between the elements of the system, and changes over time. While all of these features are obviously appealing, implementing a dynamic systems approach to personality is challenging. It requires new conceptual models, specific longitudinal research designs, and complex data analytical methods. In response to these issues, the first part of our paper discusses the Personality Dynamics model, a model that integrates the dynamic systems principles in a relatively straightforward way. Second, we review associated methodological and statistical tools that allow empirically testing the PersDyn model. Finally, the model and associated methodological and statistical tools are illustrated using an experience sampling methodology data set measuring Big Five personality states in 59 participants (N = 1916 repeated measurements). © 2020 The Authors. European Journal of Personality published by John Wiley & Sons Ltd on behalf of European Association of Personality Psychology


mSphere ◽  
2019 ◽  
Vol 4 (2) ◽  
Author(s):  
Marli Vlok ◽  
Andrew S. Lang ◽  
Curtis A. Suttle

ABSTRACT RNA viruses, particularly genetically diverse members of the Picornavirales, are widespread and abundant in the ocean. Gene surveys suggest that there are spatial and temporal patterns in the composition of RNA virus assemblages, but data on their diversity and genetic variability in different oceanographic settings are limited. Here, we show that specific RNA virus genomes have widespread geographic distributions and that the dominant genotypes are under purifying selection. Genomes from three previously unknown picorna-like viruses (BC-1, -2, and -3) assembled from a coastal site in British Columbia, Canada, as well as marine RNA viruses JP-A, JP-B, and Heterosigma akashiwo RNA virus exhibited different biogeographical patterns. Thus, biotic factors such as host specificity and viral life cycle, and not just abiotic processes such as dispersal, affect marine RNA virus distribution. Sequence differences relative to reference genomes imply that virus quasispecies are under purifying selection, with synonymous single-nucleotide variations dominating in genomes from geographically distinct regions resulting in conservation of amino acid sequences. Conversely, sequences from coastal South Africa that mapped to marine RNA virus JP-A exhibited more nonsynonymous mutations, probably representing amino acid changes that accumulated over a longer separation. This biogeographical analysis of marine RNA viruses demonstrates that purifying selection is occurring across oceanographic provinces. These data add to the spectrum of known marine RNA virus genomes, show the importance of dispersal and purifying selection for these viruses, and indicate that closely related RNA viruses are pathogens of eukaryotic microbes across oceans. IMPORTANCE Very little is known about aquatic RNA virus populations and genome evolution. This is the first study that analyzes marine environmental RNA viral assemblages in an evolutionary and broad geographical context.
This study contributes the largest marine RNA virus metagenomic data set to date, substantially increasing the sequencing space for RNA viruses and also providing a baseline for comparisons of marine RNA virus diversity. The new viruses discovered in this study are representative of the most abundant family of marine RNA viruses, the Marnaviridae, and expand our view of the diversity of this important group. Overall, our data and analyses provide a foundation for interpreting marine RNA virus diversity and evolution.


2019 ◽  
Vol 8 (47) ◽  
Author(s):  
Jamie Bojko ◽  
Krista A. McCoy ◽  
Donald C. Behringer ◽  
April M. H. Blakeslee

A single-stranded DNA (ssDNA) virus is presented from a metagenomic data set derived from Alphaproteobacteria-infected hepatopancreatic tissues of the crab Eurypanopeus depressus. The circular virus genome (4,768 bp) encodes 14 hypothetical proteins, some similar to other bacteriophages (Microviridae). Based on its relatedness to other Microviridae, this virus represents a member of a novel genus.


2016 ◽  
Author(s):  
Samuel M. Nicholls ◽  
Wayne Aubrey ◽  
Kurt de Grave ◽  
Leander Schietgat ◽  
Christopher J. Creevey ◽  
...  

Abstract High-throughput DNA sequencing has enabled us to look beyond consensus reference sequences to the variation observed in sequences within organisms: their haplotypes. Recovery, or assembly, of haplotypes has proved computationally difficult and there exist many probabilistic heuristics that attempt to recover the original haplotypes for a single organism of known ploidy. However, existing approaches make simplifications or assumptions that are easily violated when investigating sequence variation within a metagenome. We propose the metahaplome as the set of haplotypes for any particular genomic region of interest within a metagenomic data set and present Hansel and Gretel, a data structure and algorithm that together provide a proof of concept framework for the recovery of true haplotypes from a metagenomic data set. The algorithm performs incremental haplotype recovery, using smoothed Naive Bayes, a simple, efficient and effective method. Hansel and Gretel pose several advantages over existing solutions: the framework is capable of recovering haplotypes from metagenomes, does not require a priori knowledge about the input data, makes no assumptions regarding the distribution of alleles at variant sites, is robust to error, and uses all available evidence from aligned reads, without altering or discarding observed variation. We evaluate our approach using synthetic metahaplomes constructed from sets of real genes and show that up to 99% of SNPs on a haplotype can be correctly recovered from short reads that originate from a metagenomic data set.


2021 ◽  
Vol 58 (1) ◽  
pp. 2615-2625
Author(s):  
GANESH R

The purpose of this research paper is to identify and understand the factors influencing the extent of adoption of electronic procurement software and its impact on the performance of firms' supply chains post-uptake of the software. The paper also tries to understand the challenges companies face with respect to the adoption of E-Procurement software solutions. A survey was sent to procurement leaders and their responses were analyzed. Statistical tools such as Exploratory Factor Analysis (EFA) and descriptive statistics were used. The two new factors that emerged were Downstream Procurement Activities and Upstream Procurement Activities, where firms adopt E-Procurement software for Procure-to-Pay and Source-to-Contract, respectively. A reliability test was successfully employed to validate the data set. The descriptive statistics show that E-Procurement adoption had an above-average impact on Supply Chain Performance, with considerable fluctuation. The challenges in adopting E-Procurement were determined to be of moderate extent, with the cost of these procurement software solutions being a major challenge for uptake. Currently, not all firms are using procurement software as a one-size-fits-all solution for their procurement management, and this paper helps us understand the landscape better by grouping the penetration of electronic procurement into two factors.


mSystems ◽  
2018 ◽  
Vol 3 (3) ◽  
Author(s):  
Luis M. Rodriguez-R ◽  
Santosh Gunturu ◽  
James M. Tiedje ◽  
James R. Cole ◽  
Konstantinos T. Konstantinidis

ABSTRACT Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (Nd) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that Nd additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we described Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.
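For readers unfamiliar with the Shannon index used as the comparison metric above, a minimal sketch of its standard definition, H = -sum(p_i * ln p_i) over taxon proportions p_i, is given here. The function name `shannon_index` and the example counts are hypothetical; this does not reproduce the Nonpareil diversity metric itself, which the abstract derives from Nonpareil curves.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) from raw taxon counts.

    Zero counts are skipped, since p * ln(p) tends to 0 as p tends to 0.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical 16S-style taxon counts for two samples.
even_sample = [10, 10, 10, 10]    # four equally abundant taxa
skewed_sample = [37, 1, 1, 1]     # one dominant taxon
h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
```

With equal richness, the evenly distributed sample yields the higher index (the natural log of 4 for four equally abundant taxa), which is the behavior the index is meant to capture.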

