Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets in Cyberinfrastructure Systems

2021 ◽  
Vol 12 ◽  
Author(s):  
Kai L. Blumberg ◽  
Alise J. Ponsero ◽  
Matthew Bomhoff ◽  
Elisha M. Wood-Charlson ◽  
Edward F. DeLong ◽  
...  

Marine microbial ecology requires the systematic comparison of biogeochemical and sequence data to analyze environmental influences on the distribution and variability of microbial communities. With ever-increasing quantities of metagenomic data, there is a growing need to make datasets Findable, Accessible, Interoperable, and Reusable (FAIR) across diverse ecosystems. FAIR data is essential to developing analytical frameworks that integrate microbiological, genomic, ecological, oceanographic, and computational methods. Although community standards defining the minimal metadata required to accompany sequence data exist, they have not been consistently used across projects, precluding interoperability. Moreover, these data are not machine-actionable or discoverable by cyberinfrastructure systems. By making ‘omic and physicochemical datasets FAIR to machine systems, we can enable sequence data discovery and reuse based on machine-readable descriptions of environments or physicochemical gradients. In this work, we developed a novel technical specification for dataset encapsulation for the FAIR reuse of marine metagenomic and physicochemical datasets within cyberinfrastructure systems. This includes using Frictionless Data Packages enriched with terminology from environmental and life-science ontologies to annotate measured variables, their units, and the measurement devices used. This approach was implemented in Planet Microbe, a cyberinfrastructure platform and marine metagenomic web portal. Here, we discuss the data properties built into the specification to make global ocean datasets FAIR within the Planet Microbe portal. We additionally discuss the selection of, and contributions to, the marine-science ontologies used within the specification. Finally, we use the system to discover data with which to answer various biological questions about environments, physicochemical gradients, and microbial communities in meta-analyses.
This work represents a future direction in marine metagenomic research by proposing a specification for FAIR dataset encapsulation that, if adopted within cyberinfrastructure systems, would automate the discovery, exchange, and reuse of data needed to answer broader-reaching questions than originally intended.

F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 726
Author(s):  
Mike W.C. Thang ◽  
Xin-Yi Chua ◽  
Gareth Price ◽  
Dominique Gorse ◽  
Matt A. Field

Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences. While software for detailing the composition of microbial communities using 16S rRNA marker genes is relatively mature, researchers are increasingly interested in identifying changes exhibited within microbial communities under differing environmental conditions. In order to gain maximum value from metagenomic sequence data, we must improve the existing analysis environment by providing accessible and scalable computational workflows able to generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired-end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options are available, giving users a high level of control over their data and analyses. Additionally, users are able to input complex sample-specific metadata, which can be incorporated into differential analysis and used for grouping / colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping and non-overlapping read pairs as well as pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community with the hope that more tools will be added as existing methods mature.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 726 ◽  
Author(s):  
Mike W.C. Thang ◽  
Xin-Yi Chua ◽  
Gareth Price ◽  
Dominique Gorse ◽  
Matt A. Field

Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences, yet analysis workflows remain immature relative to other fields such as DNASeq and RNASeq analysis pipelines. While software for detailing the composition of microbial communities using 16S rRNA marker genes is constantly improving, researchers are increasingly interested in identifying changes exhibited within microbial communities under differing environmental conditions. In order to gain maximum value from metagenomic sequence data, we must improve the existing analysis environment by providing accessible and scalable computational workflows able to generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired-end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options are available, giving users a high level of control over their data and analyses. Additionally, users are able to input complex sample-specific metadata, which can be incorporated into differential analysis and used for grouping / colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping and non-overlapping read pairs as well as pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community with the hope that more tools will be added as existing methods mature.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kazutoshi Yoshitake ◽  
Gaku Kimura ◽  
Tomoko Sakami ◽  
Tsuyoshi Watanabe ◽  
Yukiko Taniuchi ◽  
...  

Although numerous metagenome, amplicon sequencing-based studies have been conducted to date to characterize marine microbial communities, relatively few have employed full metagenome shotgun sequencing to obtain a broader picture of the functional features of these marine microbial communities. Moreover, most of these studies only performed sporadic sampling, which is insufficient to understand an ecosystem comprehensively. In this study, we regularly conducted seawater sampling along the northeastern Pacific coast of Japan between March 2012 and May 2016. We collected 213 seawater samples and prepared size-based fractions to generate 454 subsets of samples for shotgun metagenome sequencing and analysis. We also determined the sequences of 16S rRNA (n = 111) and 18S rRNA (n = 47) gene amplicons from smaller sample subsets. We thereafter developed the Ocean Monitoring Database for time-series metagenomic data (http://marine-meta.healthscience.sci.waseda.ac.jp/omd/), which provides a three-dimensional bird’s-eye view of the data. This database includes results of digital DNA chip analysis, a novel method for estimating ocean characteristics such as water temperature from metagenomic data. Furthermore, we developed a novel classification method that includes more information about viruses than that acquired using BLAST. We further report the discovery of a large number of previously overlooked (TAG)n repeat sequences in the genomes of marine microbes. We predict that the availability of this time-series database will lead to major discoveries in marine microbiome research.


2021 ◽  
Author(s):  
Jinglie Zhou ◽  
Susanna M. Theroux ◽  
Clifton P. Bueno de Mesquita ◽  
Wyatt H. Hartman ◽  
Ye Tian ◽  
...  

Wetlands are important carbon (C) sinks, yet many have been destroyed and converted to other uses over the past few centuries, including industrial salt making. A renewed focus on wetland ecosystem services (e.g., flood control and habitat) has resulted in numerous restoration efforts whose effect on microbial communities is largely unexplored. We investigated the impact of restoration on microbial community composition, metabolic functional potential, and methane flux by analyzing sediment cores from two unrestored former industrial salt ponds, a restored former industrial salt pond, and a reference wetland. We observed elevated methane emissions from unrestored salt ponds compared to the restored and reference wetlands, which were positively correlated with salinity and sulfate across all samples. 16S rRNA gene amplicon and shotgun metagenomic data revealed that the restored salt pond harbored communities more phylogenetically and functionally similar to the reference wetland than to unrestored ponds. Archaeal methanogenesis genes were positively correlated with methane flux, as were genes encoding enzymes for bacterial methylphosphonate degradation, suggesting methane is generated both from bacterial methylphosphonate degradation and archaeal methanogenesis in these sites. These observations demonstrate that restoration effectively converted industrial salt pond microbial communities back to compositions more similar to reference wetlands and lowered salinities, sulfate concentrations, and methane emissions.


2018 ◽  
Vol 35 (13) ◽  
pp. 2332-2334 ◽  
Author(s):  
Federico Baldini ◽  
Almut Heinken ◽  
Laurent Heirendt ◽  
Stefania Magnusdottir ◽  
Ronan M T Fleming ◽  
...  

Motivation: The application of constraint-based modeling to functionally analyze metagenomic data has been limited so far, partially due to the absence of suitable toolboxes. Results: To address this gap, we created a comprehensive toolbox to model (i) microbe–microbe and host–microbe metabolic interactions, and (ii) microbial communities using microbial genome-scale metabolic reconstructions and metagenomic data. The Microbiome Modeling Toolbox extends the functionality of the constraint-based reconstruction and analysis toolbox. Availability and implementation: The Microbiome Modeling Toolbox and the tutorials are available at https://git.io/microbiomeModelingToolbox.


2012 ◽  
Vol 78 (15) ◽  
pp. 5288-5296 ◽  
Author(s):  
Yu-Wei Wu ◽  
Mina Rho ◽  
Thomas G. Doak ◽  
Yuzhen Ye

The NIH Human Microbiome Project (HMP) has produced several hundred metagenomic data sets, allowing studies of the many functional elements in human-associated microbial communities. Here, we survey the distribution of oral spirochetes implicated in dental diseases in normal human individuals, using recombination sites associated with the chromosomal integron in Treponema genomes, taking advantage of the multiple copies of the integron recombination sites (repeats) in the genomes, and using a targeted assembly approach that we have developed. We find that integron-containing Treponema species are present in ∼80% of the normal human subjects included in the HMP. Further, we are able to de novo assemble the integron gene cassettes using our constrained assembly approach, which employs a unique application of the de Bruijn graph assembly information; most of these cassette genes were not assembled in whole-metagenome assemblies and could not be identified by mapping sequencing reads onto the known reference Treponema genomes due to the dynamic nature of integron gene cassettes. Our study significantly enriches the gene pool known to be carried by Treponema chromosomal integrons, totaling 826 (598 97% nonredundant) genes. We characterize the functions of these gene cassettes: many of these genes have unknown functions. The integron gene cassette arrays found in the human microbiome are extraordinarily dynamic, with different microbial communities sharing only a small number of common genes.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2060
Author(s):  
Aleksandr Agafonov ◽  
Kimmo Mattila ◽  
Cuong Duong Tuan ◽  
Lars Tiede ◽  
Inge Alexander Raknes ◽  
...  

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture where we combine central servers with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others who plan to provide a portal-based data analysis service in ELIXIR and other organizations with geographically distributed compute and storage resources.


2018 ◽  
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance, by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, in the case where the ratio of sample size to dimensionality is small, it was observed that the proposed method can vastly outperform the others.
Author summary: Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.


2018 ◽  
Author(s):  
Ramiro Logares ◽  
Ina M. Deutschmann ◽  
Caterina. R. Giner ◽  
Anders K. Krabberød ◽  
Thomas S. B. Schmidt ◽  
...  

The smallest members of the sunlit-ocean microbiome (prokaryotes and picoeukaryotes) participate in a plethora of ecosystem functions with planetary-scale effects. Understanding the processes determining the spatial turnover of this assemblage can help us better comprehend the links between microbiome species composition and ecosystem function. Ecological theory predicts that selection, dispersal and drift are main drivers of species distributions, yet the relative quantitative importance of these ecological processes in structuring the surface-ocean microbiome is barely known. Here we quantified the role of selection, dispersal and drift in structuring surface-ocean prokaryotic and picoeukaryotic assemblages by using community DNA-sequence data collected during the global Malaspina expedition. We found that dispersal limitation was the dominant process structuring picoeukaryotic communities, while a balanced combination of dispersal limitation, selection and drift shaped prokaryotic counterparts. Subsequently, we determined the agents exerting abiotic selection as well as the spatial patterns emerging from the action of different ecological processes. We found that selection exerted via temperature had a strong influence on the structure of prokaryotic communities, particularly on species co-occurrences, a pattern not observed among communities of picoeukaryotes. Other measured abiotic variables had limited selective effects on microbiome structure. Picoeukaryotes presented a higher differentiation between neighbouring communities and a higher distance-decay when compared to prokaryotes, agreeing with their higher dispersal limitation. Finally, drift seemed to have a limited role in structuring the sunlit-ocean microbiome. The different predominance of ecological processes acting on particular subsets of the ocean microbiome suggests uneven responses to environmental change.
Significance statement: The global ocean contains one of the largest microbiomes on Earth and changes in its structure can impact the functioning of the biosphere. Yet, we are far from understanding the mechanisms that structure the global ocean microbiome, that is, the relative importance of environmental selection, dispersal and random events (drift). We evaluated the role of these processes at the global scale, based on data derived from a circumglobal expedition and found that these ecological processes act differently on prokaryotes and picoeukaryotes, two of the main components of the ocean microbiome. Our work represents a significant contribution to understanding the assembly of marine microbial communities, providing also insights on the links between ecological mechanisms, microbiome structure and ecosystem function.


2019 ◽  
Vol 3 ◽  
Author(s):  
Shruthi Magesh ◽  
Viktor Jonsson ◽  
Johan Bengtsson-Palme

Metagenomics has emerged as a central technique for studying the structure and function of microbial communities. Often the functional analysis is restricted to classification into broad functional categories. However, important phenotypic differences, such as resistance to antibiotics, are often the result of just one or a few point mutations in otherwise identical sequences. Bioinformatic methods for metagenomic analysis have generally been poor at accounting for this fact, resulting in a somewhat limited picture of important aspects of microbial communities. Here, we address this problem by providing a software tool called Mumame, which can distinguish between wildtype and mutated sequences in shotgun metagenomic data and quantify their relative abundances. We demonstrate the utility of the tool by quantifying antibiotic resistance mutations in several publicly available metagenomic data sets. We also found that sequencing depth is a key factor in detecting rare mutations. Therefore, much larger numbers of sequences may be required for reliable detection of mutations than for most other applications of shotgun metagenomics. Mumame is freely available online (http://microbiology.se/software/mumame).

