AMON: Annotation of metabolite origins via networks to better integrate microbiome and metabolome data

ABSTRACT Motivation: Untargeted metabolomics of host-associated samples has yielded insights into mechanisms by which microbes modulate health. However, data interpretation is challenged by the complexity of origins of the small molecules measured, which can come from the host, from microbes that live with the host, or from other exposures such as diet or the environment. Results: We address this challenge through development of AMON: Annotation of Metabolite Origins via Networks. AMON is an open-source bioinformatics application that can be used to determine the degree to which annotated compounds in the metabolome may have been produced by the bacteria present, by the host, by either (i.e. both the bacteria and the host are capable of production), or by neither (i.e. neither the human host nor the fecal microbiome is predicted to be capable of producing the observed metabolite). Availability and Implementation: This software is available at https://github.com/lozuponelab/AMON as well as via [email protected]
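The four-way labelling described above (bacteria, host, both, neither) reduces to set membership once the compounds each side could produce are known. The sketch below assumes those two sets have already been predicted (presumably from gene annotations mapped through a reaction network such as KEGG; that step is omitted), and the compound IDs in the usage example are hypothetical placeholders. It is an illustration, not AMON's own code.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def classify_origins(
    detected: Iterable[str],
    host_producible: Set[str],
    microbe_producible: Set[str],
) -> Dict[str, str]:
    """Assign each detected compound to 'bacteria', 'host', 'both' or 'neither'."""
    labels: Dict[str, str] = {}
    for compound in detected:
        by_host = compound in host_producible
        by_microbes = compound in microbe_producible
        if by_host and by_microbes:
            labels[compound] = "both"
        elif by_microbes:
            labels[compound] = "bacteria"
        elif by_host:
            labels[compound] = "host"
        else:
            labels[compound] = "neither"
    return labels


def origin_summary(labels: Dict[str, str]) -> Counter:
    """Number of detected compounds in each origin category."""
    return Counter(labels.values())
```

For example, `classify_origins(["c1", "c2", "c3", "c4"], {"c1", "c3"}, {"c1", "c2"})` labels `c1` as both, `c2` as bacteria, `c3` as host and `c4` as neither; `origin_summary` then gives the "degree" as counts per category.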

AMON: annotation of metabolite origins via networks to integrate microbiome and metabolome data

BMC Bioinformatics · 10.1186/s12859-019-3176-8 · 2019 · Vol 20 (1) · Cited By ~ 4

Author(s): M. Shaffer, K. Thurimella, K. Quinn, K. Doenges, X. Zhang, ...

Keyword(s): Small Molecules, KEGG Pathway, Data Interpretation, Integrated Analysis, Genomic Information, Microbial Enzymes, Pathway Enrichment, Bioinformatics Application, Metabolome Data, Insight Into

Abstract Background: Untargeted metabolomics of host-associated samples has yielded insights into mechanisms by which microbes modulate health. However, data interpretation is challenged by the complexity of origins of the small molecules measured, which can come from the host, from microbes that live within the host, or from other exposures such as diet or the environment. Results: We address this challenge through development of AMON: Annotation of Metabolite Origins via Networks. AMON is an open-source bioinformatics application that can be used to annotate which compounds in the metabolome could have been produced by the bacteria present or by the host, to evaluate pathway enrichment of host versus microbial metabolites, and to visualize which compounds may have been produced by host versus microbial enzymes in KEGG pathway maps. Conclusions: AMON empowers researchers to predict origins of metabolites via genomic information and to visualize potential host:microbe interplay. Additionally, evaluating the enrichment of pathway metabolites of host versus microbial origin gives insight into the metabolic functionality that a microbial community adds to a host:microbe system. Through integrated analysis of microbiome and metabolome data, mechanistic relationships between microbial communities and host phenotypes can be better understood.
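The abstract mentions pathway enrichment of host versus microbial metabolites but does not name the statistical test. A one-sided hypergeometric test (equivalent to a one-sided Fisher exact test) is a common choice for this kind of over-representation question, and it is what the minimal sketch below assumes; the set names and the choice of background are illustrative assumptions, not AMON's documented behaviour.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway members, n draws)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_enrichment(pathway: Set[str], selected: Set[str], background: Set[str]) -> float:
    """One-sided p-value that `selected` compounds (e.g. microbe-only) over-represent `pathway`."""
    pathway_bg = pathway & background    # pathway members that could be observed
    selected_bg = selected & background  # compounds of interest within the background
    k = len(pathway_bg & selected_bg)
    return hypergeom_sf(k, len(background), len(pathway_bg), len(selected_bg))
```

With a background of 10 compounds, a 3-compound pathway and 4 selected compounds that include all 3 pathway members, the p-value is 7/210 (about 0.033). In practice one would run this per pathway and correct for multiple testing.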

The variant call format provides efficient and robust storage of GWAS summary statistics

Genome Biology · 10.1186/s13059-020-02248-0 · 2021 · Vol 22 (1)

Author(s): Matthew S. Lyon, Shea J. Andrews, Ben Elsworth, Tom R. Gaunt, Gibran Hemani, ...

Keyword(s): Open Access, Open Source, Genetic Variants, Data Interpretation, Summary Statistics, Variant Call Format, Variant Call, Query Performance, Link Type, Storage Format

Abstract GWAS summary statistics are fundamental for a variety of research applications, yet no common storage format has been widely adopted. Existing tabular formats ambiguously or incompletely store information about genetic variants and associations, lack essential metadata, and are typically not indexed, yielding poor query performance and increasing the possibility of errors in data interpretation and post-GWAS analyses. To address these issues, we adapted the variant call format to store GWAS summary statistics (GWAS-VCF) and developed open-source tools to use this format in downstream analyses. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (https://gwas.mrcieu.ac.uk).
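To make the idea of storing summary statistics in VCF concrete, the sketch below turns one row of a tabular GWAS file into a VCF data line, putting per-variant statistics in the FORMAT/sample columns. The FORMAT keys ES (effect size), SE (standard error), LP (-log10 p-value) and AF (allele frequency) reflect our reading of the GWAS-VCF convention and should be checked against its specification; the input column names (`beta`, `pval`, `eaf`, and so on) are hypothetical. Header lines and allele harmonisation against the reference genome are deliberately omitted.

```python
import math
from typing import Dict

# Assumed GWAS-VCF FORMAT keys: effect size, standard error, -log10(p), alt allele frequency
FORMAT_KEYS = "ES:SE:LP:AF"


def to_gwas_vcf_line(row: Dict[str, object]) -> str:
    """Format one tabular GWAS row (hypothetical column names) as a VCF data line.

    Assumes `effect_allele` is the ALT allele and `other_allele` already matches
    the reference genome; real pipelines must harmonise alleles first.
    """
    lp = -math.log10(float(row["pval"]))  # storing -log10(p) avoids underflow for tiny p
    sample = ":".join(
        f"{value:.6g}"
        for value in (float(row["beta"]), float(row["se"]), lp, float(row["eaf"]))
    )
    return "\t".join([
        str(row["chrom"]), str(row["pos"]), str(row["rsid"]),
        str(row["other_allele"]),   # REF
        str(row["effect_allele"]),  # ALT
        ".", "PASS", ".",           # QUAL, FILTER, INFO
        FORMAT_KEYS, sample,
    ])
```

Because the result is a standard VCF line, existing VCF tooling (bgzip compression, tabix indexing) can in principle be reused for fast region queries, which is the query-performance motivation given in the abstract.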

QVigourMap: A GIS Open Source Application for the Creation of Canopy Vigour Maps

Agronomy · 10.3390/agronomy11050952 · 2021 · Vol 11 (5) · pp. 952

Author(s): Lia Duarte, Ana Cláudia Teodoro, Joaquim J. Sousa, Luís Pádua

Keyword(s): Open Source, Precision Agriculture, Vegetation Indices, Data Interpretation, Vegetation Monitoring, Distribution Maps, Monitoring Process, Multi Temporal, The Creation, The Right

In a precision agriculture context, the amount of available geospatial data can make it difficult to understand crop variability within a given terrain parcel, raising the need for specific tools for data processing and analysis. This is the case for data acquired from Unmanned Aerial Vehicles (UAV), where the high spatial resolution, combined with data from several spectral wavelengths, makes interpreting the data for vegetation monitoring a complex process. Vegetation Indices (VIs) are usually computed to help in the vegetation monitoring process. However, a crop plot generally contains several non-crop elements, which can bias data analysis and interpretation. By discarding non-crop data, it is possible to compute the vigour distribution for a specific crop within the area under analysis. This article presents QVigourMaps, a new open source application developed to generate useful outputs for precision agriculture purposes. The application was developed as a QGIS plugin, allowing the creation of vigour maps, vegetation distribution maps and prescription maps based on the combination of different VIs and height information. Multi-temporal data from a vineyard plot and a maize field were used as case studies to demonstrate the potential and effectiveness of the QVigourMaps tool. The presented application can contribute to making the right management decisions by providing indicators of crop variability, and its outputs can be used in the field to apply site-specific treatments according to the levels of vigour.
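The workflow described above (compute a vegetation index, discard non-crop pixels using height, then map vigour levels) can be sketched per pixel. The abstract does not say which index, height threshold or class boundaries QVigourMaps uses, so this sketch assumes NDVI, a hypothetical 0.5 m canopy-height threshold and classes split at quantiles of the crop-only NDVI values; it runs on plain lists rather than rasters.

```python
from typing import List, Optional, Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def vigour_classes(
    nir: Sequence[float],
    red: Sequence[float],
    height: Sequence[float],
    min_height: float = 0.5,  # hypothetical threshold (m) separating crop from ground/weeds
    n_classes: int = 3,
) -> List[Optional[int]]:
    """Per-pixel vigour class (0 = lowest); None marks pixels discarded as non-crop."""
    values = [ndvi(n, r) if h >= min_height else None
              for n, r, h in zip(nir, red, height)]
    crop = sorted(v for v in values if v is not None)
    if not crop:
        return [None] * len(values)
    # Class boundaries at empirical quantiles of the crop-only NDVI values
    cuts = [crop[len(crop) * i // n_classes] for i in range(1, n_classes)]
    return [None if v is None else sum(v >= c for c in cuts) for v in values]
```

Computing the quantiles only over crop pixels is the point made in the abstract: including soil or inter-row vegetation would shift the class boundaries and bias the vigour map.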

Open source tools for geographic analysis in transport planning

Journal of Geographical Systems · 10.1007/s10109-020-00342-2 · 2021

Author(s): Robin Lovelace

Keyword(s): User Interface, Open Source, Citizen Participation, Simulation Software, First Century, Transport Planning, Geographic Analysis, Link Type, Twenty First Century, Interactive Map

Abstract Geographic analysis has long supported transport plans that are appropriate to local contexts. Many incumbent ‘tools of the trade’ are proprietary and were developed to support growth in motor traffic, limiting their utility for transport planners who have been tasked with twenty-first century objectives such as enabling citizen participation, reducing pollution, and increasing levels of physical activity by getting more people walking and cycling. Geographic techniques (such as route analysis, network editing, localised impact assessment and interactive map visualisation) have great potential to support modern transport planning priorities. The aim of this paper is to explore emerging open source tools for geographic analysis in transport planning, with reference to the literature and a review of open source tools that are already being used. A key finding is that a growing number of options exist, challenging the current landscape of proprietary tools. These can be classified as command-line interface, graphical user interface or web-based user interface tools, and by the framework in which they were implemented, with numerous tools released as R, Python and JavaScript packages, and as QGIS plugins. The review found a diverse and rapidly evolving ‘ecosystem’ of tools, and 25 tools designed for geographic analysis to support transport planning are outlined in terms of their popularity and functionality, based on online documentation. They ranged in size from single-purpose tools such as the QGIS plugin AwaP to sophisticated stand-alone multi-modal traffic simulation software such as MATSim, SUMO and Veins. By re-using the most effective components from other open source projects, developers of open source transport planning tools can avoid ‘reinventing the wheel’ and focus on innovation; the ‘gamified’ A/B Street simulation software (https://github.com/dabreegster/abstreet/#abstreet), based on OpenStreetMap, is a case in point.
The paper, the source code of which can be found at https://github.com/robinlovelace/open-gat, concludes that, although many of the tools reviewed are still evolving and further research is needed to understand their relative strengths and barriers to uptake, open source tools for geographic analysis in transport planning already hold great potential to help generate the strategic visions of change and the evidence that are needed by transport planners in the twenty-first century.

chewBBACA: A complete suite for gene-by-gene schema creation and strain identification

10.1101/173146 · 2017 · Cited By ~ 5

Author(s): Mickael Silva, Miguel Machado, Diogo N. Silva, Mirko Rossi, Jacob Moran-Gilad, ...

Keyword(s): Open Source, Core Genome, Bacterial Species, Outbreak Detection, Strain Identification, List Type, Whole Genome, Link Type, The Creation, Allele Calling

ABSTRACT Gene-by-gene approaches are becoming increasingly popular in bacterial genomic epidemiology and outbreak detection. However, there is a lack of open-source, scalable software for schema definition and allele calling for these methodologies. The chewBBACA suite was designed to assist users in the creation and evaluation of novel whole-genome or core-genome gene-by-gene typing schemas and subsequent allele calling in bacterial strains of interest. The software can run on a laptop or on high-performance clusters, making it useful for both small laboratories and large reference centers. chewBBACA is available at https://github.com/B-UMMI/chewBBACA or as a Docker image at https://hub.docker.com/r/ummidock/chewbbaca/.

DATA SUMMARY Assembled genomes used for the tutorial were downloaded from NCBI in August 2016 by selecting those submitted as Streptococcus agalactiae taxon or sub-taxa. All the assemblies have been deposited as a zip file in FigShare (https://figshare.com/s/9cbe1d422805db54cd52), where a file with the original FTP link for each NCBI directory is also available. Code for the chewBBACA suite is available at https://github.com/B-UMMI/chewBBACA, while the tutorial example is found at https://github.com/B-UMMI/chewBBACA_tutorial. I/We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.

IMPACT STATEMENT The chewBBACA software offers a computational solution for the creation, evaluation and use of whole genome (wg) and core genome (cg) multilocus sequence typing (MLST) schemas. It allows researchers to develop wg/cgMLST schemes for any bacterial species from a set of genomes of interest. The alleles identified by chewBBACA correspond to potential coding sequences, possibly offering insights into the correspondence between the genetic variability identified and phenotypic variability.
The software performs allele calling in a matter of seconds to minutes per strain on a laptop but is easily scalable to the analysis of large datasets of hundreds of thousands of strains using multiprocessing options. The chewBBACA software thus provides an efficient and freely available open source solution for gene-by-gene methods. Moreover, the ability to perform these tasks locally is desirable when the submission of raw data to a central repository or web services is hindered by data protection policies or ethical or legal concerns.
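The core gene-by-gene idea (one allele identifier per schema locus, with previously unseen sequences registered as new alleles) can be sketched in a few lines. This is a deliberately simplified stand-in: chewBBACA itself uses BLAST-based similarity (a BLAST score ratio) to decide whether a sequence is a new allele of a locus, whereas this sketch only does exact sequence matching and assumes the per-locus sequences for each genome were already extracted. The `LNF` and `INF-` labels loosely follow chewBBACA's output conventions and are illustrative.

```python
from typing import Dict, Optional

# locus name -> {allele sequence: integer allele id}
Schema = Dict[str, Dict[str, int]]


def call_alleles(schema: Schema, genome: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Build an allele profile for one genome; novel alleles are added to the schema."""
    profile: Dict[str, str] = {}
    for locus, alleles in schema.items():
        seq = genome.get(locus)
        if seq is None:
            profile[locus] = "LNF"              # locus not found in this genome
        elif seq in alleles:
            profile[locus] = str(alleles[seq])  # exact match to a known allele
        else:
            new_id = max(alleles.values(), default=0) + 1
            alleles[seq] = new_id               # register the novel allele
            profile[locus] = f"INF-{new_id}"    # inferred new allele
    return profile


def _allele_id(call: str) -> str:
    return call[len("INF-"):] if call.startswith("INF-") else call


def allele_distance(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count loci called in both profiles whose allele ids differ (missing loci ignored)."""
    return sum(
        _allele_id(a[locus]) != _allele_id(b[locus])
        for locus in a
        if locus in b and "LNF" not in (a[locus], b[locus])
    )
```

Because novel alleles are written back into the schema, a second strain carrying the same new sequence receives the same identifier, which is what makes profiles from different strains comparable for outbreak detection.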

The Popgen Pipeline Platform: A Software Platform for Facilitating Population Genomic Analyses

10.1101/785774 · 2019

Author(s): Andrew Webb, Jared Knoblauch, Nitesh Sabankar, Apeksha Sukesh Kallur, Jody Hey, ...

Keyword(s): Open Source, Development Time, End Users, File Format, Software Platform, Format Conversion, Link Type, Population Genomic, Genomic Analyses, File Format Conversion

Abstract Here we present the Pop-Gen Pipeline Platform (PPP), a software platform with the goal of reducing the computational expertise required for conducting population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized Python environment. Functions were developed to encompass entire workflows, including input preparation, file format conversion, various population genomic analyses, output generation, and visualization. By facilitating entire workflows, the PPP offers several benefits to prospective end users: it reduces the need for redundant in-house software and scripts that would require development time and may be error-prone or incorrect. The platform has also been developed with reproducibility and extensibility of analyses in mind. The PPP is an open-source package that is available for download and use at https://ppp.readthedocs.io/en/latest/PPP_pages/install.html

IDseq—An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring

GigaScience · 10.1093/gigascience/giaa111 · 2020 · Vol 9 (10) · Cited By ~ 2

Author(s): Katrina L Kalantar, Tiago Carvalho, Charles F A de Bourcy, Boris Dimitrov, Greg Dingle, ...

Keyword(s): Data Analysis, Open Source, Web Application, Pathogen Detection, A Priori, Virus Detection, Data Interpretation, Nasopharyngeal Swab, Microbial Composition, Detection And Identification

Abstract Background: Metagenomic next-generation sequencing (mNGS) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mNGS data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. Existing mNGS data analysis tools typically require bioinformatics expertise and access to local server-class hardware resources. For many research laboratories, this presents an obstacle, especially in resource-limited environments. Findings: We present IDseq, an open source cloud-based metagenomics pipeline and service for global pathogen detection and monitoring (https://idseq.net). The IDseq Portal accepts raw mNGS data, performs host and quality filtration steps, then executes an assembly-based alignment pipeline, which results in the assignment of reads and contigs to taxonomic categories. The taxonomic relative abundances are reported and visualized in an easy-to-use web application to facilitate data interpretation and hypothesis generation. Furthermore, IDseq supports environmental background model generation and automatic internal spike-in control recognition, providing statistics that are critical for data interpretation. IDseq was designed with the specific intent of detecting novel pathogens. Here, we benchmark novel virus detection capability using both synthetically evolved viral sequences and real-world samples, including IDseq analysis of a nasopharyngeal swab sample acquired and processed locally in Cambodia from a tourist from Wuhan, China, infected with the recently emergent SARS-CoV-2. Conclusion: The IDseq Portal reduces the barrier to entry for mNGS data analysis and enables bench scientists, clinicians, and bioinformaticians to gain insight from mNGS datasets for both known and novel pathogens.
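One way to see why a background model matters for interpretation: a taxon is interesting when it is more abundant in a sample than in negative or environmental controls. The sketch below normalizes read counts to reads per million (rPM) and z-scores each taxon against background samples. This is our simplified stand-in; the abstract does not describe IDseq's exact normalization or background model, and the `min_sd` floor is a hypothetical choice to avoid dividing by zero when a taxon is absent from all controls.

```python
from statistics import mean, pstdev
from typing import Dict, List


def reads_per_million(counts: Dict[str, int], total_reads: int) -> Dict[str, float]:
    """Normalize per-taxon read counts to reads per million (rPM)."""
    return {taxon: n * 1e6 / total_reads for taxon, n in counts.items()}


def background_zscores(
    sample_rpm: Dict[str, float],
    background_rpm: List[Dict[str, float]],
    min_sd: float = 1.0,  # hypothetical floor on the background standard deviation
) -> Dict[str, float]:
    """Z-score each taxon's rPM against the same taxon across background samples."""
    scores: Dict[str, float] = {}
    for taxon, value in sample_rpm.items():
        bg = [sample.get(taxon, 0.0) for sample in background_rpm]  # absent => 0 rPM
        sd = max(pstdev(bg), min_sd)
        scores[taxon] = (value - mean(bg)) / sd
    return scores
```

In this toy setup, a virus seen at 50 rPM but never in controls scores highly, while a common contaminant present at similar levels in every control scores near zero.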

An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study

F1000Research · 10.12688/f1000research.9110.1 · 2016 · Vol 5 · pp. 1574 · Cited By ~ 19

Author(s): Zichen Wang, Avi Ma'ayan

Keyword(s): Small Molecules, Zika Virus, Principal Component, Global Gene Expression, Brain Morphology, RNA-Seq, Link Type, Neuronal Progenitors, Global Gene Expression Profiling, Data Files

RNA-seq analysis is becoming a standard method for global gene expression profiling. However, running RNA-seq analysis with open and standard pipelines remains challenging for non-experts due to the large size of the raw data files and the hardware requirements of the alignment step. Here we introduce a reproducible open source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. The pipeline enables the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes that are downregulated by ZIKV, our analysis uncovers significant overlap with upregulated genes that, when knocked out in mice, induce defects in brain morphology. This result potentially points to the molecular processes associated with the microcephaly phenotype observed in newborns from pregnant mothers infected with the virus. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at: http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and https://hub.docker.com/r/maayanlab/zika/.

Dashing: fast and accurate genomic distances with HyperLogLog

Genome Biology · 10.1186/s13059-019-1875-0 · 2019 · Vol 20 (1) · Cited By ~ 9

Author(s): Daniel N. Baker, Ben Langmead

Keyword(s): Open Source, Software Tool, Estimation Methods, Cardinality Estimation, Link Type, Wide Range

Abstract Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
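To illustrate how a HyperLogLog sketch supports genome similarity, the sketch below hashes items (for genomes, k-mers) into 2^p registers, merges two sketches by taking the register-wise maximum to get the union, and estimates the Jaccard index by inclusion-exclusion. Note the swap: this uses the classic Flajolet et al. estimator with a linear-counting correction, not the specialized union and intersection estimators that Dashing implements. The BLAKE2b hash and `p = 12` are our choices for a self-contained example.

```python
import hashlib
import math
from typing import Iterable


class HyperLogLog:
    """Classic HyperLogLog with 2**p registers (the alpha formula assumes p >= 7)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        idx = h >> (64 - self.p)                       # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros in `rest`, plus one
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        """Sketch of the set union: register-wise maximum (same p assumed)."""
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def cardinality(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # linear counting for small sets
        return raw


def sketch(items: Iterable[str], p: int = 12) -> HyperLogLog:
    hll = HyperLogLog(p)
    for item in items:
        hll.add(item)
    return hll


def kmers(seq: str, k: int) -> Iterable[str]:
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: HyperLogLog, b: HyperLogLog) -> float:
    """Jaccard estimate via |A and B| = |A| + |B| - |A union B|."""
    union = a.union(b).cardinality()
    if union == 0:
        return 0.0
    inter = max(0.0, a.cardinality() + b.cardinality() - union)
    return inter / union
```

For two genomes one would call `jaccard(sketch(kmers(g1, k)), sketch(kmers(g2, k)))`. The sketch size is fixed at 2^p small integers regardless of genome size, which is what makes all-pairs comparison of tens of thousands of genomes feasible.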

Balance Trees Reveal Microbial Niche Differentiation

mSystems · 10.1128/msystems.00162-16 · 2017 · Vol 2 (1) · Cited By ~ 129

Author(s): James T. Morton, Jon Sanders, Robert A. Quinn, Daniel McDonald, Antonio Gonzalez, ...

Keyword(s): 16S rRNA, 16S rRNA Gene, Open Source, Niche Differentiation, Difficult Problem, Individual Species, rRNA Gene, Link Type, Open Source License, Gene Data

ABSTRACT Advances in sequencing technologies have enabled novel insights into microbial niche differentiation, from analyzing environmental samples to understanding human diseases and informing dietary studies. However, identifying the microbial taxa that differentiate these samples can be challenging. These issues stem from the compositional nature of 16S rRNA gene data (or, more generally, taxon or functional gene data); the changes in the relative abundance of one taxon influence the apparent abundances of the others. Here we acknowledge that inferring properties of individual bacteria is a difficult problem and instead introduce the concept of balances to infer meaningful properties of subcommunities, rather than properties of individual species. We show that balances can yield insights about niche differentiation across multiple microbial environments, including soil environments and lung sputum. These techniques have the potential to reshape how we carry out future ecological analyses aimed at revealing differences in relative taxonomic abundances across different samples. IMPORTANCE By explicitly accounting for the compositional nature of 16S rRNA gene data through the concept of balances, balance trees yield novel biological insights into niche differentiation. The software to perform this analysis is available under an open-source license and can be obtained at https://github.com/biocore/gneiss . Author Video: An author video summary of this article is available.
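A balance compares two groups of taxa (for a tree, the two subtrees below an internal node) through a scaled log-ratio of their geometric means. The sketch below implements the standard isometric log-ratio balance, b = sqrt(r*s/(r+s)) * ln(g(numerator)/g(denominator)), where r and s are the group sizes and g is the geometric mean. It does not use gneiss's API, and it assumes strictly positive abundances (zeros would need a pseudocount first).

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(numerator: Sequence[float], denominator: Sequence[float]) -> float:
    """Isometric log-ratio balance between two disjoint groups of taxa.

    Abundances must be strictly positive; add a pseudocount to zeros first.
    """
    r, s = len(numerator), len(denominator)
    return math.sqrt(r * s / (r + s)) * math.log(
        geometric_mean(numerator) / geometric_mean(denominator)
    )
```

Because only ratios enter the formula, multiplying every abundance in a sample by the same factor (for example, a different sequencing depth) leaves the balance unchanged; this is how balances sidestep the compositional problem described above, where a change in one taxon shifts the apparent abundances of the others.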
