Proxi: a Python package for proximity network inference from metagenomic data

Metagenomic Data ◽

Metagenomic Sequencing ◽

Proximity Graphs ◽

Link Type ◽

Technological Advances ◽

Summary: Recent technological advances in high-throughput metagenomic sequencing have provided unique opportunities for studying the diversity and dynamics of microbial communities under different health or environmental conditions. Graph-based representation of metagenomic data is a promising direction not only for analyzing microbial interactions but also for a broad range of machine learning tasks including feature selection, classification, clustering, anomaly detection, and dimensionality reduction. We present Proxi, an open source Python package for learning different types of proximity graphs from metagenomic data. Currently, three types of proximity graphs are supported: k-nearest neighbor (k-NN) graphs; radius-nearest neighbor (r-NN) graphs; and perturbed k-nearest neighbor (pk-NN) graphs. Availability: Proxi Python source code is freely available at https://bitbucket.org/idsrlab/proxi/. Contact: [email protected] Supplementary information: Tutorials and online documentation are available at https://proxi.readthedocs.io
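
To make the k-NN and r-NN graph types named in the Proxi summary concrete, here is a minimal sketch in plain Python. It does not use or reproduce Proxi's API: the function names `knn_graph` and `rnn_graph`, the Euclidean distance, and the toy profiles are all assumptions chosen for illustration, and the perturbed (pk-NN) variant is not covered.

```python
import math

def knn_graph(points, k):
    """Directed k-NN graph: each node points to its k closest other nodes."""
    graph = {}
    for i, p in enumerate(points):
        # Distances from point i to every other point (self excluded).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = {j for _, j in dists[:k]}
    return graph

def rnn_graph(points, radius):
    """Undirected radius graph: nodes i and j are linked when dist <= radius."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        for j in range(i + 1, len(points)):
            if math.dist(p, points[j]) <= radius:
                graph[i].add(j)
                graph[j].add(i)
    return graph

# Hypothetical abundance profiles: one tuple per taxon, one value per sample.
profiles = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
knn = knn_graph(profiles, k=1)
rnn = rnn_graph(profiles, radius=1.5)
```

In the k-NN graph every node has exactly k outgoing edges, so even an outlying profile is connected; in the radius graph an outlier can end up isolated. That difference is one reason a package might offer both constructions.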

PyRice: a Python package for querying Oryza Sativa databases

10.1101/2020.04.20.049742 ◽

2020 ◽

Author(s):

Quan Do ◽

Ho Bich Hai ◽

Pierre Larmande

Keyword(s):

Oryza Sativa ◽

Heterogeneous Data ◽

Web Based ◽

Domain Experts ◽

Link Type ◽

Heterogeneous Data Sources ◽

Query System ◽

Gene Information ◽

Summary: Currently, gene information available for Oryza sativa species is located in various online heterogeneous data sources. Moreover, methods of access are also diverse, mostly web-based and sometimes query APIs, which might not always be straightforward for domain experts. The challenge is to collect information quickly from these applications and combine it logically, to facilitate scientific research. We developed a Python package named PyRice, a unified programming API to access all supported databases at the same time with consistent output. The PyRice design is modular and implements a smart query system that adapts to the available computing resources to optimize query speed. As a result, PyRice is easy to use and produces intuitive results. Availability and implementation: https://github.com/SouthGreenPlatform/PyRice Documentation: https://[email protected] information: MIT Supplementary information: Supplementary data are available online.

BEEM-Static: Accurate inference of ecological interactions from cross-sectional metagenomic data

10.1101/2020.11.23.394999 ◽

2020 ◽

Author(s):

Chenhao Li ◽

Tamar V. Av-Shalom ◽

Jun Wei Gerald Tan ◽

Junmei Samantha Kwah ◽

Kern Rei Chng ◽

...

Keyword(s):

Ecological Model ◽

Expectation Maximization Algorithm ◽

Microbial Interactions ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Ecological Interactions ◽

Cross Sectional ◽

Key Species ◽

Alternate Model ◽

And Function

Motivation: The structure and function of diverse microbial communities are underpinned by ecological interactions that remain uncharacterized. With rapid adoption of metagenomic sequencing for studying microbiomes, data-driven inference of microbial interactions based on abundance correlations is widely used, but with the drawback that ecological interpretations may not be possible. Leveraging cross-sectional metagenomic datasets for unravelling ecological structure in a scalable manner thus remains an open problem. Methods: We present an expectation-maximization algorithm (BEEM-Static) that can be applied to cross-sectional datasets to infer interaction networks based on an ecological model (generalized Lotka-Volterra). The method exhibits robustness to violations in model assumptions by using statistical filters to identify and remove corresponding samples. Results: Benchmarking against 10 state-of-the-art correlation-based methods showed that BEEM-Static can infer presence and directionality of ecological interactions even with relative abundance data (AUC-ROC > 0.85), a task that other methods struggle with (AUC-ROC < 0.63). In addition, BEEM-Static can tolerate a high fraction of samples (up to 40%) not being at steady state or coming from an alternate model. Applying BEEM-Static to a large public dataset of human gut microbiomes (n=4,617) identified multiple stable equilibria that better reflect ecological enterotypes with distinct carrying capacities and interactions for key species. Conclusion: BEEM-Static provides new opportunities for mining ecologically interpretable interactions and systems insights from the growing corpus of metagenomic data.
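
For readers unfamiliar with the generalized Lotka-Volterra model mentioned in the BEEM-Static abstract, its textbook form is written out below. The symbols are assumptions for illustration, since the abstract does not give its notation: x_i is the abundance of species i, \mu_i its intrinsic growth rate, and \beta_{ij} the effect of species j on species i.

```latex
\frac{dx_i}{dt} = x_i \left( \mu_i + \sum_{j} \beta_{ij}\, x_j \right)
\qquad\Longrightarrow\qquad
\mu_i + \sum_{j} \beta_{ij}\, x_j = 0
\quad \text{for every species with } x_i > 0 \text{ at steady state.}
```

The steady-state condition is linear in the abundances, which suggests why samples assumed to be at equilibrium can constrain the interaction terms. The exact parameterization and estimation procedure used by BEEM-Static are not stated in the abstract.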

The Microbiome Modeling Toolbox: from microbial interactions to personalized microbial communities

10.1101/318485 ◽

2018 ◽

Cited By ~ 2

Author(s):

Federico Baldini ◽

Almut Heinken ◽

Laurent Heirendt ◽

Stefania Magnusdottir ◽

Ronan M.T. Fleming ◽

...

Keyword(s):

Microbial Communities ◽

Microbial Interactions ◽

Microbial Genome ◽

Metagenomic Data ◽

Link Type ◽

Metabolic Interactions ◽

Constraint Based Modeling ◽

Genome Scale ◽

Cobra Toolbox

Motivation: The application of constraint-based modeling to functionally analyze metagenomic data has been limited so far, partially due to the absence of suitable toolboxes. Results: To address this shortage, we created a comprehensive toolbox to model i) microbe-microbe and host-microbe metabolic interactions, and ii) microbial communities using microbial genome-scale metabolic reconstructions and metagenomic data. The Microbiome Modeling Toolbox extends the functionality of the COBRA Toolbox. Availability: The Microbiome Modeling Toolbox and the tutorials are available at https://git.io/microbiomeModelingToolbox.

Controlling false discoveries in Bayesian gene networks with lasso regression p-values

10.1101/288217 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lingfei Wang ◽

Tom Michoel

Keyword(s):

Bayesian Networks ◽

Gene Networks ◽

Network Inference ◽

Lasso Regression ◽

Link Type ◽

False Discovery ◽

Systematic Biases ◽

Empirical Tests ◽

False Discoveries

Motivation: Bayesian networks can represent directed gene regulations and therefore are favored over co-expression networks. However, hardly any Bayesian network study addresses the false discovery control (FDC) of network edges, leading to low accuracies due to systematic biases from inconsistent false discovery levels in the same study. Results: We design four empirical tests to examine the FDC of Bayesian networks from three p-value based lasso regression variable selections: two existing and one that we originate. Our method, lassopv, computes p-values for the critical regularization strength at which a predictor starts to contribute to lasso regression. Using null and Geuvadis datasets, we find that lassopv obtains optimal FDC in Bayesian gene networks, whilst existing methods have defective p-values. The FDC concept and tests extend to most network inference scenarios and will guide the design and improvement of new and existing methods. Our novel variable selection method with lasso regression also allows FDC on other datasets and questions, even beyond network inference and computational biology. Availability: Lassopv is implemented in R and freely available at https://github.com/lingfeiwang/lassopv and https://cran.r-project.org/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

dms2dfe: Comprehensive Workflow for Analysis of Deep Mutational Scanning Data

10.1101/072645 ◽

2016 ◽

Cited By ~ 2

Author(s):

Rohan Dandage ◽

Kausik Chakraborty

Keyword(s):

Noise Reduction ◽

High Throughput ◽

Critical Issue ◽

Supplementary Data ◽

Selection Pressures ◽

Link Type ◽

Supplementary Material ◽

End To End ◽

Summary: High-throughput genotype-to-phenotype (G2P) data are increasingly being generated by the widely applicable Deep Mutational Scanning (DMS) method. dms2dfe is a comprehensive end-to-end workflow that addresses the critical issue of noise reduction and offers a variety of crucial downstream analyses. Noise reduction is carried out by normalizing mutant counts by sequencing depth, followed by dispersion shrinkage when preferential enrichments are calculated. In downstream analyses, the dms2dfe workflow provides identification of relative selection pressures and potential molecular constraints, and generates data-rich visualizations. Availability: dms2dfe is implemented as a Python package and is available at https://kc-lab.github.io/[email protected], [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
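
The depth-normalization step described in the dms2dfe summary can be illustrated with a small sketch. This is not dms2dfe's code: the function names, the pseudocount, the mutant labels, and the use of a log2 ratio as the enrichment score are assumptions made for illustration, and the dispersion-shrinkage step is omitted entirely.

```python
import math

def depth_normalize(counts, pseudocount=0.5):
    """Convert raw mutant read counts into fractions of total sequencing depth."""
    depth = sum(counts.values())
    n = len(counts)
    # The pseudocount (an assumption) keeps zero-count mutants finite in log space.
    return {m: (c + pseudocount) / (depth + pseudocount * n) for m, c in counts.items()}

def log2_enrichment(pre_counts, post_counts, pseudocount=0.5):
    """Log2 ratio of depth-normalized frequencies after vs. before selection."""
    pre = depth_normalize(pre_counts, pseudocount)
    post = depth_normalize(post_counts, pseudocount)
    return {m: math.log2(post[m] / pre[m]) for m in pre}

# Hypothetical counts for two mutants before and after selection.
pre = {"A10G": 100, "A10V": 100}
post = {"A10G": 300, "A10V": 100}
enrich = log2_enrichment(pre, post, pseudocount=0.0)
```

Although the raw count of A10V is unchanged between the two libraries, its share of the deeper post-selection library halves, so its enrichment is negative. Raw counts would hide this, which is the point of normalizing by depth.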

Python Interfaces for the Smoldyn Simulator

10.1101/2020.12.15.422958 ◽

2020 ◽

Author(s):

Dilawar Singh ◽

Steven S. Andrews

Keyword(s):

Systems Biology ◽

Open Source ◽

Object Oriented ◽

Software Tools ◽

Low Level ◽

Link Type ◽

Programming Interface ◽

Motivation: Smoldyn is a particle-based biochemical simulator that is frequently used for systems biology and biophysics research. Previously, users could only define models using text-based input or a C/C++ application programming interface (API), which were convenient but limited extensibility. Results: We added a Python API to Smoldyn to improve integration with other software tools such as Jupyter notebooks, other Python code libraries, and other simulators. It includes low-level functions that closely mimic the existing C/C++ API and higher-level functions that are more convenient to use. These latter functions follow modern object-oriented Python conventions. Availability: Smoldyn is open source and free, available at http://www.smoldyn.org, and can be installed with the Python package manager pip. It runs on Mac, Windows, and [email protected] Supplementary information: Documentation is available at http://www.smoldyn.org and https://smoldyn.readthedocs.io.

Metaviral SPAdes: assembly of viruses from metagenomic data

Bioinformatics ◽

10.1093/bioinformatics/btaa490 ◽

2020 ◽

Vol 36 (14) ◽

pp. 4126-4129 ◽

Cited By ~ 8

Author(s):

Dmitry Antipov ◽

Mikhail Raiko ◽

Alla Lapidus ◽

Pavel A Pevzner

Keyword(s):

Markov Models ◽

State Of The Art ◽

Metagenomic Data ◽

Metagenomic Sequencing ◽

Coverage Depth ◽

Viral Genomes ◽

Shotgun Metagenomic Sequencing ◽

Bacterial Chromosomes ◽

Metagenomic Assembly

Motivation: Although the set of currently known viruses has been steadily expanding, only a tiny fraction of the Earth’s virome has been sequenced so far. Shotgun metagenomic sequencing provides an opportunity to reveal novel viruses but faces the computational challenge of identifying viral genomes that are often difficult to detect in metagenomic assemblies. Results: We describe a MetaviralSPAdes tool for identifying viral genomes in metagenomic assembly graphs that is based on analyzing variations in the coverage depth between viruses and bacterial chromosomes. We benchmarked MetaviralSPAdes on diverse metagenomic datasets, verified our predictions using a set of virus-specific Hidden Markov Models and demonstrated that it improves on the state-of-the-art viral identification pipelines. Availability and implementation: Metaviral SPAdes includes ViralAssembly, ViralVerify and ViralComplete modules that are available as standalone packages: https://github.com/ablab/spades/tree/metaviral_publication, https://github.com/ablab/viralVerify/ and https://github.com/ablab/viralComplete/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories

Bioinformatics ◽

10.1093/bioinformatics/btaa167 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3607-3609

Author(s):

Louis J Taylor ◽

Arwa Abbas ◽

Frederic D Bushman

Keyword(s):

High Throughput Sequencing ◽

Source Code ◽

Metagenomic Data ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Data Repositories ◽

Sequence Read Archive ◽

Python Package ◽

Generation Sequencing

Summary: High-throughput sequencing is a powerful technique for addressing biological questions. Grabseqs streamlines access to publicly available metagenomic data by providing a single, easy-to-use interface to download data and metadata from multiple repositories, including the Sequence Read Archive, the Metagenomics Rapid Annotation through Subsystems Technology server and iMicrobe. Users can download data and metadata in a standardized format from any number of samples or projects from a given repository with a single grabseqs command. Availability and implementation: Grabseqs is an open-source tool implemented in Python and licensed under the MIT license. The source code is freely available at https://github.com/louiejtaylor/grabseqs, the Python Package Index and Anaconda Cloud repository. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

ORT: A workflow linking genome-scale metabolic models with reactive transport codes

10.1101/2021.03.02.433463 ◽

2021 ◽

Author(s):

Rebecca L. Rubinstein ◽

Mikayla A. Borton ◽

Haiyan Zhou ◽

Michael Shaffer ◽

David W. Hoyt ◽

...

Keyword(s):

Simple Model ◽

Reactive Transport ◽

River System ◽

Metagenomic Data ◽

Test Case ◽

Modeling Tools ◽

Natural Systems ◽

Link Type ◽

Genome Scale

Motivation: Advanced modeling tools are available for ‘omics-based metabolic modeling and for reactive transport modeling, but there is a disconnect between these methods, which hinders linking models across scales. Microbial processes strongly impact many natural systems, and so better capture of microbial dynamics could greatly improve simulations of these systems. Results: Our approach, ORT, applied to environmental metagenomic data from a river system predicted nitrogen cycling patterns with site-specific insight into chemical and biological drivers of nitrification and denitrification processes. Availability and Implementation: Live interactive models are available at https://pflotranmodeling.paf.subsurfaceinsights.com/pflotran-simple-model/. Microbiological data is available at NCBI via BioProject ID PRJNA576070. The code for ORT (written in Python 3) is available at https://github.com/subsurfaceinsights. The KBase narrative used for the test case is publicly available at https://narrative.kbase.us/narrative/[email protected] or [email protected] Supplementary information: Supplementary data are available online.

Identification of a New Antimicrobial, Desertomycin H, Utilizing a Modified Crowded Plate Technique

Marine Drugs ◽

10.3390/md19080424 ◽

2021 ◽

Vol 19 (8) ◽

pp. 424

Author(s):

Osama G. Mohamed ◽

Sadaf Dorandish ◽

Rebecca Lindow ◽

Megan Steltz ◽

Ifrah Shoukat ◽

...

Keyword(s):

Antibiotic Production ◽

Gene Clusters ◽

Multidrug Resistant ◽

Microbial Interactions ◽

Mass Spectrometry Data ◽

Metagenomic Data ◽

Resistant Bacteria ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Plate Technique

Infections associated with antibiotic-resistant bacteria are a major global healthcare threat. New classes of antimicrobial compounds are urgently needed as the frequency of infections caused by multidrug-resistant microbes continues to rise. Recent metagenomic data have demonstrated that there is still biosynthetic potential that is encoded, but transcriptionally silent, in cultivatable bacterial genomes. However, the culture conditions required to identify and express silent biosynthetic gene clusters that yield natural products with antimicrobial activity are largely unknown. Here, we describe a new antibiotic discovery scheme, dubbed the modified crowded plate technique (mCPT), that utilizes complex microbial interactions to elicit antimicrobial production from otherwise silent biosynthetic gene clusters. Using the mCPT as part of the antibiotic crowdsourcing educational program Tiny Earth®, we isolated over 1400 antibiotic-producing microbes, including 62 showing activity against multidrug-resistant pathogens. The natural product extracts generated from six microbial isolates showed potent activity against vancomycin-intermediate resistant Staphylococcus aureus. We utilized a targeted approach that coupled mass spectrometry data with bioactivity, yielding a new macrolactone class of metabolite, desertomycin H. In this study, we successfully demonstrate a concept that significantly increased our ability to quickly and efficiently identify microbes capable of silent antibiotic production.