Hierarchical Microbial Functions Prediction by Graph Aggregated Embedding

2021 ◽  
Vol 11 ◽  
Author(s):  
Yujie Hou ◽  
Xiong Zhang ◽  
Qinyan Zhou ◽  
Wenxing Hong ◽  
Ying Wang

Matching 16S rRNA gene sequencing data to a metabolic reference database is a meaningful way to predict the metabolic function of bacteria and archaea, bringing greater insight into the workings of the microbial community. However, some operational taxonomic units (OTUs) cannot be functionally profiled, especially for microbial communities from non-human samples cultured in defective media. Therefore, we herein report the development of Hierarchical micrObial functions Prediction by graph aggregated Embedding (HOPE), which utilizes co-occurrence patterns and nucleotide sequences to predict microbial functions. HOPE integrates topological structures of microbial co-occurrence networks with k-mer compositions of OTU sequences and embeds them into a lower-dimensional continuous latent space, while maximally preserving topological relationships among OTUs. Our framework recognizes the high imbalance among KEGG Orthology (KO) functions of microbes, which usually yields poor performance. A hierarchical multitask learning module is used in HOPE to alleviate the challenge brought by the long-tailed distribution among classes. To test the performance of HOPE, we compare it with HOPE-one, HOPE-seq, and GraphSAGE on three microbial metagenomic 16S rRNA sequencing datasets: abalone gut, human gut, and the gut of Penaeus monodon. Experiments demonstrate that HOPE outperforms the baselines on almost all indexes in all experiments. Furthermore, HOPE shows significant generalization ability. HOPE's basic idea is suitable for other related scenarios, such as the prediction of gene function based on gene co-expression networks. The source code of HOPE is freely available at https://github.com/adrift00/HOPE.
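
To make the "k-mer composition" of an OTU sequence concrete, the following is a minimal sketch of one common way to turn a nucleotide sequence into a fixed-length feature vector. It is not taken from the HOPE source code: the function name `kmer_composition`, the default `k = 3`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_composition(sequence: str, k: int = 3) -> list[float]:
    """Return normalized k-mer frequencies, ordered lexicographically over A, C, G, T.

    Hypothetical helper for illustration; not the HOPE implementation.
    """
    sequence = sequence.upper()
    # All 4**k possible k-mers, in a fixed order so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only windows made purely of A/C/G/T contribute (e.g. windows with N are skipped).
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


vector = kmer_composition("ACGTACGT", k=2)
```

For `k = 2` the vector has 16 entries that sum to 1. A vector of this kind could serve as the per-node feature that is combined with the co-occurrence network structure, although the abstract does not specify the exact encoding used.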

2021 ◽  
Vol 12 ◽  
Author(s):  
Marc Crampon ◽  
Coralie Soulier ◽  
Pauline Sidoli ◽  
Jennifer Hellal ◽  
Catherine Joulian ◽  
...  

The demand for energy and chemicals is constantly growing, leading to an increase in the amounts of contaminants discharged to the environment. Among these, pharmaceutical molecules are frequently found in treated wastewater that is discharged into surface waters. Indeed, wastewater treatment plants (WWTPs) are designed to remove organic pollution from urban effluents but are not specific, especially toward contaminants of emerging concern (CECs), which finally reach the natural environment. In this context, it is important to study the fate of micropollutants, especially in a soil aquifer treatment (SAT) context for water from WWTPs, and for the most persistent molecules such as benzodiazepines. In the present study, soils sampled in a reed bed frequently flooded by water from a WWTP were spiked with diazepam and oxazepam in microcosms, and their concentrations were monitored for 97 days. The two molecules were completely degraded after 15 days of incubation. Samples were collected during the experiment in order to follow the dynamics of the microbial communities, based on 16S rRNA gene sequencing for Archaea and Bacteria, and the ITS2 region for Fungi. The evolution of diversity and of specific operational taxonomic units (OTUs) highlighted an impact of the addition of benzodiazepines, a rapid resilience of the fungal community, and an evolution of the bacterial community. OTUs from the Brevibacillus genus were more abundant at the beginning of the biodegradation process, for both the diazepam and oxazepam conditions. Additionally, the Tax4Fun tool was applied to the 16S rRNA gene sequencing data to infer the evolution of specific metabolic functions during biodegradation. Overall, the microbial community in soils frequently exposed to water from a WWTP, potentially containing CECs such as diazepam and oxazepam, may be adapted to the degradation of persistent contaminants.


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Stephanie D. Jurburg ◽  
Maximilian Konzack ◽  
Nico Eisenhauer ◽  
Anna Heintz-Buschart

Abstract: As DNA sequencing has become more popular, the public genetic repositories where sequences are archived have experienced explosive growth. These repositories now hold invaluable collections of sequences, e.g., for microbial ecology, but whether these data are reusable has not been evaluated. We assessed the availability and state of 16S rRNA gene amplicon sequences archived in public genetic repositories (SRA, EBI, and DDBJ). We screened 26,927 publications in 17 microbiology journals, identifying 2015 16S rRNA gene sequencing studies. Of these, 7.2% had not made their data public at the time of analysis. Among a subset of 635 studies sequencing the same gene region, 40.3% contained data that were not available or not reusable, and an additional 25.5% contained faults in data formatting or data labeling, creating obstacles for data reuse. Our study reveals gaps in data availability, identifies major contributors to data loss, and offers suggestions for improving data archiving practices.


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257471
Author(s):  
Charles Carr ◽  
Hannah Wilcox ◽  
Jeremy P. Burton ◽  
Sharanya Menon ◽  
Kait F. Al ◽  
...  

16S rRNA gene sequencing of DNA extracted from clinically uninfected hip and knee implant samples has revealed polymicrobial populations. However, previous studies assessed 16S rRNA gene sequencing as a technique for the diagnosis of periprosthetic joint infections, leaving the microbiota of presumed aseptic hip and knee implants largely unstudied. These communities of microorganisms might play important roles in aspects of host health, such as aseptic loosening. Therefore, this study sought to characterize the bacterial composition of presumed aseptic joint implant microbiota using next generation 16S rRNA gene sequencing, and it evaluated this method for future investigations. A total of 248 samples were collected from implants of 41 patients undergoing total hip or knee arthroplasty revision for presumed aseptic failure. DNA was extracted using two methodologies (one optimized for high throughput and the other for human samples), and amplicons of the V4 region of the 16S rRNA gene were sequenced. Sequencing data were analyzed and compared with ancillary specific PCR and microbiological culture. Computational tools (SourceTracker and decontam) were used to detect and compensate for environmental and processing contaminants. Microbial diversity of patient samples was higher than that of open-air controls, and differentially abundant taxa were detected between these conditions, possibly reflecting a true microbiota that is present in clinically uninfected joint implants. However, positive control-associated artifacts and DNA extraction methodology significantly affected sequencing results. In addition, sequencing failed to identify Cutibacterium acnes in most culture- and PCR-positive samples. These challenges limited characterization of bacteria in presumed aseptic implants, but genera were identified for further investigation.
In all, we provide further support for the hypothesis that there is likely a microbiota present in clinically uninfected joint implants, and we show that methods other than 16S rRNA gene sequencing may be ideal for its characterization. This work has illuminated the importance of further study of microbiota of clinically uninfected joint implants with novel molecular and computational tools to further eliminate contaminants and artifacts that arise in low bacterial abundance samples.


2020 ◽  
Author(s):  
Carter Hoffman ◽  
Nazema Y Siddiqui ◽  
Ian Fields ◽  
W. Thomas Gregory ◽  
Holly Simon ◽  
...  

Abstract: The human bladder contains bacteria in the absence of infection. Interest in studying these bacteria and their association with bladder conditions is increasing, but the chosen experimental method can limit the resolution of the taxonomy that can be assigned to the bacteria found in the bladder. 16S rRNA gene sequencing is commonly used to identify bacteria, but is typically restricted to genus-level identification. Our primary aim was to determine if accurate species-level identification of bladder bacteria is possible using 16S rRNA gene sequencing. We evaluated the ability of different classification schemes, each consisting of a combination of a 16S rRNA gene variable region, a reference database, and a taxonomic classification algorithm, to correctly classify bladder bacteria. We show that species-level identification is possible, and that the reference database chosen is the most important component, followed by the 16S variable region sequenced.

Importance: Species-level information may deepen our understanding of associations between bladder microbiota and bladder conditions, such as lower urinary tract symptoms and urinary tract infections. The capability to identify bacterial species depends on large databases of sequences, algorithms that leverage statistics and available computer hardware, and knowledge of bacterial genetics and classification. Taken together, this is a daunting body of knowledge to become familiar with before the simple question of bacterial identity can be answered. Our results show that the choice of taxonomic database and variable region of the 16S rRNA gene sequence makes species-level identification possible. We also show that this improvement can be achieved through the more careful application of existing methods and use of existing resources.


2020 ◽  
Vol 15 (1) ◽  
pp. 228-244
Author(s):  
Tatyana Zamkovaya ◽  
Jamie S. Foster ◽  
Valérie de Crécy-Lagard ◽  
Ana Conesa

Abstract: Microbes compose most of the biomass on the planet, yet the majority of taxa remain uncharacterized. These unknown microbes, often referred to as “microbial dark matter,” represent a major challenge for biology. To understand the ecological contributions of these Unknown taxa, it is essential to first understand the relationship between unknown species, neighboring microbes, and their respective environment. Here, we establish a method to study the ecological significance of “microbial dark matter” by building microbial co-occurrence networks from publicly available 16S rRNA gene sequencing data of four extreme aquatic habitats. For each environment, we constructed networks including and excluding unknown organisms at multiple taxonomic levels and used network centrality measures to quantitatively compare networks. When the Unknown taxa were excluded from the networks, a significant reduction in degree and betweenness was observed for all environments. Strikingly, Unknown taxa occurred as top hubs in all environments, suggesting that “microbial dark matter” plays necessary ecological roles within its respective communities. In addition, novel adaptation-related genes were detected after using 16S rRNA gene sequences from top-scoring hub taxa as probes to BLAST metagenome databases. This work demonstrates the broad applicability of network metrics to identify and prioritize key Unknown taxa and improve understanding of ecosystem structure across diverse habitats.
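
The comparison described above (centrality with and without Unknown taxa) can be illustrated on a toy network. The sketch below is not the authors' pipeline: the taxon names and edges are invented, the helper names are hypothetical, and betweenness is computed with a plain-Python version of Brandes' algorithm for unweighted, undirected graphs, without normalization.

```python
from collections import deque


def degree(adj):
    """Number of co-occurrence partners per taxon."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def remove_nodes(adj, drop):
    """Copy of the network with the given taxa (and their edges) removed."""
    return {v: {w for w in nbrs if w not in drop}
            for v, nbrs in adj.items() if v not in drop}


# Invented toy co-occurrence network; "Unknown_1" stands in for an Unknown taxon.
edges = [("Unknown_1", "TaxonA"), ("Unknown_1", "TaxonB"),
         ("Unknown_1", "TaxonC"), ("TaxonA", "TaxonB")]
network = {}
for a, b in edges:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

without_unknown = remove_nodes(network, {"Unknown_1"})
mean_degree_with = sum(degree(network).values()) / len(network)
mean_degree_without = sum(degree(without_unknown).values()) / len(without_unknown)
```

In this toy case the Unknown node is the only hub, so removing it lowers the mean degree and leaves no node with non-zero betweenness. Real analyses would typically rely on an established graph library and on statistical tests across many networks.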


Data in Brief ◽  
2021 ◽  
pp. 107770
Author(s):  
Julia Galeeva ◽  
Vladislav Babenko ◽  
Ramiz Bakhtyev ◽  
Vladimir Baklaushev ◽  
Larisa Balykova ◽  
...  

2021 ◽  
Vol 9 ◽  
Author(s):  
Olivia N. Choi ◽  
Ammon Corl ◽  
Andrew Wolfenden ◽  
Avishai Lublin ◽  
Suzanne L. Ishaq ◽  
...  

Studies in both humans and model organisms suggest that the microbiome may play a significant role in host health, including digestion and immune function. Microbiota can offer protection from exogenous pathogens through colonization resistance, but microbial dysbiosis in the gastrointestinal tract can decrease resistance and is associated with pathogenesis. Little is known about the effects of potential pathogens, such as Salmonella, on the microbiome in wildlife, which are known to play an important role in disease transmission to humans. Culturing techniques have traditionally been used to detect pathogens, but recent studies have utilized high throughput sequencing of the 16S rRNA gene to characterize host-associated microbial communities (i.e., the microbiome) and to detect specific bacteria. Building upon this work, we evaluated the utility of high throughput 16S rRNA gene sequencing for potential bacterial pathogen detection in barn swallows (Hirundo rustica) and used these data to explore relationships between potential pathogens and microbiota. To accomplish this, we first compared the detection of Salmonella spp. in swallows using 16S rRNA data with standard culture techniques. Second, we examined the prevalence of Salmonella using 16S rRNA data and examined the relationship between Salmonella-presence or -absence and individual host factors. Lastly, we evaluated host-associated bacterial diversity and community composition in Salmonella-present vs. -absent birds. Out of 108 samples, we detected Salmonella in six (5.6%) samples based on culture, 25 (23.1%) samples with unrarefied 16S rRNA gene sequencing data, and three (2.8%) samples with both techniques. We found that sex, migratory status, and weight were correlated with Salmonella presence in swallows. In addition, bacterial community composition and diversity differed between birds based on Salmonella status. 
This study highlights the value of 16S rRNA gene sequencing data for monitoring pathogens in wild birds and investigating the ecology of host microbe-pathogen relationships, data which are important for prediction and mitigation of disease spillover into domestic animals and humans.
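
As a quick arithmetic check, the detection percentages reported above follow directly from the sample counts. The sketch below only recomputes those figures; the function name `detection_rate` and the dictionary keys are illustrative choices, not part of the study.

```python
def detection_rate(positives: int, total: int) -> float:
    """Percentage of samples that tested positive, rounded to one decimal place."""
    return round(100 * positives / total, 1)


total_samples = 108
# Counts as reported in the abstract.
positive_counts = {"culture": 6, "16S rRNA sequencing": 25, "both techniques": 3}

rates = {method: detection_rate(n, total_samples)
         for method, n in positive_counts.items()}

for method, rate in rates.items():
    print(f"{method}: {rate}%")
```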

