Bioconductor workflow for microbiome data analysis: from raw reads to community analyses

F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 1492 ◽  
Author(s):  
Ben J. Callahan ◽  
Kris Sankaran ◽  
Julia A. Fukuyama ◽  
Paul J. McMurdie ◽  
Susan P. Holmes

High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parametric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models as well as nonparametric testing using community networks and the ggnetwork package.
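For readers unfamiliar with the "subsampling to equalize library sizes" step that the abstract contrasts with model-based estimates, the following is a minimal Python sketch of that common approach (often called rarefying). It is not the authors' workflow, which is written in R; the function names `rarefy` and `rarefy_table`, the dictionary-based count table, and the fixed seed are all assumptions made for illustration.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's taxon counts, without replacement, to `depth` reads.

    `counts` maps taxon name -> read count (hypothetical data layout).
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the sample's library size")
    # Expand the counts into one entry per read, then draw `depth` reads.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(reads, depth)
    out = {taxon: 0 for taxon in counts}
    for taxon in drawn:
        out[taxon] += 1
    return out


def rarefy_table(table, seed=0):
    """Rarefy every sample to the smallest library size found in the table."""
    depth = min(sum(counts.values()) for counts in table.values())
    return {name: rarefy(counts, depth, seed) for name, counts in table.items()}


if __name__ == "__main__":
    table = {
        "s1": {"a": 50, "b": 30, "c": 20},  # 100 reads
        "s2": {"a": 5, "b": 3, "c": 2},     # 10 reads
    }
    for name, counts in rarefy_table(table).items():
        print(name, counts, "total:", sum(counts.values()))
```

Every sample ends up with the same number of reads, but reads from deeper samples are discarded, which is the loss of information that motivates the model-based alternative described in the abstract.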


2021 ◽  
Author(s):  
Elianne Egge ◽  
Stephanie Elferink ◽  
Daniel Vaulot ◽  
Uwe John ◽  
Gunnar Bratbak ◽  
...  

Abstract Arctic marine protist communities have been understudied due to challenging sampling conditions, in particular during winter and in deep waters. The aim of this study was to improve our knowledge of Arctic protist diversity through the year, both in the epipelagic (< 200 m depth) and mesopelagic zones (200-1000 m depth). Sampling campaigns were performed in 2014, during five different months, to capture the various phases of the Arctic primary production: January (winter), March (pre-bloom), May (spring bloom), August (post-bloom) and November (early winter). The cruises were undertaken west and north of the Svalbard archipelago, where warmer Atlantic waters from the West Spitsbergen Current meet cold Arctic waters from the Arctic Ocean. From each cruise, station, and depth, 50 L of sea water were collected and the plankton was size-fractionated by serial filtration into four size fractions between 0.45 and 200 µm, representing the picoplankton, nanoplankton and microplankton. In addition, vertical net hauls were taken from 50 m depth to the surface at selected stations. From the plankton samples DNA was extracted, the V4 region of the 18S rRNA gene was amplified by PCR with universal eukaryote primers and the amplicons were sequenced by Illumina high-throughput sequencing. Sequences were clustered into Amplicon Sequence Variants (ASVs), representing protist genotypes, with the dada2 pipeline. Taxonomic classification was made against the curated Protist Ribosomal Reference database (PR2). Altogether 6,536 protist ASVs were obtained (including 54 fungal ASVs). Both ASV richness and taxonomic composition were strongly dependent on size-fraction, season, and depth. ASV richness was generally higher in the smaller fractions, and higher in winter and the mesopelagic samples than in samples from the well-lit epipelagic zone during summer. During spring and summer, the phytoplankton groups diatoms, chlorophytes and haptophytes dominated in the epipelagic zone. Parasitic and heterotrophic groups such as Syndiniales and certain dinoflagellates dominated in the mesopelagic zone all year, as well as in the epipelagic zone during the winter. The dataset is available at https://doi.org/10.17882/79823 (Egge et al., 2014).


mBio ◽  
2013 ◽  
Vol 4 (1) ◽  
Author(s):  
Caitriona M. Guinane ◽  
Amany Tadrous ◽  
Fiona Fouhy ◽  
C. Anthony Ryan ◽  
Eugene M. Dempsey ◽  
...  

ABSTRACT The human appendix has historically been considered a vestige of evolutionary development with an unknown function. While limited data are available on the microbial composition of the appendix, it has been postulated that this organ could serve as a microbial reservoir for repopulating the gastrointestinal tract in times of necessity. We aimed to explore the microbial composition of the human appendix, using high-throughput sequencing of the 16S rRNA gene V4 region. Seven patients, 5 to 25 years of age, presenting with symptoms of acute appendicitis were included in this study. Results showed considerable diversity and interindividual variability in the microbial composition of the appendix samples. In general, however, Firmicutes was the dominant phylum, with the majority of additional sequences being assigned at various levels to Proteobacteria, Bacteroidetes, Actinobacteria, and Fusobacteria. Despite the large diversity in the microbiota found within the appendix, a few major families and genera were found to comprise the majority of the sequences present. Interestingly, certain taxa not generally associated with the human intestine, including the oral pathogens Gemella, Parvimonas, and Fusobacterium, were identified among the appendix samples. The prevalence of genera such as Fusobacterium could also be linked to the severity of inflammation of the organ. We conclude that the human appendix contains a robust and varied microbiota distinct from the microbiotas in other niches within the human microbiome. The microbial composition of the human appendix is subject to extreme variability and comprises a diversity of biota that may play an important, as-yet-unknown role in human health. IMPORTANCE There are currently limited data available on the microbial composition of the human appendix. It has been suggested, however, that it may serve as a “safe house” for commensal bacteria that can reinoculate the gut at need. The present study is the first comprehensive view of the microbial composition of the appendix as determined by high-throughput sequencing. We have determined that the human appendix contains a wealth of microbes, including members of 15 phyla. We present important information on the bacterial diversity of the appendix that will help determine the role, if any, the appendix microbiota has in human health.


2021 ◽  
Author(s):  
Katie Bull ◽  
Gareth Davies ◽  
Timothy Patrick Jenkins ◽  
Laura Elizabeth Peachey

Abstract Background: Changes to the gut microbiota are associated with an increased incidence of disease in many species. This is particularly important during the process of domestication, where captive animals commonly suffer from gastrointestinal (GI) pathology. Horses are a prime example of a species which suffers from a high incidence of (often life-threatening) GI diseases in domesticated environments. We aimed to identify the gut microbial changes which occur due to domestication in horses by profiling the faecal microbiota of adult female Exmoor ponies under three management conditions, representing increasing levels of domestication. Methods: Faecal samples were collected from 29 adult female Exmoor ponies in the South West of the UK; ponies were categorised as Feral (n=10), Semi-Feral (n=10) and Domesticated (n=9), based on their management conditions; thus controlling for age, gender and random effects between groups. Diet and medication were recorded and faecal samples taken to assess parasite infection. Faecal microbial composition was profiled via high-throughput sequencing of the bacterial 16S rRNA gene. Results: Downstream biostatistical analysis indicated profound step-wise changes in global microbial community structure in the transition from Feral to Semi-Feral to Domesticated groups. A relatively high abundance of members of the phyla Proteobacteria and Tenericutes was associated with the Domesticated group, and higher levels of Methanobacteria were seen in the Feral group. The Semi-Feral group frequently had intermediate levels of these taxa; however, they also exhibited the greatest ‘within group’ variation in bacterial diversity and parasite burdens. Functional predictions revealed increased amino acid and lipid metabolism in the Domesticated group and increased energy metabolism in the Feral group, supporting a hypothesis that differences in diet were the key driver of gut microbial composition. Conclusions: If the Feral population is assumed to have a more natural gut microbial phenotype, akin to that with which horses have evolved, these data can potentially be used to provide microbial signatures of balanced gut homeostasis in horses, which, in turn, will aid prevention of GI disease in domesticated horses.


2020 ◽  
Author(s):  
Quy Xuan Cao ◽  
Xinxin Sun ◽  
Karun Rajesh ◽  
Naga Chalasani ◽  
Kayla Gelow ◽  
...  

Abstract Background: Accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Filtering is defined as removing taxa that are present in a small number of samples and have small counts in the samples where they are observed. This approach reduces extreme sparsity of microbiome data and has been shown to correctly remove contaminant taxa in cultured "mock" datasets, where the true taxa compositions are known. Although filtering is frequently used, careful evaluation of its effect on the data analysis and scientific conclusions remains unreported. Here, we assess the effect of filtering on the alpha and beta diversity estimation, as well as its impact on identifying taxa that discriminate between disease states. Results: The effect of filtering on microbiome data analysis is illustrated on four datasets: two mock quality control datasets, in which the same cultured samples with known microbial composition are processed at different labs, and two disease study datasets. Results show that in microbiome quality control datasets, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs, while preserving between-sample similarity (beta diversity). In the disease study datasets, DESeq2 and linear discriminant analysis Effect Size (LEfSe) methods were used to identify taxa that are differentially expressed across groups of samples, and random forest models were used to rank the features with the largest contribution towards disease classification. Results reveal that filtering retains significant taxa and preserves the model classification ability measured by the area under the receiver operating characteristic curve (AUC). A comparison between filtering and a contaminant removal method shows that they have complementary effects and are best used in conjunction. Conclusions: Filtering reduces the complexity of microbiome data, while preserving their integrity in downstream analysis. This leads to mitigation of the classification methods' sensitivity and reduction of technical variability, allowing researchers to generate more reproducible and comparable results in microbiome data analysis.
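The filtering rule defined in this abstract (drop taxa that are present in few samples and have small counts where they are observed) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `filter_taxa`, the dictionary-based count table, and the thresholds `min_prevalence` and `min_count` are hypothetical, and the abstract does not give the thresholds the authors used.

```python
def filter_taxa(table, min_prevalence=2, min_count=10):
    """Remove taxa that are both rare across samples and low in count.

    `table` maps sample name -> {taxon name -> read count} (hypothetical layout).
    A taxon is removed only if it is observed in fewer than `min_prevalence`
    samples AND its largest observed count is below `min_count`.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    taxa = {taxon for counts in table.values() for taxon in counts}
    removed = set()
    for taxon in taxa:
        observed = [c.get(taxon, 0) for c in table.values() if c.get(taxon, 0) > 0]
        prevalence = len(observed)
        largest = max(observed, default=0)
        if prevalence < min_prevalence and largest < min_count:
            removed.add(taxon)
    return {
        sample: {t: n for t, n in counts.items() if t not in removed}
        for sample, counts in table.items()
    }


if __name__ == "__main__":
    table = {
        "s1": {"A": 100, "B": 1, "C": 0},
        "s2": {"A": 80, "B": 0, "C": 500},
        "s3": {"A": 90, "B": 0, "C": 0},
    }
    # B is seen once with a single read, so it is dropped;
    # C is seen once but with a large count, so it is kept.
    print(filter_taxa(table))
```

Requiring both conditions keeps a taxon that is abundant in a single sample, which matches the wording of the definition; a stricter rule based on prevalence alone would also be a reasonable design, but it is not what the abstract describes.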


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Lisa Karstens ◽  
Nazema Y. Siddiqui ◽  
Tamara Zaza ◽  
Alecsander Barstad ◽  
Cindy L. Amundsen ◽  
...  

Abstract The urinary microbiome has been increasingly characterized using next-generation sequencing. However, many of the technical methods have not yet been specifically optimized for urine. We sought to compare the performance of several DNA isolation kits used in urinary microbiome studies. A total of 11 voided urine samples and one buffer control were divided into 5 equal aliquots and processed in parallel using five commercial DNA isolation kits. DNA was quantified and the V4 segment of the 16S rRNA gene was sequenced. Data were processed to identify the microbial composition and to assess alpha and beta diversity of the samples. The tested DNA isolation kits gave significantly different DNA yields from urine samples. DNA extracted with the Qiagen Biostic Bacteremia and DNeasy Blood & Tissue kits showed the fewest technical issues in downstream analyses, with the DNeasy Blood & Tissue kit also demonstrating the highest DNA yield. Nevertheless, all five kits provided good-quality DNA for high-throughput sequencing, with non-significant differences in the number of reads recovered, alpha diversity, or beta diversity.


2021 ◽  
Author(s):  
Zequn Sun ◽  
Jing Zhao ◽  
Zhaoqian Liu ◽  
Qin Ma ◽  
Dongjun Chung

Abstract Identification of disease-associated microbial species is of great biological and clinical interest. However, this investigation remains challenging due to heterogeneity in microbial composition between individuals, data quality issues, and complex relationships among species. In this paper, we propose a novel data purification algorithm that eliminates noisy observations, which leads to increased statistical power to detect disease-associated microbial species. We illustrate the proposed algorithm using metagenomic data generated from colorectal cancer patients.


2016 ◽  
Author(s):  
Patrick J Kearns ◽  
Jennifer L Bowen ◽  
Michael F Tlusty

Public aquarium exhibits offer numerous educational opportunities for visitors, while touch tank exhibits offer guests the ability to directly interact with marine life. However, despite the popularity of these exhibits, the effect of human interactions on the host-associated microbiome or the habitat microbiome remains unclear. Microbial communities, both host-associated and habitat-associated, can have great implications for host health and habitat function. To better understand the link between human interactions and the microbiome of a touch tank, we used high-throughput sequencing of the 16S rRNA gene to analyze the microbial community on the dorsal and ventral surfaces of cow-nose rays (Rhinoptera bonasus) as well as its environment in a frequently visited touch tank exhibit at the New England Aquarium. Our analyses revealed a distinct microbial community associated with the skin of the ray that had lower diversity than the surrounding habitat. The ray skin was dominated by three orders: Burkholderiales (~55%), Flavobacteriales (~19%) and Pseudomonadales (~12%), suggesting a potentially important role of these taxa in ray health. Further, there was no difference between the dorsal and ventral surfaces of the ray in terms of microbial composition or diversity, and a very low presence of common human-associated microbial taxa (<1.5%). Our results suggest that human contact has a minimal effect on the skin and habitat microbiome of the cow-nose ray and that the ray skin harbors a distinct and lower diversity microbial community than its environment.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12562
Author(s):  
Zhiyuan Lu ◽  
Sisi Li ◽  
Hongxia Li ◽  
Zhucheng Wang ◽  
Derong Meng ◽  
...  

Background The composition of the intestinal microbiota plays a significant role in modulating host health. It serves as a sensitive evaluation indicator and has substantial implications in protecting endangered species. Great Bustards are typical farmland-dependent wintering birds that are highly susceptible to the interference of human activities. However, information regarding their gut microbiota remains scarce. Methods To ensure a comprehensive analysis of this crucial data, we collected fecal samples from wild Great Bustards at their wintering habitat for two consecutive years. High-throughput sequencing of the 16S rRNA gene was subsequently applied to characterize their core gut microbiota and determine whether the gut microbial composition was similar or varied interannually. Results The gut microbiota of the Great Bustard was primarily comprised of four phyla: Firmicutes (82.87%), Bacteroidetes (7.98%), Proteobacteria (4.49%), and Actinobacteria (3.67%), accounting for 99.01% of the microbial community in all samples. Further analysis revealed 22 genera of core microbes and several pathogens. Notably, there were no significant differences in the alpha-diversity and beta-diversity between the two sample groups from different years. Conclusions This study provides essential information for assessing the health and developing targeted protective measures of this threatened species.


2021 ◽  
Vol 12 ◽  
Author(s):  
Ananda Y. Bandara ◽  
Dilooshi K. Weerasooriya ◽  
Ryan V. Trexler ◽  
Terrence H. Bell ◽  
Paul D. Esker

The occurrence of high- (H) and low- (L) yielding field sites within a farm is a commonly observed phenomenon in soybean cultivation. Site topography, soil physical and chemical attributes, and soil/root-associated microbial composition can contribute to this phenomenon. In order to better understand the microbial dynamics associated with each site type (H/L), we collected bulk soil (BS), rhizosphere soil (RS), and soybean root (R) samples from historically high and low yield sites across eight Pennsylvania farms at V1 (first trifoliate) and R8 (maturity) soybean growth stages (SGS). We extracted DNA from collected samples and performed high-throughput sequencing of PCR amplicons from both the fungal ITS and prokaryotic 16S rRNA gene regions. Sequences were then grouped into amplicon sequence variants (ASVs) and subjected to network analysis. Based on both ITS and 16S rRNA gene data, a greater network size and more edges were observed for all sample types from H-sites compared to L-sites at both SGS. Network analysis suggested that the number of potential microbial interactions/associations was greater in samples from H-sites compared to L-sites. Diversity analyses indicated that site-type was not a main driver of alpha and beta diversity in soybean-associated microbial communities. L-sites contained a greater percentage of fungal phytopathogens (e.g., Fusarium, Macrophomina, Septoria), while H-sites contained a greater percentage of mycoparasitic (e.g., Trichoderma) and entomopathogenic (e.g., Metarhizium) fungal genera. Furthermore, roots from H-sites possessed a greater percentage of Bradyrhizobium and genera known to contain plant growth promoting bacteria (e.g., Flavobacterium, Duganella). Overall, our results revealed that there were differences in microbial composition in soil and roots from H- and L-sites across a variety of soybean farms. Based on our findings, we hypothesize that differences in microbial composition could have a causative relationship with observed within-farm variability in soybean yield.

