Controlling for contaminants in low biomass 16S rRNA gene sequencing experiments
Abstract

Background
Microbial communities are commonly studied using culture-independent methods such as 16S rRNA gene sequencing. However, exogenous bacterial DNA contamination is one obstacle to characterizing microbial communities accurately. This is particularly problematic at sites of low microbial biomass such as the urinary tract, placenta, and lower airway. Computational approaches have been proposed as a post-processing step to identify and remove potential contaminants, but their performance has not been independently evaluated.

To assess the impact of decreasing microbial biomass on polymicrobial 16S rRNA gene sequencing experiments, we used a serial dilution of a mock microbial community. We evaluated two computational approaches to identify and remove contaminants: 1) identifying sequences whose frequency is inversely correlated with DNA concentration, as implemented in Decontam, and 2) predicting the proportion of each experimental sample that arises from defined contaminant sources, as implemented in SourceTracker.

Results
As expected, the proportion of contaminant bacterial DNA increased as starting microbial biomass decreased, with 79.12% of the most dilute sample arising from contaminant sequences. Including contaminant sequences in analyses inflated diversity estimates (up to 12 times the expected values) and distorted microbiome composition. SourceTracker removed over 98% of contaminants when the experimental environments were well defined. However, SourceTracker performed poorly when the experimental environment was unknown, failing to remove the majority of contaminants. Decontam removed 74–91% of contaminants regardless of prior knowledge of the experimental environment.

Conclusions
Our study indicates that computational methods can reduce the amount of contamination in 16S rRNA gene sequencing experiments. The appropriate computational approach for removing contaminant sequences from an experiment depends on prior knowledge of the microbial environment under investigation, and it can be evaluated with a dilution series of a mock microbial community.
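The frequency-based idea behind the first approach can be illustrated with a small model comparison. If a contaminant contributes a roughly constant amount of DNA to every sample, its relative frequency f should scale as 1/C with total DNA concentration C, so log f = k − log C. A genuine community member's frequency should instead be independent of C, so log f = k. The Python sketch below fits both models to one sequence across a dilution series and returns the ratio of their residual sums of squares. It is a minimal illustration under these assumptions, not Decontam's implementation (Decontam is an R package with its own scoring and thresholds). The function names `fit_ss` and `contaminant_score` and all example values are hypothetical.

```python
import math


def fit_ss(log_c, log_f, fixed_slope):
    """Residual sum of squares for log_f = intercept + fixed_slope * log_c.

    The slope is held fixed; only the intercept is fitted. With a fixed
    slope, the least-squares intercept is the mean of the residuals
    (log_f - fixed_slope * log_c).
    """
    n = len(log_c)
    intercept = sum(f - fixed_slope * c for c, f in zip(log_c, log_f)) / n
    return sum((f - intercept - fixed_slope * c) ** 2 for c, f in zip(log_c, log_f))


def contaminant_score(freqs, concs):
    """Hypothetical score: SS(contaminant model) / SS(non-contaminant model).

    Values well below 1 favour the contaminant model (frequency ~ 1/C);
    values above 1 favour a frequency that does not depend on concentration.
    Returns None when fewer than two samples contain the sequence.
    """
    # Samples where the sequence is absent carry no information on a log scale.
    pairs = [(math.log(c), math.log(f)) for f, c in zip(freqs, concs) if f > 0 and c > 0]
    if len(pairs) < 2:
        return None
    log_c = [c for c, _ in pairs]
    log_f = [f for _, f in pairs]
    ss_contam = fit_ss(log_c, log_f, -1.0)    # log f = k - log C
    ss_noncontam = fit_ss(log_c, log_f, 0.0)  # log f = k
    if ss_noncontam == 0.0:
        # A perfectly constant frequency: the non-contaminant model fits exactly.
        return math.inf if ss_contam > 0.0 else 1.0
    return ss_contam / ss_noncontam


# Illustrative dilution series (arbitrary concentration units).
concs = [10.0, 1.0, 0.1, 0.01]
contaminant = [0.001 / c for c in concs]   # frequency rises as DNA input falls
mock_member = [0.25, 0.24, 0.26, 0.25]     # roughly stable across dilutions
print(contaminant_score(contaminant, concs))  # near 0: contaminant-like
print(contaminant_score(mock_member, concs))  # well above 1: not contaminant-like
```

In real low-biomass data, contaminant frequencies saturate once contaminants dominate a sample, so the 1/C relationship holds only approximately. A tool would therefore compare the models statistically rather than with a fixed cut-off such as 1.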