Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data

2017 ◽  
Author(s):  
Nicole M. Davis ◽  
Diana M. Proctor ◽  
Susan P. Holmes ◽  
David A. Relman ◽  
Benjamin J. Callahan

Abstract

Background: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants: DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package that implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples, and are often found in negative controls.

Results: decontam classified amplicon sequence variants (ASVs) in a human oral dataset consistently with prior microscopic observations of the microbial taxa inhabiting that environment and previous reports of contaminant taxa. In metagenomics and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome, and that some low-frequency taxa seemingly associated with preterm birth were contaminants.

Conclusions: decontam improves the quality of metagenomic and marker-gene sequencing by identifying and removing contaminant DNA sequences. decontam integrates easily with existing MGS workflows, and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost.
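The first pattern named in the abstract (contaminants appear at higher frequencies in low-concentration samples) can be illustrated with a small Python sketch. It is not the decontam implementation or its API. It assumes, purely for illustration, that a contaminant's frequency is inversely proportional to total DNA concentration while a genuine sequence's frequency does not depend on concentration, and it compares how well each assumption fits in log space. The function name `contaminant_score` and the scoring rule are hypothetical.

```python
import math


def contaminant_score(frequencies, concentrations):
    """Hypothetical score in [0, 1]; values near 0 suggest a contaminant-like pattern.

    Assumes all frequencies and concentrations are strictly positive.
    """
    log_f = [math.log(f) for f in frequencies]
    log_c = [math.log(c) for c in concentrations]
    n = len(log_f)

    # Non-contaminant model (assumed): log f = a, i.e. slope 0 against log c.
    mean_f = sum(log_f) / n
    rss_flat = sum((y - mean_f) ** 2 for y in log_f)

    # Contaminant model (assumed): log f = b - log c, i.e. slope -1 against log c.
    shifted = [y + x for y, x in zip(log_f, log_c)]
    mean_s = sum(shifted) / n
    rss_inverse = sum((s - mean_s) ** 2 for s in shifted)

    total = rss_flat + rss_inverse
    # If both models fit perfectly the data carry no information; return 0.5.
    return 0.5 if total == 0 else rss_inverse / total


# Illustrative, made-up dilution series (arbitrary concentration units).
conc = [1.0, 2.0, 4.0, 8.0, 16.0]
print(contaminant_score([0.1 / c for c in conc], conc))  # inverse pattern: close to 0
print(contaminant_score([0.05] * 5, conc))               # flat pattern: close to 1
```

A ratio of residual sums of squares is used here only because it is easy to read; a real classifier would need a calibrated statistic and a threshold chosen for the dataset.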

Microbiome ◽  
2018 ◽  
Vol 6 (1) ◽  
Author(s):  
Nicole M. Davis ◽  
Diana M. Proctor ◽  
Susan P. Holmes ◽  
David A. Relman ◽  
Benjamin J. Callahan

2019 ◽  
Author(s):  
Michael R. McLaren ◽  
Amy D. Willis ◽  
Benjamin J. Callahan

Abstract: Measurements of biological communities by marker-gene and metagenomic sequencing are biased: the measured relative abundances of taxa or their genes are systematically distorted from their true values because each step in the experimental workflow preferentially detects some taxa over others. Bias can lead to qualitatively incorrect conclusions and makes measurements from different protocols quantitatively incomparable. A rigorous understanding of bias is therefore essential. Here we propose, test, and apply a simple mathematical model of how bias distorts marker-gene and metagenomics measurements: bias multiplies the true relative abundances within each sample by taxon- and protocol-specific factors that describe the different efficiencies with which taxa are detected by the workflow. Critically, these factors are consistent across samples with different compositions, allowing bias to be estimated and corrected. We validate this model in 16S rRNA gene and shotgun metagenomics data from bacterial communities with defined compositions. We use it to reason about the effects of bias on downstream statistical analyses, finding that analyses based on taxon ratios are less sensitive to bias than analyses based on taxon proportions. Finally, we demonstrate how this model can be used to quantify bias from samples of defined composition, to partition bias into steps such as DNA extraction and PCR amplification, and to correct biased measurements. Our model improves on previous models by providing a better fit to experimental data and by providing a composition-independent approach to analyzing, measuring, and correcting bias.
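The multiplicative model described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names `apply_bias` and `correct_bias` and the efficiency values are hypothetical, and the efficiencies are assumed to be known exactly.

```python
def apply_bias(true_props, efficiencies):
    """Multiply true proportions by per-taxon efficiencies, then renormalize to sum to 1."""
    weighted = [p * e for p, e in zip(true_props, efficiencies)]
    total = sum(weighted)
    return [w / total for w in weighted]


def correct_bias(observed_props, efficiencies):
    """Invert the model: divide by the efficiencies, then renormalize to sum to 1."""
    unweighted = [o / e for o, e in zip(observed_props, efficiencies)]
    total = sum(unweighted)
    return [u / total for u in unweighted]


# Made-up three-taxon example; efficiencies are relative detection efficiencies.
efficiencies = [1.0, 4.0, 0.5]
sample_a = [0.5, 0.3, 0.2]
sample_b = [0.1, 0.6, 0.3]

observed_a = apply_bias(sample_a, efficiencies)
observed_b = apply_bias(sample_b, efficiencies)

# Corrected proportions should match the true ones up to floating-point error.
print(correct_bias(observed_a, efficiencies))

# The ratio of taxon 2 to taxon 1 is distorted by the same factor (4.0 / 1.0)
# in both samples, even though their compositions differ.
print((observed_a[1] / observed_a[0]) / (sample_a[1] / sample_a[0]))
print((observed_b[1] / observed_b[0]) / (sample_b[1] / sample_b[0]))
```

The last two lines show why ratio-based analyses are less sensitive to bias under this model: the distortion of a ratio depends only on the two efficiencies, whereas the distortion of a proportion also depends on the rest of the sample through the normalization step.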


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Irvan Luhung ◽  
Akira Uchida ◽  
Serene B. Y. Lim ◽  
Nicolas E. Gaultier ◽  
Carmon Kee ◽  
...  

Abstract: Investigation of the microbial ecology of terrestrial, aquatic and atmospheric ecosystems requires specific sampling and analytical technologies, owing to vastly different biomass densities typically encountered. In particular, the ultra-low biomass nature of air presents an inherent analytical challenge that is confounded by temporal fluctuations in community structure. Our ultra-low biomass pipeline advances the field of bioaerosol research by significantly reducing sampling times from days/weeks/months to minutes/hours, while maintaining the ability to perform species-level identification through direct metagenomic sequencing. The study further addresses all experimental factors contributing to analysis outcome, such as amassment, storage and extraction, as well as factors that impact on nucleic acid analysis. Quantity and quality of nucleic acid extracts from each optimisation step are evaluated using fluorometry, qPCR and sequencing. Both metagenomics and marker gene amplification-based (16S and ITS) sequencing are assessed with regard to their taxonomic resolution and inter-comparability. The pipeline is robust across a wide range of climatic settings, ranging from arctic to desert to tropical environments. Ultimately, the pipeline can be adapted to environmental settings, such as dust and surfaces, which also require ultra-low biomass analytics.


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Michael R McLaren ◽  
Amy D Willis ◽  
Benjamin J Callahan

Marker-gene and metagenomic sequencing have profoundly expanded our ability to measure biological communities. But the measurements they provide differ from the truth, often dramatically, because these experiments are biased toward detecting some taxa over others. This experimental bias makes the taxon or gene abundances measured by different protocols quantitatively incomparable and can lead to spurious biological conclusions. We propose a mathematical model for how bias distorts community measurements based on the properties of real experiments. We validate this model with 16S rRNA gene and shotgun metagenomics data from defined bacterial communities. Our model better fits the experimental data despite being simpler than previous models. We illustrate how our model can be used to evaluate protocols, to understand the effect of bias on downstream statistical analyses, and to measure and correct bias given suitable calibration controls. These results illuminate new avenues toward truly quantitative and reproducible metagenomics measurements.


1998 ◽  
Vol 2 ◽  
pp. 115-122
Author(s):  
Donatas Švitra ◽  
Jolanta Janutėnienė

In metal cutting practice, it is necessary to overcome vibration of the cutting tool, the workpiece, and the units of the machine tool. In many cases these vibrations are an obstacle to increasing the productivity and quality of machining on metal-cutting machine tools. Vibration during metal cutting is a very diverse phenomenon in both its nature and the form of the oscillatory motion. The most general classification divides cutting vibrations into forced vibrations and autovibrations. Autovibrations, i.e. vibrations arising in the absence of external periodic forces, are the most difficult to remove and the least investigated. The autovibrations caused by the cutting process on metal-cutting machines are of two types: low-frequency and high-frequency. When low-frequency autovibrations appear, the cutting process ought to be terminated and the cause of the vibrations eliminated; otherwise, there is a danger of breaking both machine and tool. In the case of high-frequency vibration the machine apparently operates quietly, but the processed surface shows small-scale roughness. The frequency of autovibrations can reach 5000 Hz and more.


Vestnik MEI ◽  
2018 ◽  
Vol 5 (5) ◽  
pp. 120-127
Author(s):  
Mikhail D. Vorobyev ◽  
Dmitriy N. Yudaev ◽  
Andrey Yu. Zorin ◽  
...  

2019 ◽  
Vol 104 (1) ◽  
pp. 33-48 ◽  
Author(s):  
Alejandro Zuluaga ◽  
Martin Llano ◽  
Ken Cameron

The subfamily Monsteroideae (Araceae) is the third richest clade in the family, with ca. 369 described species and ca. 700 estimated. It comprises mostly hemiepiphytic or epiphytic plants restricted to the tropics, with three intercontinental disjunctions. Using a dataset representing all 12 genera in Monsteroideae (126 taxa), and five plastid and two nuclear markers, we studied the systematics and historical biogeography of the group. We found high support for the monophyly of the three major clades (Spathiphylleae sister to Heteropsis Kunth and Rhaphidophora Hassk. clades), and for six of the genera within Monsteroideae. However, we found low rates of variation in the DNA sequences used and a lack of molecular markers suitable for species-level phylogenies in the group. We also performed ancestral state reconstruction of some morphological characters traditionally used for genera delimitation. Only seed shape and size, number of seeds, number of locules, and presence of endosperm showed utility in the classification of genera in Monsteroideae. We estimated ancestral ranges using a dispersal-extinction-cladogenesis model as implemented in the R package BioGeoBEARS and found evidence for a Gondwanan origin of the clade. One tropical disjunction (Monstera Adans. sister to Amydrium Schott–Epipremnum Schott) was found to be the product of a previous Boreotropical distribution. Two other disjunctions are more recent and likely due to long-distance dispersal: Spathiphyllum Schott (with Holochlamys Engl. nested within) represents a dispersal from South America to the Pacific Islands in Southeast Asia, and Rhaphidophora represents a dispersal from Asia to Africa. Future studies based on stronger phylogenetic reconstructions and complete morphological datasets are needed to explore the details of speciation and migration within and among areas in Asia.


2021 ◽  
Vol 11 (4) ◽  
pp. 1943
Author(s):  
Joo-Young Kim ◽  
Ju Yeon Jung ◽  
Da-Hye Kim ◽  
Seohyun Moon ◽  
Won-Hae Lee ◽  
...  

Analytical techniques such as DNA profiling are widely used in various fields, including forensic science, and novel technologies such as direct polymerase chain reaction (PCR) amplification are continuously being developed in order to acquire DNA profiles efficiently. However, non-specific amplification may occur depending on the quality of the crime scene evidence and amplification methods employed. In particular, the ski-slope effect observed in direct PCR amplification has led to inaccurate interpretations of the DNA profile results. In this study, we aimed to reduce the ski-slope effect by using dimethyl sulfoxide (DMSO) in direct PCR. We confirmed that DMSO (3.75%, v/v) increased the amplification yield of large-sized DNA sequences more than that of small-sized ones. Using 50 Korean buccal samples, we further demonstrated that DMSO reduced the ski-slope effect in direct PCR. These results suggest that the experimental method developed in this study is suitable for direct PCR and may help to successfully obtain DNA profiles from various types of evidence at crime scenes.


Author(s):  
Bennett J Kapili ◽  
Anne E Dekas

Abstract

Motivation: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale.

Results: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood (1) are taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches at the cost of fewer total inferences. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes.

Availability: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167.

Supplementary information: Supplementary data are available at Bioinformatics online.
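The two-condition rule in the Results (draw an inference only when neighboring references are taxonomically consistent and sufficiently similar to the query) can be sketched as a short Python function. This is not PPIT's code or interface: the function name `infer_taxon`, the input layout, and the identity threshold are hypothetical, and the sketch assumes that every neighbor must pass the identity check, which the abstract does not specify.

```python
def infer_taxon(neighbors, min_identity):
    """Return a taxon label, or None when no inference should be drawn.

    neighbors: list of (taxon, pairwise_identity) tuples for the reference
    sequences in the query's phylogenetic neighborhood (hypothetical layout).
    min_identity: assumed identity cutoff between 0 and 1.
    """
    if not neighbors:
        return None  # nothing nearby to base an inference on

    taxa = {taxon for taxon, _ in neighbors}
    if len(taxa) != 1:
        return None  # condition (1) fails: neighbors disagree on taxonomy

    if any(identity < min_identity for _, identity in neighbors):
        return None  # condition (2) fails: a neighbor is too dissimilar

    return next(iter(taxa))


# Made-up neighborhoods; the 0.90 cutoff is an illustrative value only.
consistent = [("Deltaproteobacteria", 0.95), ("Deltaproteobacteria", 0.92)]
mixed = [("Deltaproteobacteria", 0.95), ("Firmicutes", 0.96)]
print(infer_taxon(consistent, 0.90))
print(infer_taxon(mixed, 0.90))
```

Returning `None` instead of a best guess mirrors the trade-off stated in the abstract: fewer total inferences in exchange for a higher proportion of correct ones.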

