Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data

Abstract. Background: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants, that is, DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package that implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples, and are often found in negative controls. Results: decontam classified amplicon sequence variants (ASVs) in a human oral dataset consistently with prior microscopic observations of the microbial taxa inhabiting that environment and previous reports of contaminant taxa. In metagenomics and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome, and that some low-frequency taxa seemingly associated with preterm birth were contaminants. Conclusions: decontam improves the quality of metagenomic and marker-gene sequencing by identifying and removing contaminant DNA sequences. decontam integrates easily with existing MGS workflows, and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost.
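The first pattern named in the abstract (contaminants appear at higher frequencies in low-concentration samples) can be illustrated with a minimal Python sketch. This is not decontam's implementation: the function name `frequency_score`, the two fixed-slope models, and the scoring rule are all assumptions chosen for illustration. The sketch assumes a contaminant's frequency is roughly inversely proportional to total DNA concentration, while a true sequence's frequency does not depend on it.

```python
import math


def frequency_score(freqs, concs):
    """Return a score in [0, 1]; values near 0 suggest a contaminant-like pattern.

    Hypothetical helper, not part of decontam. Compares two fixed-slope
    models on the log scale:
      contaminant:     log(freq) = a - log(conc)   (slope -1)
      non-contaminant: log(freq) = b               (slope 0)
    Assumes all frequencies and concentrations are strictly positive.
    """
    log_f = [math.log(f) for f in freqs]
    log_c = [math.log(c) for c in concs]
    n = len(log_f)
    # For a fixed slope, the least-squares intercept is the mean residual.
    a = sum(f + c for f, c in zip(log_f, log_c)) / n
    b = sum(log_f) / n
    ss_contam = sum((f + c - a) ** 2 for f, c in zip(log_f, log_c))
    ss_noncontam = sum((f - b) ** 2 for f in log_f)
    total = ss_contam + ss_noncontam
    # If both models fit perfectly the data cannot tell them apart.
    return 0.5 if total == 0 else ss_contam / total


# Illustrative (made-up) data: DNA concentrations of four samples.
concs = [1.0, 2.0, 4.0, 8.0]
contaminant_like = [0.08, 0.04, 0.02, 0.01]  # frequency halves as concentration doubles
sample_like = [0.10, 0.10, 0.10, 0.10]       # frequency independent of concentration

print(frequency_score(contaminant_like, concs))
print(frequency_score(sample_like, concs))
```

A score near 0 means the inverse-concentration model fits much better than the constant model; a score near 1 means the opposite. The second pattern (presence in negative controls) would need a separate comparison of prevalence in controls versus true samples, which this sketch does not attempt.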

Microbiome ◽

10.1186/s40168-018-0605-2 ◽

2018 ◽

Vol 6 (1) ◽

Cited By ~ 266

Author(s):

Nicole M. Davis ◽

Diana M. Proctor ◽

Susan P. Holmes ◽

David A. Relman ◽

Benjamin J. Callahan

Keyword(s):

Marker Gene ◽

Metagenomics Data ◽

Statistical Identification

Consistent and correctable bias in metagenomic sequencing experiments

10.1101/559831 ◽

2019 ◽

Cited By ~ 9

Author(s):

Michael R. McLaren ◽

Amy D. Willis ◽

Benjamin J. Callahan

Keyword(s):

Marker Gene ◽

Pcr Amplification ◽

Rrna Gene ◽

Metagenomic Sequencing ◽

Shotgun Metagenomics ◽

Biological Communities ◽

Metagenomics Data ◽

Specific Factors ◽

Relative Abundances ◽

True Values

Abstract. Measurements of biological communities by marker-gene and metagenomic sequencing are biased: the measured relative abundances of taxa or their genes are systematically distorted from their true values because each step in the experimental workflow preferentially detects some taxa over others. Bias can lead to qualitatively incorrect conclusions and makes measurements from different protocols quantitatively incomparable. A rigorous understanding of bias is therefore essential. Here we propose, test, and apply a simple mathematical model of how bias distorts marker-gene and metagenomics measurements: bias multiplies the true relative abundances within each sample by taxon- and protocol-specific factors that describe the different efficiencies with which taxa are detected by the workflow. Critically, these factors are consistent across samples with different compositions, allowing bias to be estimated and corrected. We validate this model in 16S rRNA gene and shotgun metagenomics data from bacterial communities with defined compositions. We use it to reason about the effects of bias on downstream statistical analyses, finding that analyses based on taxon ratios are less sensitive to bias than analyses based on taxon proportions. Finally, we demonstrate how this model can be used to quantify bias from samples of defined composition, to partition bias into steps such as DNA extraction and PCR amplification, and to correct biased measurements. Our model improves on previous models by providing a better fit to experimental data and by providing a composition-independent approach to analyzing, measuring, and correcting bias.
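The multiplicative model described in this abstract can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the authors' code: the function names `measure` and `correct`, the three taxa, and the efficiency values are all hypothetical. It shows why ratios between taxa are distorted by a constant factor across samples, while the distortion of a single taxon's proportion depends on what else is in the sample.

```python
def measure(true_props, efficiencies):
    """Multiply each taxon's true proportion by its efficiency, then renormalize."""
    biased = {taxon: p * efficiencies[taxon] for taxon, p in true_props.items()}
    total = sum(biased.values())
    return {taxon: v / total for taxon, v in biased.items()}


def correct(observed_props, efficiencies):
    """Invert the bias: divide by the efficiencies, then renormalize."""
    unbiased = {taxon: p / efficiencies[taxon] for taxon, p in observed_props.items()}
    total = sum(unbiased.values())
    return {taxon: v / total for taxon, v in unbiased.items()}


# Hypothetical detection efficiencies for three taxa under one protocol.
efficiencies = {"A": 1.0, "B": 4.0, "C": 0.5}
# Two hypothetical samples with different true compositions.
sample_1 = {"A": 0.50, "B": 0.25, "C": 0.25}
sample_2 = {"A": 0.10, "B": 0.10, "C": 0.80}

for true in (sample_1, sample_2):
    obs = measure(true, efficiencies)
    ratio_error = (obs["B"] / obs["A"]) / (true["B"] / true["A"])
    proportion_error = obs["A"] / true["A"]
    print(f"B/A ratio error: {ratio_error:.2f}, A proportion error: {proportion_error:.2f}")
```

In both samples the B/A ratio is off by the same factor of 4, the ratio of the two assumed efficiencies, whereas the error in the proportion of A differs between the samples. When the efficiencies are known, for example from a control of defined composition, `correct` recovers the true proportions.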

Experimental parameters defining ultra-low biomass bioaerosol analysis

npj Biofilms and Microbiomes ◽

10.1038/s41522-021-00209-4 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Irvan Luhung ◽

Akira Uchida ◽

Serene B. Y. Lim ◽

Nicolas E. Gaultier ◽

Carmon Kee ◽

...

Keyword(s):

Nucleic Acid ◽

Marker Gene ◽

Taxonomic Resolution ◽

Its Sequencing ◽

Metagenomic Sequencing ◽

Tropical Environments ◽

Wide Range ◽

Experimental Parameters ◽

Nucleic Acid Analysis

Abstract. Investigation of the microbial ecology of terrestrial, aquatic and atmospheric ecosystems requires specific sampling and analytical technologies, owing to vastly different biomass densities typically encountered. In particular, the ultra-low biomass nature of air presents an inherent analytical challenge that is confounded by temporal fluctuations in community structure. Our ultra-low biomass pipeline advances the field of bioaerosol research by significantly reducing sampling times from days/weeks/months to minutes/hours, while maintaining the ability to perform species-level identification through direct metagenomic sequencing. The study further addresses all experimental factors contributing to analysis outcome, such as amassment, storage and extraction, as well as factors that impact on nucleic acid analysis. Quantity and quality of nucleic acid extracts from each optimisation step are evaluated using fluorometry, qPCR and sequencing. Both metagenomics and marker gene amplification-based (16S and ITS) sequencing are assessed with regard to their taxonomic resolution and inter-comparability. The pipeline is robust across a wide range of climatic settings, ranging from arctic to desert to tropical environments. Ultimately, the pipeline can be adapted to environmental settings, such as dust and surfaces, which also require ultra-low biomass analytics.

Consistent and correctable bias in metagenomic sequencing experiments

eLife ◽

10.7554/elife.46923 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 42

Author(s):

Michael R McLaren ◽

Amy D Willis ◽

Benjamin J Callahan

Keyword(s):

Experimental Data ◽

Bacterial Communities ◽

Marker Gene ◽

Rrna Gene ◽

Metagenomic Sequencing ◽

Shotgun Metagenomics ◽

Biological Communities ◽

Experimental Bias ◽

Metagenomics Data ◽

Or Gene

Marker-gene and metagenomic sequencing have profoundly expanded our ability to measure biological communities. But the measurements they provide differ from the truth, often dramatically, because these experiments are biased toward detecting some taxa over others. This experimental bias makes the taxon or gene abundances measured by different protocols quantitatively incomparable and can lead to spurious biological conclusions. We propose a mathematical model for how bias distorts community measurements based on the properties of real experiments. We validate this model with 16S rRNA gene and shotgun metagenomics data from defined bacterial communities. Our model better fits the experimental data despite being simpler than previous models. We illustrate how our model can be used to evaluate protocols, to understand the effect of bias on downstream statistical analyses, and to measure and correct bias given suitable calibration controls. These results illuminate new avenues toward truly quantitative and reproducible metagenomics measurements.

Dynamics of the processes in metal machining

Nonlinear Analysis Modelling and Control ◽

10.15388/na.1998.2.0.15295 ◽

1998 ◽

Vol 2 ◽

pp. 115-122

Author(s):

Donatas Švitra ◽

Jolanta Janutėnienė

Keyword(s):

High Frequency ◽

Machine Tools ◽

Metal Cutting ◽

Cutting Tool ◽

Low Frequency ◽

Metal Cutting Machine ◽

Cutting Machine ◽

Metal Machining

In metal cutting, it is necessary to overcome vibration of the cutting tool, the workpiece, and the units of the machine tool. In many cases these vibrations are an obstacle to increasing the productivity and quality of machining on metal-cutting machine tools. Vibration in metal cutting is a very diverse phenomenon, both in its nature and in the form of the oscillatory motion. The most general classification divides cutting vibrations into forced vibrations and autovibrations. The autovibrations, i.e. vibrations arising in the absence of external periodic forces, are the most difficult to remove and the most poorly investigated. Autovibrations caused by the cutting process on a metal-cutting machine are of two types: low-frequency and high-frequency. When low-frequency autovibrations appear, the cutting process ought to be terminated and the cause of the vibrations eliminated; otherwise, there is a danger of breaking both the machine and the tool. In the case of high-frequency vibration the machine apparently operates quietly, but the processed surface features small-scale roughness. The frequency of autovibrations can reach 5000 Hz and more.

Estimating the Quality of Thermionic Cathodes for Microwave Vacuum Tubes Using the Low-Frequency Noise Parameters

Vestnik MEI ◽

10.24160/1993-6982-2018-5-120-127 ◽

2018 ◽

Vol 5 (5) ◽

pp. 120-127

Author(s):

Mikhail D. Vorobyev ◽

Dmitriy N. Yudaev ◽

Andrey Yu. Zorin ◽

...

Keyword(s):

Low Frequency ◽

Frequency Noise ◽

Low Frequency Noise ◽

Noise Parameters ◽

Thermionic Cathodes ◽

Vacuum Tubes

Systematics, Biogeography, and Morphological Character Evolution of the Hemiepiphytic Subfamily Monsteroideae (Araceae)

Annals of the Missouri Botanical Garden ◽

10.3417/2018269 ◽

2019 ◽

Vol 104 (1) ◽

pp. 33-48 ◽

Cited By ~ 4

Author(s):

Alejandro Zuluaga ◽

Martin Llano ◽

Ken Cameron

Keyword(s):

Dna Sequences ◽

Pacific Islands ◽

R Package ◽

Morphological Characters ◽

Ancestral State ◽

Seed Shape ◽

Long Distance ◽

The Pacific ◽

And Migration ◽

The Tropics

The subfamily Monsteroideae (Araceae) is the third richest clade in the family, with ca. 369 described species and ca. 700 estimated. It comprises mostly hemiepiphytic or epiphytic plants restricted to the tropics, with three intercontinental disjunctions. Using a dataset representing all 12 genera in Monsteroideae (126 taxa), and five plastid and two nuclear markers, we studied the systematics and historical biogeography of the group. We found high support for the monophyly of the three major clades (Spathiphylleae sister to Heteropsis Kunth and Rhaphidophora Hassk. clades), and for six of the genera within Monsteroideae. However, we found low rates of variation in the DNA sequences used and a lack of molecular markers suitable for species-level phylogenies in the group. We also performed ancestral state reconstruction of some morphological characters traditionally used for genera delimitation. Only seed shape and size, number of seeds, number of locules, and presence of endosperm showed utility in the classification of genera in Monsteroideae. We estimated ancestral ranges using a dispersal-extinction-cladogenesis model as implemented in the R package BioGeoBEARS and found evidence for a Gondwanan origin of the clade. One tropical disjunction (Monstera Adans. sister to Amydrium Schott–Epipremnum Schott) was found to be the product of a previous Boreotropical distribution. Two other disjunctions are more recent and likely due to long-distance dispersal: Spathiphyllum Schott (with Holochlamys Engl. nested within) represents a dispersal from South America to the Pacific Islands in Southeast Asia, and Rhaphidophora represents a dispersal from Asia to Africa. Future studies based on stronger phylogenetic reconstructions and complete morphological datasets are needed to explore the details of speciation and migration within and among areas in Asia.

DMSO Improves the Ski-Slope Effect in Direct PCR

Applied Sciences ◽

10.3390/app11041943 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1943

Author(s):

Joo-Young Kim ◽

Ju Yeon Jung ◽

Da-Hye Kim ◽

Seohyun Moon ◽

Won-Hae Lee ◽

...

Keyword(s):

Dna Sequences ◽

Pcr Amplification ◽

Analytical Techniques ◽

Direct Pcr ◽

Dna Profile ◽

Novel Technologies ◽

Specific Amplification ◽

Polymerase Chain ◽

Dna Profiles

Analytical techniques such as DNA profiling are widely used in various fields, including forensic science, and novel technologies such as direct polymerase chain reaction (PCR) amplification are continuously being developed in order to acquire DNA profiles efficiently. However, non-specific amplification may occur depending on the quality of the crime scene evidence and amplification methods employed. In particular, the ski-slope effect observed in direct PCR amplification has led to inaccurate interpretations of the DNA profile results. In this study, we aimed to reduce the ski-slope effect by using dimethyl sulfoxide (DMSO) in direct PCR. We confirmed that DMSO (3.75%, v/v) increased the amplification yield of large-sized DNA sequences more than that of small-sized ones. Using 50 Korean buccal samples, we further demonstrated that DMSO reduced the ski-slope effect in direct PCR. These results suggest that the experimental method developed in this study is suitable for direct PCR and may help to successfully obtain DNA profiles from various types of evidence at crime scenes.

PPIT: an R package for inferring microbial taxonomy from nifH sequences

Bioinformatics ◽

10.1093/bioinformatics/btab100 ◽

2021 ◽

Author(s):

Bennett J Kapili ◽

Anne E Dekas

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Query Sequence ◽

Marker Gene ◽

R Package ◽

Supplementary Information ◽

Marker Genes ◽

Pairwise Identity ◽

Metabolic Marker ◽

Microbial Taxonomy

Abstract. Motivation: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale. Results: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood (1) are taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches at the cost of fewer total inferences. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes. Availability: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167. Supplementary information: Supplementary data are available at Bioinformatics online.
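The two-condition rule stated in the Results (an inference is drawn only when neighboring references are taxonomically consistent and share sufficient pairwise identity with the query) can be sketched in Python as follows. This is a simplified illustration and not PPIT's code or API: the function `infer_taxon`, the representation of a neighborhood as `(taxon, identity)` pairs, the requirement that every neighbor meet the threshold, and the threshold value used in the example are all assumptions.

```python
from typing import Optional, Sequence, Tuple


def infer_taxon(
    neighbors: Sequence[Tuple[str, float]],
    min_identity: float,
) -> Optional[str]:
    """Return a taxon only when both conditions hold, otherwise None.

    Hypothetical helper. `neighbors` holds (taxon, pairwise identity to the
    query) for each reference in the query's phylogenetic neighborhood.
    """
    if not neighbors:
        return None
    taxa = {taxon for taxon, _ in neighbors}
    # Condition 1: all neighboring references agree on the taxon.
    if len(taxa) != 1:
        return None
    # Condition 2 (assumed form): every neighbor is similar enough to the query.
    if any(identity < min_identity for _, identity in neighbors):
        return None
    return next(iter(taxa))


# Made-up neighborhoods and an arbitrary illustrative threshold.
consistent = [("Desulfovibrio", 0.95), ("Desulfovibrio", 0.93)]
mixed = [("Desulfovibrio", 0.95), ("Geobacter", 0.94)]
distant = [("Desulfovibrio", 0.95), ("Desulfovibrio", 0.70)]

for hood in (consistent, mixed, distant):
    print(infer_taxon(hood, min_identity=0.90))
```

Declining to answer for the mixed and distant neighborhoods mirrors the trade-off reported in the abstract: fewer total inferences in exchange for a higher proportion of correct ones.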
