Competitive mapping allows for the identification and exclusion of human DNA contamination in ancient faunal genomic datasets

BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Tatiana R. Feuerborn ◽  
Eleftheria Palkopoulou ◽  
Tom van der Valk ◽  
Johanna von Seth ◽  
Arielle R. Munters ◽  
...  

Background: After more than a decade of developments in field collection and laboratory methods, and of advances in high-throughput sequencing, contamination remains a key issue in ancient DNA research. Human and microbial contaminant DNA still hinder cost-effective sequencing and the accurate interpretation of ancient DNA data. Results: Here we investigate whether human contaminating DNA can be found in ancient faunal sequencing datasets. We identify variable levels of human contamination, which persist even after the sequence reads have been mapped to the faunal reference genomes. This contamination has the potential to affect a range of downstream analyses. Conclusions: We propose a fast and simple method, based on competitive mapping, that identifies and removes human contamination from ancient faunal DNA datasets with limited loss of true ancient data. This method could be an important tool for the ancient DNA field.
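The abstract names competitive mapping but does not spell out the filtering step. A minimal Python sketch of the general idea follows, assuming reads have already been aligned to a single concatenated reference (faunal plus human genome) whose contig names carry a source prefix. The prefixes `human_` and `fauna_`, the `Alignment` record, the function `competitive_filter`, and the MAPQ cutoff of 30 are all hypothetical choices for illustration, not the authors' published pipeline or parameters.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Primary alignment of one read against the combined reference."""
    read_id: str
    contig: str  # contig name prefixed with its source genome (assumed convention)
    mapq: int    # mapping quality reported by the aligner


def competitive_filter(
    alignments: Iterable[Alignment],
    human_prefix: str = "human_",   # hypothetical prefix
    fauna_prefix: str = "fauna_",   # hypothetical prefix
    min_mapq: int = 30,             # hypothetical cutoff
) -> Tuple[List[str], Counter]:
    """Keep reads whose primary hit is a faunal contig with MAPQ >= min_mapq.

    Reads placed on a human contig are tallied as putative contamination,
    whatever their MAPQ. Remaining reads with MAPQ below min_mapq are tallied
    as ambiguous, since a low MAPQ means the read fits more than one place
    in the combined reference about equally well.
    """
    kept: List[str] = []
    tally: Counter = Counter()
    for aln in alignments:
        if aln.contig.startswith(human_prefix):
            tally["human"] += 1
        elif aln.mapq < min_mapq:
            tally["ambiguous"] += 1
        elif aln.contig.startswith(fauna_prefix):
            kept.append(aln.read_id)
            tally["fauna"] += 1
        else:
            tally["other"] += 1  # contig from neither genome
    return kept, tally


if __name__ == "__main__":
    demo = [
        Alignment("r1", "fauna_chr1", 37),
        Alignment("r2", "human_chr7", 60),
        Alignment("r3", "fauna_chr2", 3),
    ]
    kept, tally = competitive_filter(demo)
    print(kept)         # ['r1']
    print(dict(tally))  # {'fauna': 1, 'human': 1, 'ambiguous': 1}
```

In practice the alignment records would come from the BAM file written by the aligner. Filtering on the primary alignment plus a MAPQ cutoff is one straightforward way to treat reads that fit both genomes equally well as unusable rather than as faunal.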

2020 ◽  
Author(s):  
Tatiana R. Feuerborn ◽  
Elle Palkopoulou ◽  
Tom van der Valk ◽  
Johanna von Seth ◽  
Arielle R. Munters ◽  
...  



2019 ◽  
Vol 47 (2) ◽  
pp. 190 ◽  
Author(s):  
Henrik Krehenwinkel ◽  
Susanne Meese ◽  
Christoph Mayer ◽  
Jasmin Ruch ◽  
Jutta Schneider ◽  
...  

Gene ◽  
2013 ◽  
Vol 528 (2) ◽  
pp. 347-351 ◽  
Author(s):  
Makio Kihana ◽  
Fuzuki Mizuno ◽  
Rikai Sawafuji ◽  
Li Wang ◽  
Shintaroh Ueda

mSphere ◽  
2020 ◽  
Vol 5 (5) ◽  
Author(s):  
Bhavna Hora ◽  
Naila Gulzar ◽  
Yue Chen ◽  
Konstantinos Karagiannis ◽  
Fangping Cai ◽  
...  

ABSTRACT: High-throughput sequencing (HTS) has been widely used to characterize HIV-1 genome sequences. However, no current algorithm can directly determine genotypes and quasispecies populations from short HTS reads generated from long genome sequences without additional software. To establish a robust workflow for subpopulation, subtype, and recombination analysis, we amplified the 3′ half of the HIV-1 genome from plasma samples of 65 HIV-1-infected individuals and sequenced the entire amplicon (∼4,500 bp) by HTS. By analyzing raw reads directly with HIVE-hexahedron, we showed that 48% of samples harbored 2 to 13 subpopulations. We identified various subtypes (17 A1s, 4 Bs, 27 Cs, 6 CRF02_AGs, and 11 unique recombinant forms) and defined the recombination breakpoints of 10 recombinants. These results were validated with viral genome sequences generated by single genome sequencing (SGS) or by analysis of the consensus sequence of the HTS reads. The HIVE-hexahedron workflow is more sensitive and accurate than evaluating the consensus sequence alone, and more cost-effective than SGS.
IMPORTANCE: The highly recombinogenic nature of human immunodeficiency virus type 1 (HIV-1) leads to recombination and the emergence of quasispecies. Reliably identifying subpopulations is important for understanding the complexity of a viral population, for drug resistance surveillance, and for vaccine development. HTS provides better resolution than Sanger sequencing for analyzing heterogeneous viral subpopulations. However, current methods for analyzing HTS reads cannot fully and accurately reconstruct populations. Hence, there is a pressing need for a more sensitive, accurate, user-friendly, and cost-effective method to analyze viral quasispecies. For this purpose, we improved the HIVE-hexahedron algorithm, which we previously developed using in silico short sequences, so that it analyzes raw HTS short reads. The significance of this study is that our standalone algorithm enables a streamlined analysis of quasispecies, subtype, and recombination patterns from long HIV-1 genome regions without the need for additional sequence analysis tools. Distinct viral populations and recombination patterns identified by HIVE-hexahedron are further validated by comparison with sequences obtained by SGS.
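The abstract contrasts subpopulation analysis with evaluating only the consensus sequence. The short Python sketch below illustrates why a majority-rule consensus can hide a minority variant; it is not the HIVE-hexahedron algorithm, which the abstract does not describe. It assumes reads are already aligned to a common coordinate system and padded with `-` where they give no coverage; the function name and the 20% reporting threshold (`min_freq`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus_and_minor_variants(
    reads: List[str], min_freq: float = 0.2
) -> Tuple[str, Dict[int, Dict[str, float]]]:
    """Majority-rule consensus plus minority bases at or above min_freq.

    reads: non-empty list of equal-length, pre-aligned strings; '-' marks
    positions a read does not cover. Uncovered columns become 'N'.
    """
    if not reads:
        raise ValueError("need at least one read")
    consensus: List[str] = []
    minor: Dict[int, Dict[str, float]] = {}
    for pos in range(len(reads[0])):
        counts = Counter(r[pos] for r in reads if r[pos] != "-")
        if not counts:
            consensus.append("N")
            continue
        depth = sum(counts.values())
        top_base, _ = counts.most_common(1)[0]
        consensus.append(top_base)
        # Report non-consensus bases whose frequency reaches the threshold.
        others = {b: n / depth for b, n in counts.items()
                  if b != top_base and n / depth >= min_freq}
        if others:
            minor[pos] = others
    return "".join(consensus), minor


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACGT", "ATGA"]
    print(consensus_and_minor_variants(reads))
    # ('ACGT', {1: {'T': 0.25}, 3: {'A': 0.25}})
```

In the example both minority bases come from the same read, so a linked minority haplotype (`ATGA`) is present in the data even though the consensus shows only `ACGT`. Real quasispecies reconstruction must also link variants across reads and separate true variants from sequencing errors, which this sketch does not attempt.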


2019 ◽  
Author(s):  
Lucas A. Nell

Abstract: High-throughput sequencing (HTS) is central to the study of population genomics and plays an increasingly important role in constructing phylogenies. Research design choices for sequencing projects can involve a wide range of factors, such as sequencing platform, depth of coverage, and bioinformatic tools. Simulating HTS data can better inform these decisions. However, current standalone HTS simulators cannot generate genomic variants under even moderately complex evolutionary scenarios, which greatly reduces their usefulness for fields such as population genomics and phylogenomics. Here I present the R package jackalope, which simply and efficiently simulates (i) variants from reference genomes and (ii) reads from both Illumina and Pacific Biosciences (PacBio) platforms. Genomic variants can be simulated using phylogenies, gene trees, coalescent-simulation output, population-genomic summary statistics, and Variant Call Format (VCF) files. jackalope can simulate single-end, paired-end, or mate-pair Illumina reads, as well as PacBio reads. These simulations include sequencing errors, mapping qualities, multiplexing, and optical/PCR duplicates. jackalope can read reference genomes from FASTA files or simulate new ones, and all outputs can be written to standard file formats. jackalope is available for Mac, Windows, and Linux systems.
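jackalope itself is an R package, and its API is not shown here. As an illustration of the most basic step such simulators perform, the Python sketch below samples single-end reads at uniformly random positions of a reference and adds independent substitution errors at a fixed per-base rate. The function name and parameters are hypothetical, and the error model is far simpler than a platform-specific Illumina or PacBio profile: it has no indels, quality scores, paired ends, or PCR/optical duplicates.

```python
import random
from typing import List, Tuple


def simulate_reads(reference: str, n_reads: int, read_len: int,
                   error_rate: float, seed: int = 0) -> List[Tuple[int, str]]:
    """Sample single-end reads uniformly from reference and add substitutions.

    Each base is replaced, with probability error_rate, by one of the other
    nucleotides chosen uniformly. Returns (start, read) tuples, where start
    is the 0-based position of the read on the reference.
    """
    if read_len > len(reference):
        raise ValueError("read_len exceeds reference length")
    rng = random.Random(seed)  # seeded for reproducible simulations
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTAACGGT"
    for start, read in simulate_reads(ref, n_reads=3, read_len=8,
                                      error_rate=0.01, seed=42):
        print(start, read)
```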


2018 ◽  
Author(s):  
Quinn K. Langdon ◽  
David Peris ◽  
Brian Kyle ◽  
Chris Todd Hittinger

Abstract: The genomics era has expanded our knowledge of the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for characterizing interspecies hybrids and pure species that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from the reference genomes of several species of interest, assesses the genomic contribution and relative ploidy of each parental species, and produces a series of colorful, publication-ready graphical outputs. As a proof of concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even when parental reference genomes are missing. Through simulation, we show that sppIDer is robust to variable reference genome quality and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition, enabling the rapid identification of species and their interspecies hybrids. sppIDer is distributed as a Docker image, a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and the installation of the required dependencies (https://github.com/GLBRC/sppIDer).
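The combination-genome idea can be sketched in a few lines of Python: prefix every contig with its species name, concatenate the references, and after mapping attribute each read to the species that owns its contig. This is not sppIDer's code; the `__` separator, the function names, and the use of reads per kilobase as a crude length-normalized proxy for relative depth are assumptions for illustration. Read-to-contig assignments are taken as given, as they would be after mapping with a short-read aligner.

```python
from collections import defaultdict
from typing import Dict, Iterable

SEP = "__"  # hypothetical separator between species name and contig name


def build_combined_reference(genomes: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge per-species references into one, prefixing each contig name."""
    combined = {}
    for species, contigs in genomes.items():
        for contig, seq in contigs.items():
            combined[f"{species}{SEP}{contig}"] = seq
    return combined


def species_contribution(mapped_contigs: Iterable[str],
                         combined: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Per-species share of mapped reads and reads per kb of reference.

    mapped_contigs holds, for each mapped read, the combined-reference contig
    it was assigned to. Species names must not contain SEP, and every
    species is assumed to have non-empty sequence.
    """
    ref_len: Dict[str, int] = defaultdict(int)
    n_reads: Dict[str, int] = defaultdict(int)
    for name, seq in combined.items():
        ref_len[name.split(SEP, 1)[0]] += len(seq)
    for name in mapped_contigs:
        n_reads[name.split(SEP, 1)[0]] += 1
    total = sum(n_reads.values())
    return {
        sp: {
            "share": n_reads[sp] / total if total else 0.0,
            "reads_per_kb": 1000 * n_reads[sp] / length,
        }
        for sp, length in ref_len.items()
    }


if __name__ == "__main__":
    ref = build_combined_reference({"Scer": {"chrI": "A" * 2000},
                                    "Spar": {"chrI": "C" * 1000}})
    hits = ["Scer__chrI"] * 6 + ["Spar__chrI"] * 2
    print(species_contribution(hits, ref))
    # {'Scer': {'share': 0.75, 'reads_per_kb': 3.0}, 'Spar': {'share': 0.25, 'reads_per_kb': 2.0}}
```

Normalizing by reference length matters when parental genomes differ in size, because a larger genome attracts more reads at the same depth. This sketch does not reproduce how sppIDer itself computes coverage or ploidy.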


2016 ◽  
Vol 3 (1) ◽  
pp. 1-14 ◽  
Author(s):  
Bastien Llamas ◽  
Guido Valverde ◽  
Lars Fehren-Schmitz ◽  
Laura S Weyrich ◽  
Alan Cooper ◽  
...  

2021 ◽  
Author(s):  
Marc Fuchs ◽  
Clara Radulescu ◽  
Miao Tang ◽  
Arun Mahesh ◽  
Deborah Lavin ◽  
...  

Introduction: The COVID-19 pandemic has highlighted the importance of whole genome sequencing (WGS) of SARS-CoV-2 to inform public health policy. By enabling lineages to be defined, WGS facilitates tracking of the global spread of the virus. The evolution of new variants can be monitored, and knowledge of specific mutations provides insights into the mechanisms through which the virus increases transmissibility or evades immunity. To date, almost one million SARS-CoV-2 genomes have been sequenced by members of the COVID-19 Genomics UK (COG-UK) Consortium. To achieve similar feats more cost-effectively and sustainably in the future, improved high-throughput virus sequencing protocols are required. We have therefore developed a miniaturized library preparation protocol with drastically reduced consumable use and costs. Methods: SARS-CoV-2 RNA was amplified using the ARTIC nCov-2019 multiplex RT-PCR protocol and purified using a conventional liquid handling system. Acoustic liquid transfer (Echo 525) was used to reduce reaction volumes and the number of tips required for Nextera XT library preparation. Sequencing was performed on an Illumina MiSeq. Results: We present 'Mini-XT', a miniaturized tagmentation-based library preparation protocol available on protocols.io (https://dx.doi.org/10.17504/protocols.io.bvntn5en). The final version of Mini-XT has been used to sequence 4,384 SARS-CoV-2 samples from Northern Ireland with a COG-UK QC pass rate of 97.4%. Sequencing quality was comparable, and lineage calling consistent, for replicate samples processed with full-volume Nextera DNA Flex (333 samples) or with nanopore technology (20 samples). SNP calls from Mini-XT and these technologies were consistent, and sequences from replicate samples paired together in maximum likelihood phylogenetic trees.
Conclusion: The Mini-XT protocol maintains sequence quality while reducing library preparation reagent volumes 8-fold and halving overall tip usage from sample to sequence to provide concomitant cost savings relative to standard protocols. This will enable more efficient high-throughput sequencing of SARS-CoV-2 isolates and future pathogen WGS.
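The abstract reports an 8-fold reduction in library preparation reagent volumes and halved tip usage, but not how per-sample cost divides between reagents, tips, and everything else. Under the simple assumption that reagent cost scales with volume and tip cost with tip count, the relative per-sample consumable cost is r/8 + t/2 + (1 - r - t), where r and t are hypothetical shares of the original cost spent on reagents and tips. The Python sketch below encodes that arithmetic; the function name and the example shares are illustrative only, not figures from the study. For scale, a 97.4% pass rate on 4,384 samples corresponds to roughly 4,270 passing genomes.

```python
def relative_consumable_cost(reagent_share: float, tip_share: float,
                             reagent_fold: float = 8.0,
                             tip_fold: float = 2.0) -> float:
    """Per-sample consumable cost relative to the full-volume protocol.

    reagent_share and tip_share are hypothetical fractions of the original
    per-sample cost spent on library-prep reagents and on tips. Reagent cost
    is assumed to fall by reagent_fold and tip cost by tip_fold; the
    remaining share is assumed unchanged by miniaturization.
    """
    if reagent_share < 0 or tip_share < 0 or reagent_share + tip_share > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    other = 1.0 - reagent_share - tip_share
    return reagent_share / reagent_fold + tip_share / tip_fold + other


if __name__ == "__main__":
    # Hypothetical split: 60% reagents, 20% tips, 20% other costs.
    print(relative_consumable_cost(0.6, 0.2))  # about 0.375 (0.6/8 + 0.2/2 + 0.2)
```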


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yasemin Guenay-Greunke ◽  
David A. Bohan ◽  
Michael Traugott ◽  
Corinna Wallinger

Abstract: High-throughput sequencing platforms are increasingly used for targeted amplicon sequencing because they enable cost-effective sequencing of large sample sets. For meaningful interpretation of targeted amplicon sequencing data and comparison between studies, it is critical that bioinformatic analyses do not introduce artefacts and that they rely on detailed protocols to ensure that all methods are properly performed and documented. The analysis of large sample sets and the use of predefined indexes create challenges, such as adjusting the sequencing depth across samples and accounting for sequencing errors or index hopping. However, the potential biases these factors introduce into high-throughput amplicon sequencing data sets, and how they may be overcome, have rarely been addressed. Using the example of a nested metabarcoding analysis of 1920 carabid beetle regurgitates to assess plant feeding, we investigated: (i) the variation in sequencing depth of individually tagged samples and the effect of library preparation on the data output; (ii) the influence of sequencing errors within index regions and their consequences for demultiplexing; and (iii) the effect of index hopping. Our results demonstrate that, despite library quantification, large variation in read counts and sequencing depth occurred among samples, and that accounting for the sequencing error rate in bioinformatic software is essential for accurate adapter/primer trimming and demultiplexing. Moreover, setting an index hopping threshold to avoid incorrect assignment of samples is highly recommended.
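The demultiplexing and index-hopping issues raised in the abstract can be illustrated with a short Python sketch: each observed index is assigned to the uniquely closest expected index within a mismatch tolerance, and, within each sample, taxon counts below a minimum fraction of that sample's reads are zeroed. The one-mismatch tolerance, the relative-frequency cutoff, and all names are hypothetical choices for illustration; the abstract does not specify the thresholds the authors used.

```python
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed_index: str, index_to_sample: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index is uniquely closest within tolerance.

    Returns None if no expected index is within max_mismatches, or if two or
    more indexes tie for the smallest distance (an ambiguous read).
    """
    best_dist: Optional[int] = None
    best: List[str] = []
    for index, sample in index_to_sample.items():
        if len(index) != len(observed_index):
            continue
        d = hamming(index, observed_index)
        if d > max_mismatches:
            continue
        if best_dist is None or d < best_dist:
            best_dist, best = d, [sample]
        elif d == best_dist:
            best.append(sample)
    return best[0] if len(best) == 1 else None


def apply_hopping_threshold(counts: Dict[str, int],
                            min_fraction: float = 0.001) -> Dict[str, int]:
    """Zero out taxon counts below min_fraction of a sample's total reads."""
    total = sum(counts.values())
    return {taxon: (n if total and n / total >= min_fraction else 0)
            for taxon, n in counts.items()}


if __name__ == "__main__":
    indexes = {"ACGTAC": "beetle_001", "TTGCAA": "beetle_002"}
    print(assign_sample("ACGTAA", indexes))  # beetle_001 (one mismatch)
    print(apply_hopping_threshold({"taxon_A": 9990, "taxon_B": 10}, 0.002))
    # {'taxon_A': 9990, 'taxon_B': 0}
```

Allowing one mismatch can only assign reads unambiguously if the expected indexes differ from each other at three or more positions; otherwise a single error can leave a read equidistant from two indexes, which is why the sketch returns None on ties.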

