cDNA-detector: Detection and removal of cDNA contamination in DNA sequencing libraries

2021 ◽  
Author(s):  
Meifang Qi ◽  
Utthara Nayar ◽  
Leif Ludwig ◽  
Nikhil Wagle ◽  
Esther Rheinbay

Exogenous cDNA introduced into an experimental system, either intentionally or accidentally, can appear as added read coverage over that gene in next-generation sequencing libraries derived from this system. If not properly recognized and managed, this cross-contamination with exogenous signal can lead to incorrect interpretation of research results. Yet, this problem is not routinely addressed in current sequence processing pipelines. Here, we present cDNA-detector, a computational tool to identify and remove exogenous cDNA contamination in DNA sequencing experiments. We apply cDNA-detector to several highly-cited public databases (TCGA, ENCODE, NCBI SRA) and show that contaminant genes appear in sequencing experiments where they lead to incorrect coverage peak calls. Our findings highlight the importance of sensitive detection and removal of contaminant cDNA from NGS libraries before downstream analysis.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Meifang Qi ◽  
Utthara Nayar ◽  
Leif S. Ludwig ◽  
Nikhil Wagle ◽  
Esther Rheinbay

Abstract Background: Exogenous cDNA introduced into an experimental system, either intentionally or accidentally, can appear as added read coverage over that gene in next-generation sequencing libraries derived from this system. If not properly recognized and managed, this cross-contamination with exogenous signal can lead to incorrect interpretation of research results. Yet, this problem is not routinely addressed in current sequence processing pipelines. Results: We present cDNA-detector, a computational tool to identify and remove exogenous cDNA contamination in DNA sequencing experiments. We demonstrate that cDNA-detector can identify cDNAs quickly and accurately from alignment files. A source inference step attempts to separate endogenous cDNAs (retrocopied genes) from potential cloned, exogenous cDNAs. cDNA-detector provides a mechanism to decontaminate the alignment from detected cDNAs. Simulation studies show that cDNA-detector is highly sensitive and specific, outperforming existing tools. We apply cDNA-detector to several highly cited public databases (TCGA, ENCODE, NCBI SRA) and show that contaminant genes appear in sequencing experiments where they lead to incorrect coverage peak calls. Conclusions: cDNA-detector is a user-friendly and accurate tool to detect and remove cDNA contamination in NGS libraries. Its two-step design, detection followed by removal, reduces the risk of true variant removal since it allows for manual review of candidates. We find that contamination with intentionally and accidentally introduced cDNAs is an underappreciated problem even in widely used consortium datasets, where it can lead to spurious results. Our findings highlight the importance of sensitive detection and removal of contaminant cDNA from NGS libraries before downstream analysis.


2009 ◽  
Vol 55 (4) ◽  
pp. 641-658 ◽  
Author(s):  
Karl V Voelkerding ◽  
Shale A Dames ◽  
Jacob D Durtschi

Abstract Background: For the past 30 years, the Sanger method has been the dominant approach and gold standard for DNA sequencing. The commercial launch of the first massively parallel pyrosequencing platform in 2005 ushered in the new era of high-throughput genomic analysis now referred to as next-generation sequencing (NGS). Content: This review describes fundamental principles of commercially available NGS platforms. Although the platforms differ in their engineering configurations and sequencing chemistries, they share a technical paradigm in that sequencing of spatially separated, clonally amplified DNA templates or single DNA molecules is performed in a flow cell in a massively parallel manner. Through iterative cycles of polymerase-mediated nucleotide extensions or, in one approach, through successive oligonucleotide ligations, sequence outputs in the range of hundreds of megabases to gigabases are now obtained routinely. Highlighted in this review are the impact of NGS on basic research, bioinformatics considerations, and translation of this technology into clinical diagnostics. Also presented is a view into future technologies, including real-time single-molecule DNA sequencing and nanopore-based sequencing. Summary: In the relatively short time frame since 2005, NGS has fundamentally altered genomics research and allowed investigators to conduct experiments that were previously not technically feasible or affordable. The various technologies that constitute this new paradigm continue to evolve, and further improvements in technology robustness and process streamlining will pave the path for translation into clinical diagnostics.


Author(s):  
Dragana Dudić ◽  
Bojana Banović Đeri ◽  
Vesna Pajić ◽  
Gordana Pavlović-Lažetić

Next Generation Sequencing (NGS) analysis has become a widely used method for studying the structure of DNA and RNA, but the complexity of the procedure yields error-prone datasets that need to be cleansed in order to avoid misinterpretation of the data. We address the usage and proper interpretation of characteristic metrics for RNA sequencing (RNAseq) quality control, implemented in and reported by FastQC, and provide comprehensive guidance for their assessment in the context of total RNAseq quality control of Illumina raw reads. Additionally, we give recommendations on how to adequately perform the quality control preprocessing step of raw total RNAseq Illumina reads according to the results of the quality control evaluation step; the aim is to provide the best dataset for downstream analysis, rather than to get better FastQC results. We also tested the effects of different preprocessing approaches on the downstream analysis and recommended the most suitable approach.


2011 ◽  
Vol 152 (2) ◽  
pp. 55-62 ◽  
Author(s):  
Zsuzsanna Mihály ◽  
Balázs Győrffy

In the past ten years the development of next generation sequencing technologies brought a new era in the field of quick and efficient DNA sequencing. In our study we give an overview of the methodological achievements from Sanger’s chain-termination sequencing in 1975 to those allowing real-time DNA sequencing today. Sequencing methods that utilize clonal amplicons for parallel multistrand sequencing form the basis of currently available next generation sequencing techniques. Nowadays next generation sequencing is mainly used for basic research in functional genomics, providing quintessential information in the meta-analyses of data from signal transduction pathways, ontologies, proteomics and metabolomics. Although next generation sequencing is as yet sparsely used in clinical practice, cardiology, oncology and epidemiology already show an immense need for the additional knowledge obtained by this new technology. The main barrier to its spread is the lack of standardization of analysis evaluation methods, which obscures objective assessment of the results. Orv. Hetil., 2011, 152, 55–62.


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 505
Author(s):  
Manfred Grabherr ◽  
Bozena Kaminska ◽  
Jan Komorowski

The massive increase in computational power over the recent years and wider applications of machine learning methods, coincidental or not, were paralleled by remarkable advances in high-throughput DNA sequencing technologies.[...]


2013 ◽  
Vol 13 (2) ◽  
pp. 254-268 ◽  
Author(s):  
Brittany L. Hancock‐Hanser ◽  
Amy Frey ◽  
Matthew S. Leslie ◽  
Peter H. Dutton ◽  
Frederick I. Archer ◽  
...  

2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A61-A61
Author(s):  
Lee McDaniel ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Gabor Bartha ◽  
John West ◽  
...  

Background: Precision immuno-oncology is increasingly relevant to cancer therapy given the ascendance of immunotherapy. While next-generation sequencing (NGS) based algorithms may elucidate immunotherapeutic response, many such algorithms require highly accurate Class I HLA typing. One major challenge of HLA type derivation resides in highly polymorphic HLA allelic diversity, which conventional exome sequencing technologies poorly capture. Further, accurate HLA typing requires definitive distinction between thousands of potential HLA alleles. These challenges may cause widely used NGS HLA typing tools, such as Polysolver and Optitype, to perform inaccurate HLA typing. Poor HLA coverage poses the risk of silently mistyping HLA alleles, yielding inaccurate downstream HLA loss of heterozygosity (LOH) detection and neoepitope predictions. Methods: We designed the ImmunoID NeXT Platform® to more comprehensively profile the HLA region. To evaluate the accuracy of conventional NGS-based Class I HLA typing, a widely used dbGaP project (phs000452, n=160) of melanoma NGS data was evaluated alongside a set of over 500 solid tumor cancer patient samples sequenced on the ImmunoID NeXT Platform. Read coverage was derived from both GRCh38 and HLA allele database alignments. To test whether Polysolver overrepresents specific HLA alleles under reduced read conditions, a Monte Carlo bootstrap approach predicted theoretical allele frequency ranges. Results: Below 20x read coverage, nearly 50% of Polysolver HLA calls (phs000452) are homozygous, representing a divergence from typical HLA homozygous rates of between 10–20%, with p<10^-15 (Fisher’s Exact) compared to reference 1000 Genomes homozygous rates. Polysolver’s homozygous, heterozygous, and no-calls demonstrated a statistically significant difference in coverage (p<10^-6, Kruskal-Wallis) across all Class I HLA genes per Polysolver and public exome data (phs000452). The Personalis ImmunoID NeXT™ cohort did not demonstrate such a trend despite a similar exome-wide sequencing depth. Further, sixteen rare HLA alleles were identified with sample frequencies greater than expected from the dbGaP data set, with no such alleles identified from the Personalis ImmunoID NeXT data set. Conclusions: HLA typing may silently fail in the context of reduced read coverage without HLA-specific platform augmentation. This silent failure can have large implications for accurate neoantigen prediction and HLA LOH detection, both of which are becoming increasingly important for immuno-oncology treatment modalities such as personalized cancer vaccines, adoptive cell therapies, and blockade therapy response biomarkers. Studies utilizing neoepitope and HLA LOH prediction require careful validation for HLA calls, including assessments of coverage and homozygous rates, and may benefit from increased HLA locus coverage.


2021 ◽  
Vol 39 (9) ◽  
pp. 1129-1140
Author(s):  
Jonathan Foox ◽  
Scott W. Tighe ◽  
Charles M. Nicolet ◽  
Justin M. Zook ◽  
Marta Byrska-Bishop ◽  
...  

F1000Research ◽  
2012 ◽  
Vol 1 ◽  
pp. 2 ◽  
Author(s):  
Gavin R Oliver

Next-generation sequencing technologies are increasingly being applied in clinical settings; however, the data are characterized by a range of platform-specific artifacts, making downstream analysis problematic and error-prone. One major application of NGS is in the profiling of clinically relevant mutations, whereby sequences are aligned to a reference genome and potential mutations are assessed and scored. Accurate sequence alignment is pivotal in reliable assessment of potential mutations; however, selection of appropriate alignment tools is a non-trivial task complicated by the availability of multiple solutions, each with its own performance characteristics. Using targeted analysis of BRCA1 as an example, we have simulated and mutated a test dataset based on Illumina sequencing technology. Our findings reveal key differences in the abilities of a range of common commercial and open source alignment tools to facilitate accurate downstream detection of a range of mutations. These observations will be of importance to anyone using NGS to profile mutations in clinical or basic research.

