Next Generation Sequencing of Pooled Samples: Guideline for Variants’ Filtering

Abstract Sequencing large number of individuals, which is often needed for population genetics studies, is still economically challenging despite falling costs of Next Generation Sequencing (NGS). Pool-seq is an alternative cost- and time-effective option in which DNA from several individuals is pooled for sequencing. However, pooling of DNA creates new problems and challenges for accurate variant call and allele frequency (AF) estimation. In particular, sequencing errors confound with the alleles present at low frequency in the pools possibly giving rise to false positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and in-house SNP-genotyping data of individual subjects of pools. Furthermore, we propose a simple filtering guideline for the removal of spurious variants based on the Kolmogorov-Smirnov statistical test. We experimentally validated our filters by comparing Pool-seq to individual sequencing data showing that the filters remove most of the false variants while retaining majority of true variants. The proposed guideline is fairly generic in nature and could be easily applied in other Pool-seq experiments.

Download Full-text

Detection of low-frequency resistance-mediating SNPs in next-generation sequencing data of Mycobacterium tuberculosis complex strains with binoSNP

Scientific Reports ◽

10.1038/s41598-020-64708-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Viola Dreyer ◽

Christian Utpatel ◽

Thomas A. Kohl ◽

Ivan Barilar ◽

Matthias I. Gröschel ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Next Generation Sequencing ◽

Mycobacterium Tuberculosis Complex ◽

Low Frequency ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Tuberculosis Complex ◽

Generation Sequencing

Download Full-text

B-cell enrichment for next generation sequencing (NGS): An approach to detect actionable low frequency variants in B-cell lymphomas.

Journal of Clinical Oncology ◽

10.1200/jco.2018.36.15_suppl.e24165 ◽

2018 ◽

Vol 36 (15_suppl) ◽

pp. e24165-e24165

Author(s):

Maya P Panjikaran ◽

Claire Orosco ◽

Brian Kwok ◽

Yu Xia ◽

Lauryn Keeler ◽

...

Keyword(s):

Next Generation Sequencing ◽

B Cell ◽

Low Frequency ◽

Next Generation ◽

B Cell Lymphomas ◽

Cell Enrichment ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Download Full-text

Bioinformatics for Clinical Next Generation Sequencing

Clinical Chemistry ◽

10.1373/clinchem.2014.224360 ◽

2015 ◽

Vol 61 (1) ◽

pp. 124-135 ◽

Cited By ~ 56

Author(s):

Gavin R Oliver ◽

Steven N Hart ◽

Eric W Klee

Keyword(s):

Next Generation Sequencing ◽

Service Providers ◽

Clinical Laboratory ◽

Work Flow ◽

Next Generation ◽

Sequencing Data ◽

Regulatory Requirements ◽

Bioinformatics Analyses ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract BACKGROUND Next generation sequencing (NGS)-based assays continue to redefine the field of genetic testing. Owing to the complexity of the data, bioinformatics has become a necessary component in any laboratory implementing a clinical NGS test. CONTENT The computational components of an NGS-based work flow can be conceptualized as primary, secondary, and tertiary analytics. Each of these components addresses a necessary step in the transformation of raw data into clinically actionable knowledge. Understanding the basic concepts of these analysis steps is important in assessing and addressing the informatics needs of a molecular diagnostics laboratory. Equally critical is a familiarity with the regulatory requirements addressing the bioinformatics analyses. These and other topics are covered in this review article. SUMMARY Bioinformatics has become an important component in clinical laboratories generating, analyzing, maintaining, and interpreting data from molecular genetics testing. Given the rapid adoption of NGS-based clinical testing, service providers must develop informatics work flows that adhere to the rigor of clinical laboratory standards, yet are flexible to changes as the chemistry and software for analyzing sequencing data mature.

Download Full-text

easyfm : An easy software suite for file manipulation of Next Generation Sequencing data on desktops

10.1101/2021.09.29.462291 ◽

2021 ◽

Author(s):

Hyungtaek Jung ◽

Brendan Jeon ◽

Daniel Ortiz-Barrientos

Keyword(s):

Next Generation Sequencing ◽

High Performance ◽

Biological Data ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

File Formats ◽

Biological Data Analysis ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Storing and manipulating Next Generation Sequencing (NGS) file formats is an essential but difficult task in biological data analysis. The easyfm ( easy f ile m anipulation) toolkit ( https://github.com/TaekAndBrendan/easyfm ) makes manipulating commonly used NGS files more accessible to biologists. It enables them to perform end-to-end reproducible data analyses using a free standalone desktop application (available on Windows, Mac and Linux). Unlike existing tools (e.g. Galaxy), the Graphical User Interface (GUI)-based easyfm is not dependent on any high-performance computing (HPC) system and can be operated without an internet connection. This specific benefit allow easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.

Download Full-text

Metagenomics: Tools and Insights for Analyzing Next-Generation Sequencing Data Derived from Biodiversity Studies

Bioinformatics and Biology Insights ◽

10.4137/bbi.s12462 ◽

2015 ◽

Vol 9 ◽

pp. BBI.S12462 ◽

Cited By ~ 180

Author(s):

Anastasis Oulas ◽

Christina Pavloudi ◽

Paraskevi Polymenakou ◽

Georgios A. Pavlopoulos ◽

Nikolas Papanikolaou ◽

...

Keyword(s):

Next Generation Sequencing ◽

Rapid Expansion ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Sequencing Technologies ◽

Multiple Datasets ◽

Dataset Analysis ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.

Download Full-text

Rational “Error Elimination” Approach to Evaluating Molecular Barcoded Next-Generation Sequencing Data Identifies Low-Frequency Mutations in Hematologic Malignancies

Journal of Molecular Diagnostics ◽

10.1016/j.jmoldx.2019.01.008 ◽

2019 ◽

Vol 21 (3) ◽

pp. 471-482 ◽

Cited By ~ 1

Author(s):

Saradhi Mallampati ◽

Dzifa Y. Duose ◽

Michael A. Harmon ◽

Meenakshi Mehrotra ◽

Rashmi Kanagal-Shamanna ◽

...

Keyword(s):

Next Generation Sequencing ◽

Low Frequency ◽

Hematologic Malignancies ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Error Elimination ◽

Generation Sequencing

Download Full-text

Primer ID Validates Template Sampling Depth and Greatly Reduces the Error Rate of Next-Generation Sequencing of HIV-1 Genomic RNA Populations

Journal of Virology ◽

10.1128/jvi.00522-15 ◽

2015 ◽

Vol 89 (16) ◽

pp. 8540-8555 ◽

Cited By ~ 77

Author(s):

Shuntai Zhou ◽

Corbin Jones ◽

Piotr Mieczkowski ◽

Ronald Swanstrom

Keyword(s):

Next Generation Sequencing ◽

Error Rate ◽

Consensus Sequence ◽

Next Generation ◽

Sampling Depth ◽

Data Set ◽

Sequencing Errors ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing ◽

Hiv 1

ABSTRACTValidating the sampling depth and reducing sequencing errors are critical for studies of viral populations using next-generation sequencing (NGS). We previously described the use of Primer ID to tag each viral RNA template with a block of degenerate nucleotides in the cDNA primer. We now show that low-abundance Primer IDs (offspring Primer IDs) are generated due to PCR/sequencing errors. These artifactual Primer IDs can be removed using a cutoff model for the number of reads required to make a template consensus sequence. We have modeled the fraction of sequences lost due to Primer ID resampling. For a typical sequencing run, less than 10% of the raw reads are lost to offspring Primer ID filtering and resampling. The remaining raw reads are used to correct for PCR resampling and sequencing errors. We also demonstrate that Primer ID reveals bias intrinsic to PCR, especially at low template input or utilization. cDNA synthesis and PCR convert ca. 20% of RNA templates into recoverable sequences, and 30-fold sequence coverage recovers most of these template sequences. We have directly measured the residual error rate to be around 1 in 10,000 nucleotides. We use this error rate and the Poisson distribution to define the cutoff to identify preexisting drug resistance mutations at low abundance in an HIV-infected subject. Collectively, these studies show that >90% of the raw sequence reads can be used to validate template sampling depth and to dramatically reduce the error rate in assessing a genetically diverse viral population using NGS.IMPORTANCEAlthough next-generation sequencing (NGS) has revolutionized sequencing strategies, it suffers from serious limitations in defining sequence heterogeneity in a genetically diverse population, such as HIV-1 due to PCR resampling and PCR/sequencing errors. The Primer ID approach reveals the true sampling depth and greatly reduces errors. Knowing the sampling depth allows the construction of a model of how to maximize the recovery of sequences from input templates and to reduce resampling of the Primer ID so that appropriate multiplexing can be included in the experimental design. With the defined sampling depth and measured error rate, we are able to assign cutoffs for the accurate detection of minority variants in viral populations. This approach allows the power of NGS to be realized without having to guess about sampling depth or to ignore the problem of PCR resampling, while also being able to correct most of the errors in the data set.

Download Full-text

Characterization of Viral Populations by Using Circular Sequencing

Journal of Virology ◽

10.1128/jvi.00804-14 ◽

2016 ◽

Vol 90 (20) ◽

pp. 8950-8953 ◽

Cited By ~ 11

Author(s):

Zachary J. Whitfield ◽

Raul Andino

Keyword(s):

Next Generation Sequencing ◽

Error Rate ◽

Viral Evolution ◽

Low Frequency ◽

Next Generation ◽

Experimental Planning ◽

Genetic Makeup ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

With the enormous sizes viral populations reach, many variants are at too low a frequency to be detected by conventional next-generation sequencing (NGS) methods. Circular sequencing (CirSeq) is a method by which the error rate of next-generation sequencing is decreased so that even low-frequency viral variants can be accurately detected. The ability to visualize almost the entire genetic makeup of a viral swarm has implications for epidemiology, viral evolution, and vaccine design. Here we discuss experimental planning, analysis, and recent insights using CirSeq.

Download Full-text

Introduction to the analysis of next generation sequencing data and its application to venous thromboembolism

Thrombosis and Haemostasis ◽

10.1160/th15-05-0411 ◽

2015 ◽

Vol 114 (11) ◽

pp. 920-932 ◽

Cited By ~ 5

Author(s):

Joost C. M. Meijers ◽

Saskia Middeldorp ◽

Marisa L. R. Cunha

Keyword(s):

Venous Thromboembolism ◽

Next Generation Sequencing ◽

Clinical Care ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Ngs Data Analysis ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

SummaryDespite knowledge of various inherited risk factors associated with venous thromboembolism (VTE), no definite cause can be found in about 50% of patients. The application of data-driven searches such as GWAS has not been able to identify genetic variants with implications for clinical care, and unexplained heritability remains. In the past years, the development of several so-called next generation sequencing (NGS) platforms is offering the possibility of generating fast, inexpensive and accurate genomic information. However, so far their application to VTE has been very limited. Here we review basic concepts of NGS data analysis and explore the application of NGS technology to VTE. We provide both computational and biological viewpoints to discuss potentials and challenges of NGS-based studies.

Download Full-text

Next generation sequencing of SARS-CoV-2 genomes: challenges, applications and opportunities

Briefings in Bioinformatics ◽

10.1093/bib/bbaa297 ◽

2020 ◽

Author(s):

Matteo Chiara ◽

Anna Maria D’Erchia ◽

Carmela Gissi ◽

Caterina Manzari ◽

Antonio Parisi ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genomic Sequence ◽

Molecular Diagnostic ◽

Next Generation ◽

Sequencing Data ◽

Related Data ◽

Molecular Tests ◽

Methodological Approaches ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Various next generation sequencing (NGS) based strategies have been successfully used in the recent past for tracing origins and understanding the evolution of infectious agents, investigating the spread and transmission chains of outbreaks, as well as facilitating the development of effective and rapid molecular diagnostic tests and contributing to the hunt for treatments and vaccines. The ongoing COVID-19 pandemic poses one of the greatest global threats in modern history and has already caused severe social and economic costs. The development of efficient and rapid sequencing methods to reconstruct the genomic sequence of SARS-CoV-2, the etiological agent of COVID-19, has been fundamental for the design of diagnostic molecular tests and to devise effective measures and strategies to mitigate the diffusion of the pandemic. Diverse approaches and sequencing methods can, as testified by the number of available sequences, be applied to SARS-CoV-2 genomes. However, each technology and sequencing approach has its own advantages and limitations. In the current review, we will provide a brief, but hopefully comprehensive, account of currently available platforms and methodological approaches for the sequencing of SARS-CoV-2 genomes. We also present an outline of current repositories and databases that provide access to SARS-CoV-2 genomic data and associated metadata. Finally, we offer general advice and guidelines for the appropriate sharing and deposition of SARS-CoV-2 data and metadata, and suggest that more efficient and standardized integration of current and future SARS-CoV-2-related data would greatly facilitate the struggle against this new pathogen. We hope that our ‘vademecum’ for the production and handling of SARS-CoV-2-related sequencing data, will contribute to this objective.

Download Full-text