Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines

Abstract. Background: Next Generation Sequencing (NGS) is the fundament of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform's impact. Results: The number of detected variants/variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (InDels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in InDel absolute numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups. Conclusion: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.
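
The abstract compares call sets from several setups in terms of concordance and of variants called by only one setup. The sketch below illustrates one way such a comparison could be computed; it is not the authors' method. It assumes each call set is already available as a set of `(chrom, pos, ref, alt)` tuples and uses the Jaccard index as the concordance measure. The function names and the toy data are invented for illustration.

```python
from itertools import combinations

# A variant is represented as (chrom, pos, ref, alt); this layout is an assumption.

def is_snp(variant):
    """Single-base substitution: ref and alt are both one base long."""
    _, _, ref, alt = variant
    return len(ref) == 1 and len(alt) == 1

def is_indel(variant):
    """Insertion or deletion: ref and alt differ in length."""
    _, _, ref, alt = variant
    return len(ref) != len(alt)

def jaccard(a, b):
    """Shared calls divided by all calls; two empty sets count as fully concordant."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_concordance(calls):
    """Jaccard concordance per pair of setups, separately for SNPs and indels."""
    result = {}
    for name1, name2 in combinations(sorted(calls), 2):
        for label, keep in (("SNP", is_snp), ("indel", is_indel)):
            a = {v for v in calls[name1] if keep(v)}
            b = {v for v in calls[name2] if keep(v)}
            result[(name1, name2, label)] = jaccard(a, b)
    return result

def unique_calls(calls):
    """Variants reported by exactly one setup and by none of the others."""
    result = {}
    for name, variants in calls.items():
        others = set().union(*(v for n, v in calls.items() if n != name))
        result[name] = variants - others
    return result

# Invented toy data; the setup names are placeholders, not the study's identifiers.
toy_calls = {
    "hiseq": {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr2", 50, "C", "T")},
    "hiseqx": {("chr1", 100, "A", "G"), ("chr2", 50, "C", "T"), ("chr2", 75, "G", "GA")},
    "cg": {("chr1", 100, "A", "G"), ("chr3", 10, "T", "C")},
}

if __name__ == "__main__":
    for key, value in sorted(pairwise_concordance(toy_calls).items()):
        print(key, round(value, 3))
    print(unique_calls(toy_calls))
```

Real call sets would first need normalization (for example left-alignment of indels) before tuples from different pipelines can be compared by exact match.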

Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines

BMC Genomics ◽

10.1186/s12864-020-07362-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Stephan Weißbach ◽

Stanislav Sys ◽

Charlotte Hewel ◽

Hristo Todorov ◽

Susann Schweiger ◽

...

Keyword(s):

Next Generation Sequencing ◽

Gc Content ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Sequencing Data ◽

Illumina Hiseq ◽

Cross Sectional ◽

Single Nucleotide ◽

Alu Elements ◽

Generation Sequencing

Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines

10.21203/rs.3.rs-50691/v3 ◽

2021 ◽

Author(s):

Stephan Weißbach ◽

Stanislav Jur'evic Sys ◽

Charlotte Hewel ◽

Hristo Todorov ◽

Susann Schweiger ◽

...

Keyword(s):

Next Generation Sequencing ◽

Gc Content ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Sequencing Data ◽

Illumina Hiseq ◽

Cross Sectional ◽

Single Nucleotide ◽

Alu Elements ◽

Generation Sequencing

Abstract. Background: Next Generation Sequencing (NGS) is the fundament of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform's impact. Results: The number of detected variants/variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (indels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in indel absolute numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups. Conclusion: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.

A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btt172 ◽

2013 ◽

Vol 29 (11) ◽

pp. 1361-1366 ◽

Cited By ~ 26

Author(s):

B. D. O'Fallon ◽

W. Wooderchak-Donahue ◽

D. K. Crockett

Keyword(s):

Support Vector Machine ◽

Next Generation Sequencing ◽

Single Nucleotide Polymorphisms ◽

Next Generation Sequencing Data ◽

Support Vector ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Sequencing Data ◽

Single Nucleotide ◽

Generation Sequencing

Next-generation Sequence-analysis Toolkit (NeST): A standardized bioinformatics framework for analyzing Single Nucleotide Polymorphisms in next-generation sequencing data

10.1101/323535 ◽

2018 ◽

Author(s):

Shashidhar Ravishankar ◽

Sarah E. Schmedes ◽

Dhruviben S. Patel ◽

Mateusz Plucinski ◽

Venkatachalam Udhayakumar ◽

...

Keyword(s):

Next Generation Sequencing ◽

Single Nucleotide Polymorphisms ◽

Variant Calling ◽

Next Generation Sequencing Data ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Sequencing Data ◽

Single Nucleotide ◽

Bioinformatics Tools ◽

Generation Sequencing

Abstract. Rapid advancements in next-generation sequencing (NGS) technologies have led to the development of numerous bioinformatics tools and pipelines. Because these tools vary in their output, function, and complexity, and some are not well standardized, it is hard to choose a suitable pipeline to identify variants in NGS data. Here, we present NeST (NGS-analysis Toolkit), a modular consensus-based variant calling framework. NeST uses a combination of variant callers to overcome the potential biases of any individual method used alone. NeST consists of four modules that integrate open-source bioinformatics tools, a custom Variant Call Format (VCF) parser, and a summarization utility, which together generate high-quality consensus variant calls. NeST was validated using targeted-amplicon deep sequencing data from 245 Plasmodium falciparum isolates to identify single-nucleotide polymorphisms conferring drug resistance. The results were verified using Sanger sequencing data for the same dataset in a supporting publication [28]. NeST offers a user-friendly pipeline for variant calling with standardized outputs and minimal computational demands, allowing easy deployment with various organisms and applications.
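
The abstract does not state the rule by which the calls of several variant callers are combined. As a minimal sketch of the general idea of consensus calling, the example below keeps a variant when at least `min_support` callers report it. The voting rule, the function name `consensus_calls`, and the caller names are assumptions made for illustration and are not taken from NeST.

```python
from collections import Counter

def consensus_calls(callsets, min_support=2):
    """Return the variants reported by at least `min_support` callers.

    callsets: dict mapping a caller name to an iterable of hashable variants,
    here (chrom, pos, ref, alt) tuples. The threshold rule is an assumption.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    # set() ensures that a caller reporting a variant twice is counted only once.
    support = Counter(v for calls in callsets.values() for v in set(calls))
    return {variant for variant, n in support.items() if n >= min_support}

# Invented toy data; caller names are placeholders.
toy_callsets = {
    "caller_a": [("chr7", 403, "A", "T"), ("chr7", 512, "G", "C")],
    "caller_b": [("chr7", 403, "A", "T"), ("chr7", 900, "C", "T")],
    "caller_c": [("chr7", 403, "A", "T"), ("chr7", 512, "G", "C")],
}

if __name__ == "__main__":
    print(sorted(consensus_calls(toy_callsets, min_support=2)))
```

Raising `min_support` trades sensitivity for specificity: with three callers, a threshold of 3 keeps only variants on which every caller agrees.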

Quantification of fetal DNA in the plasma of pregnant women using next generation sequencing of frequent single nucleotide polymorphisms

Bulletin of Russian State Medical University ◽

10.24075/brsmu.2018.031 ◽

2018 ◽

pp. 29-33

Author(s):

J. Shubina ◽

T. Jankevic ◽

A. Yu. Goltsov ◽

I. S. Mukosey ◽

...

Keyword(s):

Next Generation Sequencing ◽

Single Nucleotide Polymorphisms ◽

Pregnant Women ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Single Nucleotide ◽

Fetal Dna ◽

Generation Sequencing

Repli-seq: genome-wide analysis of replication timing by next-generation sequencing

10.1101/104653 ◽

2017 ◽

Cited By ~ 8

Author(s):

Claire Marchal ◽

Takayo Sasaki ◽

Daniel Vera ◽

Korey Wilson ◽

Jiao Sima ◽

...

Keyword(s):

Next Generation Sequencing ◽

Replication Timing ◽

Nucleotide Polymorphisms ◽

Robust Methods ◽

Next Generation ◽

Single Nucleotide ◽

Cellular Processes ◽

Genome Wide ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract. Cycling cells duplicate their DNA content during S phase, following a defined program called replication timing (RT). Early and late replicating regions differ in terms of mutation rates, transcriptional activity, chromatin marks and sub-nuclear position. Moreover, RT is regulated during development and is altered in disease. Exploring mechanisms linking RT to other cellular processes in normal and diseased cells will be facilitated by rapid and robust methods with which to measure RT genome-wide. Here, we describe a rapid, robust and relatively inexpensive protocol to analyze genome-wide RT by next-generation sequencing (NGS). This protocol yields highly reproducible results across laboratories and platforms. We also provide computational pipelines for analysis, for parsing phased genomes using single nucleotide polymorphisms (SNPs) to analyze RT allelic asynchrony, and for direct comparison to Repli-chip data obtained by analyzing nascent DNA by microarrays.

Simultaneous human platelet antigen genotyping and detection of novel single nucleotide polymorphisms by targeted next-generation sequencing

Transfusion ◽

10.1111/trf.14092 ◽

2017 ◽

Vol 57 (6) ◽

pp. 1497-1504 ◽

Cited By ~ 4

Author(s):

Sue Davey ◽

Cristina Navarrete ◽

Colin Brown

Keyword(s):

Next Generation Sequencing ◽

Single Nucleotide Polymorphisms ◽

Human Platelet ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Single Nucleotide ◽

Targeted Next Generation Sequencing ◽

Platelet Antigen ◽

Human Platelet Antigen ◽

Generation Sequencing

Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data

Molecular Biology Reports ◽

10.1007/s11033-014-3790-5 ◽

2014 ◽

Vol 42 (2) ◽

pp. 471-477 ◽

Cited By ~ 9

Author(s):

Dong-Won Seo ◽

Jae-Don Oh ◽

Shil Jin ◽

Ki-Duk Song ◽

Hee-Bok Park ◽

...

Keyword(s):

Single Nucleotide Polymorphism ◽

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Polymorphism Analysis ◽

Single Nucleotide Polymorphism Analysis ◽

Next Generation ◽

Sequencing Data ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Generation Sequencing

Status and future perspectives of single nucleotide polymorphisms (SNPs) markers in farmed fishes: Way ahead using next generation sequencing

Gene Reports ◽

10.1016/j.genrep.2016.12.004 ◽

2017 ◽

Vol 6 ◽

pp. 81-86 ◽

Cited By ~ 5

Author(s):

Kiran Dashrath Rasal ◽

Vemlawada Chakrapani ◽

Amrendra Kumar Pandey ◽

Avinash Rambhau Rasal ◽

Jitendra K. Sundaray ◽

...

Keyword(s):

Next Generation Sequencing ◽

Single Nucleotide Polymorphisms ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Future Perspectives ◽

Single Nucleotide ◽

Generation Sequencing

Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines

10.21203/rs.3.rs-50691/v1 ◽

2020 ◽

Author(s):

Susanne Gerber ◽

Stephan Weißbach ◽

Stanislav Jur'evic Sys ◽

Charlotte Hewel ◽

Hristo Todorov ◽

...

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Low Complexity ◽

Next Generation ◽

Illumina Hiseq ◽

Cross Sectional ◽

Genomic Annotation ◽

Complete Genomics ◽

Sequencing Platforms ◽

Generation Sequencing

Abstract. Background: Next Generation Sequencing (NGS) is the fundament of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors, we performed a cross-sectional observational study on a genomic cohort of 99 subjects each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics. Consequently, we systematically analyzed the heterogeneity between the sequencing cohorts with respect to genomic annotation and common filter criteria like minimum allele frequency (MAF). Results: The number of detected variants/variant classes per individual was highly dependent on the sequencing technology. We observed a statistically significant overrepresentation of variants uniquely called by a single platform, which indicates potential systematic biases. These variants were enriched in low complexity genomic regions and simple repeats. Furthermore, estimates of allele frequency were highly discrepant for a subset of variants in pairwise comparisons between different sequencing platforms. Applying common filters, such as MAF 5% and HWE, greatly reduced the heterogeneity between cohorts but still left discrepancies of several thousand variants after filtering. Conclusion: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Our results highlight the potential benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.
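
The MAF 5% and HWE filters mentioned in the abstract can be illustrated with a short sketch. It assumes biallelic variants summarized by genotype counts, treats MAF as the frequency of the less common allele with a 5% cutoff, and tests Hardy-Weinberg equilibrium (HWE) with a one-degree-of-freedom chi-square test. The HWE p-value cutoff of 1e-6 and all function names are assumptions, since the abstract gives no threshold or implementation.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the less common allele, from counts of genotypes AA, AB and BB."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a chi-square goodness-of-fit test against HWE proportions (1 df)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: the test is not informative
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_filters(n_aa, n_ab, n_bb, maf_min=0.05, hwe_p_min=1e-6):
    """True if the variant meets both the MAF and the HWE criterion (assumed cutoffs)."""
    maf_ok = minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min
    hwe_ok = hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min
    return maf_ok and hwe_ok

if __name__ == "__main__":
    # Invented genotype counts for three variants.
    for counts in [(25, 50, 25), (98, 1, 0), (50, 0, 49)]:
        print(counts, passes_filters(*counts))
```

For small cohorts an exact HWE test is often preferred over the chi-square approximation; the approximation is used here only to keep the sketch free of external dependencies.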
