False Negatives Are a Significant Feature of Next Generation Sequencing Callsets

Abstract Short-read, next-generation sequencing (NGS) is now broadly used to identify rare or de novo mutations in population samples and disease cohorts. However, NGS data are known to be error-prone, and post-processing pipelines have focused primarily on removing spurious mutations, or “false positives”, from downstream genome datasets. Less attention has been paid to characterizing the fraction of missing mutations, or “false negatives” (FN). Here we interrogate several publicly available human NGS autosomal variant datasets, using the corresponding Sanger sequencing as a truth set. We examine both low-coverage Illumina and high-coverage Complete Genomics genomes. We show that the FN rate varies between 3% and 18%, whereas false-positive rates are considerably lower (<3%), for publicly available human genome callsets such as 1000 Genomes. The FN rate depends strongly on calling-pipeline parameters as well as on read coverage. Our results demonstrate that missing mutations are a significant feature of genomic datasets and imply that additional fine-tuning of bioinformatics pipelines is needed. To address this, we design a phylogeny-aware tool, PhyloFaN, which can be used to quantify the FN rate for haploid genomic experiments without generating additional validation data. Using PhyloFaN on ultra-high-coverage NGS data from both Illumina HiSeq and Complete Genomics platforms derived from the 1000 Genomes Project, we characterize the false-negative rate in human mtDNA genomes. The false-negative rate for the publicly available mtDNA callsets is 17-20%, even for extremely high-coverage haploid data.
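A truth-set comparison of the kind described in this abstract reduces to set arithmetic once calls and Sanger-validated variants are restricted to the same interrogated positions. The Python sketch below assumes variants are keyed by (chromosome, position, ref, alt); the `Variant` type, the `error_rates` function, and the example variants are hypothetical, and the false-positive rate is taken as FP / (TP + FP) because true-negative sites are not enumerated. It is not PhyloFaN or the authors' pipeline.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """A biallelic variant keyed by position and alleles (hypothetical schema)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def error_rates(calls: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Return (false-negative rate, false-positive rate) of `calls` against `truth`.

    Both sets are assumed to cover only positions interrogated by Sanger
    sequencing, so a call absent from `truth` is treated as a false positive.
    """
    tp = len(calls & truth)   # called and confirmed by Sanger
    fn = len(truth - calls)   # confirmed by Sanger but missed by the caller
    fp = len(calls - truth)   # called but not confirmed
    fn_rate = fn / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (tp + fp) if tp + fp else 0.0
    return fn_rate, fp_rate


# Hypothetical example: five Sanger-confirmed variants, four NGS calls.
truth = {
    Variant("1", 1_000, "A", "G"),
    Variant("1", 2_500, "C", "T"),
    Variant("2", 400, "G", "A"),
    Variant("2", 900, "T", "C"),
    Variant("3", 1_200, "A", "C"),
}
calls = {
    Variant("1", 1_000, "A", "G"),
    Variant("2", 400, "G", "A"),
    Variant("2", 900, "T", "C"),
    Variant("3", 5_000, "G", "T"),  # not in the truth set
}
fn_rate, fp_rate = error_rates(calls, truth)
print(f"FN rate: {fn_rate:.0%}, FP rate: {fp_rate:.0%}")
```

Here two of five validated variants are missed while only one of four calls is spurious, so the FN rate exceeds the FP rate, the same asymmetry the abstract reports. Restricting both sets to Sanger-covered positions matters: without it, every call outside the validated region would be miscounted as a false positive.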


Hidden biases in germline structural variant detection

Genome Biology, 2021, Vol 22 (1). DOI: 10.1186/s13059-021-02558-x

Author(s): Michael M. Khayat, Sayed Mohammad Ebrahim Sahraeian, Samantha Zarate, Andrew Carroll, Huixiao Hong, ...

Keyword(s): Next Generation Sequencing; False Negative; False Negative Rate; Next Generation Sequencing Data; Chinese Family; Next Generation; Sequencing Data; Structural Variations; The Impact; Generation Sequencing

Abstract Background: Genomic structural variations (SVs) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SVs from next-generation sequencing data remains challenging. Results: In this study, DNA from a Chinese family quartet is sequenced at three different sequencing centers in triplicate. A total of 288 derivative data sets are generated using different analysis pipelines and compared to identify sources of analytical variability. Mapping methods make the largest contribution to variability, followed by sequencing centers and replicates. Interestingly, SVs supported by only one center or replicate often represent true positives, with 47.02% and 45.44%, respectively, overlapping the long-read SV call set. This is consistent with an overall higher false-negative rate for SV calling across centers and replicates than across mappers (15.72%). Finally, we observe that SV calling variability also persists in a genotyping approach, indicating the impact of the underlying sequencing and preparation approaches. Conclusions: This study provides the first detailed insights into the sources of variability in SV identification from next-generation sequencing and highlights remaining challenges in SV calling for large cohorts. We further give recommendations on how to reduce SV calling variability and on the choice of alignment methodology.
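Percentages such as “47.02% overlapping the long-read SV call set” depend on how an overlap is defined. The abstract does not state the criterion, so the Python sketch below assumes a common convention, 50% reciprocal overlap between SVs of the same type on the same chromosome; the `SV` type, `reciprocal_overlap`, `supported_fraction`, and the example coordinates are all hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class SV(NamedTuple):
    """A structural variant as a half-open interval [start, end) (hypothetical schema)."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SV, b: SV, min_frac: float = 0.5) -> bool:
    """True if a and b share chromosome and type and each covers >= min_frac of the other."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a.end - a.start)
            and overlap >= min_frac * (b.end - b.start))


def supported_fraction(calls: list[SV], long_read: list[SV]) -> float:
    """Share of short-read calls matched by at least one long-read SV.

    Uses a quadratic scan for clarity; an interval index would be used at scale.
    """
    if not calls:
        return 0.0
    hits = sum(any(reciprocal_overlap(c, t) for t in long_read) for c in calls)
    return hits / len(calls)


# Hypothetical calls seen in only one sequencing center.
single_center = [
    SV("1", 1_000, 2_000, "DEL"),
    SV("1", 5_000, 5_200, "DEL"),
    SV("2", 300, 800, "DUP"),
    SV("2", 300, 800, "DEL"),
]
long_read = [
    SV("1", 1_100, 2_100, "DEL"),  # 90% reciprocal overlap with the first call
    SV("1", 5_000, 6_000, "DEL"),  # spans the second call but only 20% of its own length
    SV("2", 350, 820, "DEL"),
]
print(f"{supported_fraction(single_center, long_read):.0%} supported by long reads")
```

The second call shows why the threshold matters: it lies entirely inside a long-read deletion yet is not counted, because the match must hold in both directions. Any reported overlap percentage is only comparable across studies when the matching rule is stated with it.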


Proficiency Testing of Standardized Samples Shows High Interlaboratory Agreement for Clinical Next-Generation Sequencing–Based Hematologic Malignancy Assays With Survey Material–Specific Differences in Variant Frequencies

Archives of Pathology & Laboratory Medicine, 2020, Vol 144 (8), pp. 959-966. DOI: 10.5858/arpa.2019-0352-cp. Cited by ~1

Author(s): Alissa Keegan, Julia A. Bridge, Neal I. Lindeman, Thomas A. Long, Jason D. Merker, ...

Keyword(s): Next Generation Sequencing; Proficiency Testing; Hematologic Malignancy; False Negative; False Negative Rate; Hematologic Malignancies; Variant Allele; Next Generation; Single Nucleotide Variants; Generation Sequencing

Context.— As laboratories increasingly turn from single-analyte testing in hematologic malignancies to next-generation sequencing–based panel testing, there is a corresponding need for proficiency testing to ensure adequate performance of these next-generation sequencing assays for optimal patient care. Objective.— To report the performance of laboratories on proficiency testing from the first 4 College of American Pathologists Next-Generation Sequencing Hematologic Malignancy surveys. Design.— College of American Pathologists proficiency testing results for 36 different engineered variants and/or allele fractions as well as a sample with no pathogenic variants were analyzed for accuracy and associated assay performance characteristics. Results.— The overall sensitivity observed for all variants was 93.5% (2190 of 2341) with 99.8% specificity (22 800 of 22 840). The false-negative rate was 6.5% (151 of 2341), and the largest single cause of these errors was difficulty in identifying variants in the sequence of CEBPA that is rich in cytosines and guanines. False-positive results (0.18%; 40 of 22 840) were most likely the result of preanalytic or postanalytic errors. Interestingly, the variant allele fractions were almost uniformly lower than the engineered fraction (as measured by digital polymerase chain reaction). Extensive troubleshooting identified a multifactorial cause for the low variant allele fractions, a result of an interaction between the linearized nature of the plasmid and the Illumina TruSeq chemistry. Conclusions.— Laboratories demonstrated an overall accuracy of 99.2% (24 990 of 25 181) with 99.8% specificity and 93.5% sensitivity when examining 36 clinically relevant somatic single-nucleotide variants with a variant allele fraction of 10% or greater. The data also highlight an issue with artificial linearized plasmids as survey material for next-generation sequencing.
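The summary statistics in this abstract follow directly from the reported counts. The Python sketch below recomputes them; the function name `summarize` is hypothetical, and treating the 22 840 specificity observations as true negatives plus false positives is an interpretation of the fractions given in the abstract.

```python
from __future__ import annotations


def summarize(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Standard confusion-matrix rates for a proficiency-testing survey."""
    return {
        "sensitivity": tp / (tp + fn),          # detected / all engineered variants
        "specificity": tn / (tn + fp),          # correct negatives / all expected negatives
        "false_negative_rate": fn / (tp + fn),  # equals 1 - sensitivity
        "false_positive_rate": fp / (tn + fp),  # equals 1 - specificity
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts reported in the abstract: 2190 of 2341 variants detected,
# 22 800 of 22 840 negative results correct.
metrics = summarize(tp=2190, fn=151, tn=22_800, fp=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Printed to two decimals, sensitivity is 93.55% and the false-negative rate 6.45%; the abstract's one-decimal figures (93.5% and 6.5%) are these same values rounded, which is why they still sum to 100%.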


Clinical Analysis of Metagenomic Next-generation Sequencing Confirmed Chlamydia psittaci Pneumonia: A Case Series and Literature Review

2021. DOI: 10.21203/rs.3.rs-189997/v1

Author(s): Xin-Qi Teng, Wen-Cheng Gong, Ting-Ting Qi, Guo-Hua Li, Qiang Qu, ...

Keyword(s): Next Generation Sequencing; Literature Review; Clinical Characteristics; Pathogenic Bacteria; False Negative; False Negative Rate; Case Series; Chlamydia Psittaci; Next Generation; Generation Sequencing

Abstract Introduction: Chlamydia psittaci infection is a zoonotic infectious disease that is mainly acquired by inhalation after exposure to the secretions of poultry that carry the pathogenic bacteria. Traditional testing of respiratory specimens or serological antibody testing is slow, and the false-negative rate is high. Metagenomic next-generation sequencing (mNGS) offers a promising rapid diagnostic tool. Methods: We retrospectively summarized the clinical characteristics of five C. psittaci pneumonia patients diagnosed by mNGS and conducted a literature review summarizing the clinical characteristics of patients with C. psittaci pneumonia reported since 2010. Results: The five patients with mNGS-confirmed C. psittaci pneumonia were aged 36 to 66 years, and three were male. Sixty percent of patients had type 2 diabetes mellitus, and 60% had a history of contact with birds or poultry. All patients had a high fever over 38.5 °C, cough, hypodynamia, hypoxemia, and dyspnea on admission. Two patients required invasive ventilator support and extracorporeal membrane oxygenation support. The levels of C-reactive protein, procalcitonin, and erythrocyte sedimentation rate on admission and at follow-up were all above normal values. Doxycycline monotherapy and moxifloxacin monotherapy accounted for 1/5 (20%) and 2/5 (40%) of patients, respectively, and combination therapy for 2/5 (40%). Four patients improved and were discharged, and one patient died of multiple organ failure and disseminated intravascular coagulation. Conclusions: mNGS can increase the detection rate of C. psittaci, shorten the time to diagnosis of C. psittaci pneumonia, and improve patient prognosis.


Clinically Meaningful Change

Methodology, 2019, Vol 15 (3), pp. 97-105. DOI: 10.1027/1614-2241/a000168

Author(s): Rodrigo Ferrer, Antonio Pardo

Keyword(s): Effect Size; False Negative; False Negative Rate; Point Of View; Skewed Distribution; Effect Sizes; False Negatives; Large Size; Before And After; Post Test

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform in terms of false negatives. For this purpose, we simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods that performed best in terms of false positives. Our results revealed unacceptable false-negative rates even for very large effects, ranging from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Therefore, our results suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. However, we offer some considerations regarding the effect size and the commonly used cut-off points that allow us to be more precise in our estimates.
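To make a distribution-based method's false-negative rate concrete, the Python sketch below simulates a pre-post design in which every subject truly changes and counts how often the Jacobson-Truax reliable change index (RCI) fails to flag that change. The abstract does not name its five methods, so the choice of the RCI, a normal distribution, a test reliability of 0.8, and a one-sided improvement criterion (RCI > 1.96) are all assumptions. The function name is hypothetical, and the numbers are not expected to reproduce the paper's 31.8% figure.

```python
import math
import random


def rci_false_negative_rate(effect_size: float, reliability: float = 0.8,
                            n: int = 10_000, seed: int = 1) -> float:
    """Fraction of truly improved subjects that the RCI fails to flag.

    Observed scores have SD 1 and the given reliability; every subject's true
    score rises by `effect_size` SDs between the pre and post measurements.
    """
    rng = random.Random(seed)
    true_sd = math.sqrt(reliability)          # SD of the true scores
    error_sd = math.sqrt(1.0 - reliability)   # SEM, since the observed SD is 1
    s_diff = math.sqrt(2.0) * error_sd        # standard error of a difference score
    misses = 0
    for _ in range(n):
        true_score = rng.gauss(0.0, true_sd)
        pre = true_score + rng.gauss(0.0, error_sd)
        post = true_score + effect_size + rng.gauss(0.0, error_sd)
        rci = (post - pre) / s_diff
        if rci <= 1.96:                       # change not declared reliable
            misses += 1
    return misses / n


for d in (0.2, 0.5, 0.8, 2.0):
    print(f"effect size {d}: false-negative rate {rci_false_negative_rate(d):.1%}")
```

Under these assumptions the RCI is normally distributed with mean d / sqrt(2(1 - r)) and unit variance, so the simulated rate should approach Φ(1.96 - d / sqrt(2(1 - r))), about 11% for d = 2.0 and r = 0.8. The skewed distributions the study also examined would require a different score generator.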


NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genomes|Genetics, 2021. DOI: 10.1093/g3journal/jkab174

Author(s): Anne Krogh Nøhr, Kristian Hanghøj, Genis Garcia Erill, Zilong Li, Ida Moltke, ...

Keyword(s): Next Generation Sequencing; Genetic Research; Likelihood Estimation; Software Tool; Estimation Methods; Next Generation Sequencing Data; Next Generation; Sequencing Data; NGS Data; Generation Sequencing

Abstract Estimation of relatedness between pairs of individuals is important in many areas of genetic research. When estimating relatedness, it is important to account for admixture if it is present. However, the methods that can account for admixture all take genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.
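NGSremix works from genotype likelihoods rather than hard genotype calls. The Python sketch below shows why that matters at low depth, using a textbook model in which each read base is drawn from either chromosome with probability 1/2 and is miscalled at a fixed error rate. The abstract does not say how the input likelihoods are produced, so the model, the 1% error rate, and the function names are assumptions; this is not NGSremix code.

```python
from __future__ import annotations

import math


def genotype_likelihoods(bases: str, ref: str, alt: str,
                         error: float = 0.01) -> dict[str, float]:
    """P(observed bases | genotype) for ref/ref, ref/alt and alt/alt.

    Each base comes from one of the two chromosomes with probability 1/2; it
    matches that chromosome's allele with probability 1 - error, otherwise it is
    one of the three other bases with probability error / 3 each.
    """
    def p_base(base: str, allele: str) -> float:
        return 1.0 - error if base == allele else error / 3.0

    genotypes = {"ref/ref": (ref, ref), "ref/alt": (ref, alt), "alt/alt": (alt, alt)}
    result = {}
    for name, (a1, a2) in genotypes.items():
        # Accumulate in log space; exponentiated here for readability.
        log_lik = sum(math.log(0.5 * p_base(b, a1) + 0.5 * p_base(b, a2)) for b in bases)
        result[name] = math.exp(log_lik)
    return result


def posteriors(likelihoods: dict[str, float]) -> dict[str, float]:
    """Normalize likelihoods, i.e. posteriors under a flat genotype prior."""
    total = sum(likelihoods.values())
    return {g: lik / total for g, lik in likelihoods.items()}


# Two reads versus twenty reads, all showing the reference base "A" (alt is "G").
for depth in (2, 20):
    post = posteriors(genotype_likelihoods("A" * depth, ref="A", alt="G"))
    print(depth, {g: round(p, 4) for g, p in post.items()})
```

With only two reference reads, the heterozygous genotype still has roughly a 20% posterior under a flat prior, while twenty reference reads make it negligible. A hard ref/ref call at 2x discards exactly the uncertainty that likelihood-based methods retain.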


Application of Immunosignatures for Diagnosis of Valley Fever

Clinical and Vaccine Immunology, 2014, Vol 21 (8), pp. 1169-1177. DOI: 10.1128/cvi.00228-14. Cited by ~13

Author(s): Krupa Arun Navalkar, Stephen Albert Johnston, Neal Woodbury, John N. Galgiani, D. Mitchell Magee, ...

Keyword(s): Bacterial Infections; Random Sequence; False Negative; False Negative Rate; Peptide Array; IgG Antibodies; False Negatives; Peptide Microarray; Valley Fever; Antibody Levels

Abstract Valley fever (VF) is difficult to diagnose, partly because its symptoms are confounded with those of other community-acquired pneumonias. Confirmatory diagnostics detect IgM and IgG antibodies against coccidioidal antigens via immunodiffusion (ID). The false-negative rate can be as high as 50% to 70%, with 5% of symptomatic patients never showing detectable antibody levels. In this study, we tested whether the immunosignature diagnostic can resolve VF false negatives. An immunosignature is the pattern of antibody binding to random-sequence peptides on a peptide microarray. A 10,000-peptide microarray was first used to determine whether valley fever patients can be distinguished from 3 other cohorts with similar infections. After the VF-specific peptides were determined, a small 96-peptide diagnostic array was created and tested. The performance of the 10,000-peptide array and the 96-peptide diagnostic array was compared with that of the ID diagnostic standard. The 10,000-peptide microarray distinguished the VF samples from those of the other 3 infections with 98% accuracy. It also classified VF false-negative patients with 100% sensitivity in a blinded test set, versus 28% sensitivity for ID. The immunosignature microarray has the potential to simultaneously distinguish valley fever patients from those with other fungal or bacterial infections. The same 10,000-peptide array can diagnose VF false-negative patients with 100% sensitivity. The smaller 96-peptide diagnostic array was less specific for diagnosing false negatives. We conclude that the performance of the immunosignature diagnostic exceeds that of the existing standard and that the immunosignature can distinguish related infections and might be used in lieu of existing diagnostics.


Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants

Molecules, 2018, Vol 23 (2), p. 399. DOI: 10.3390/molecules23020399. Cited by ~41

Author(s): Sima Taheri, Thohirah Lee Abdullah, Mohd Yusop, Mohamed Hanafi, Mahbod Sahebi, ...

Keyword(s): Next Generation Sequencing; SSR Markers; Next Generation; Next Generation Sequencing NGS; NGS Data; Generation Sequencing


High throughput crop genome genotyping by a combination of pool next generation sequencing and haplotype-based data processing

2021. DOI: 10.21203/rs.3.rs-415602/v1

Author(s): Michael Schneider, Asis Shrestha, Agim Ballvora, Jens Leon

Keyword(s): Next Generation Sequencing; Allele Frequency; Frequency Estimation; Whole Genome; Next Generation; Conservation Genomics; High Coverage; Allele Frequency Estimation; Low Coverage; Generation Sequencing

Abstract Background: The identification of environmentally specific alleles and the observation of evolutionary processes are goals of conservation genomics. By tracking changes in allele frequencies in populations across generations, questions regarding effective population size, gene flow, drift, and selection can be addressed. Observing such effects is often a trade-off between cost and resolution when a sufficiently large sample of individuals must be genotyped at many loci. Pool genotyping approaches can achieve high resolution and precision in allele frequency estimation when high-coverage sequencing is used. Still, high-coverage pool sequencing of large genomes is costly. Results: Here we present a reliable method to estimate a barley population's allele frequency from low-coverage sequencing. Three hundred genotypes were sampled from a barley backcross population to estimate the entire population's allele frequency. Allele frequency estimation accuracy and yield were compared for three next-generation sequencing methods. To obtain accurate allele frequency estimates at low sequencing coverage, a haplotyping approach was applied: low-coverage allele frequencies of positionally connected single-nucleotide polymorphisms were aggregated into a single haplotype allele frequency, resulting in two to 271 times higher depth and increased precision. We compared different haplotyping strategies, showing that gene-based and chip-marker-based haplotypes perform on par with or better than simple contig haplotype windows. Comparing multiple pool samples, and referencing them against an individual sequencing approach, revealed that whole-genome pool resequencing has the highest correlation with individual genotyping (up to 0.97), whereas transcriptomics and genotyping by sequencing showed higher error rates and lower correlations. Conclusion: The proposed method makes it possible to identify the allele frequency of populations with high accuracy at low cost.
This is particularly interesting for conservation genomics in species with large genomes, such as barley or wheat. Whole-genome low-coverage resequencing at 10x coverage can deliver a highly accurate estimate of the allele frequency when a locus-based haplotyping approach is applied. Using annotated haplotypes makes it possible to capitalize on biological background and statistical robustness.
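The haplotyping step in this abstract pools read counts from SNPs in the same haplotype block, so depth adds up while a single frequency is estimated for the block. The Python sketch below assumes that at every SNP the counted allele is the one carried by the focal haplotype (for example, known from the backcross parents) and that each read is counted at one SNP only; `SnpCounts`, `haplotype_frequency`, and the counts are hypothetical, and this is not the authors' pipeline.

```python
from __future__ import annotations

from typing import NamedTuple


class SnpCounts(NamedTuple):
    """Pooled-sequencing counts at one SNP inside a haplotype block (hypothetical)."""
    pos: int
    hap_allele_reads: int   # reads showing the allele in phase with the focal haplotype
    depth: int              # total reads at this SNP


def haplotype_frequency(snps: list[SnpCounts]) -> tuple[float, int]:
    """Pool counts across the block; return (haplotype frequency, pooled depth)."""
    total_depth = sum(s.depth for s in snps)
    if total_depth == 0:
        raise ValueError("haplotype block has no coverage")
    supporting = sum(s.hap_allele_reads for s in snps)
    return supporting / total_depth, total_depth


# Four low-coverage SNPs assumed to tag the same haplotype.
block = [
    SnpCounts(pos=10_120, hap_allele_reads=3, depth=10),
    SnpCounts(pos=10_480, hap_allele_reads=2, depth=8),
    SnpCounts(pos=11_050, hap_allele_reads=4, depth=12),
    SnpCounts(pos=11_300, hap_allele_reads=3, depth=10),
]
freq, depth = haplotype_frequency(block)
print(f"pooled frequency {freq:.2f} at depth {depth}")
```

Per-SNP estimates here range from 0.25 to 0.33, while the pooled estimate rests on four times the depth of a typical single SNP. If one read spans several SNPs in a block, its counts are not independent, and simple summing would overstate the gain in precision.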


Appendix A: Common File Types Used in Next-Generation Sequencing (NGS) Data Analysis

Next-Generation Sequencing Data Analysis, 2016, pp. 199-202. DOI: 10.1201/b19532-20

Keyword(s): Data Analysis; Next Generation Sequencing; Next Generation; NGS Data Analysis; Next Generation Sequencing NGS; NGS Data; Generation Sequencing


Identification of Genetic Hereditary Predisposition to Hematologic Malignancies By Clinical Next-Generation Sequencing

Blood, 2015, Vol 126 (23), p. 3854. DOI: 10.1182/blood.v126.23.3854.3854. Cited by ~2

Author(s): Amy E Knight Johnson, Lucia Guidugli, Kelly Arndt, Gorka Alkorta-Aranburu, Viswateja Nelakuditi, ...

Keyword(s): Next Generation Sequencing; Sanger Sequencing; Family Members; Hematologic Malignancies; Dyskeratosis Congenita; Molecular Diagnostic; Next Generation; Hereditary Predisposition; NGS Data; Generation Sequencing

Abstract Introduction: Myelodysplastic syndrome (MDS) and acute leukemia (AL) are a clinically diverse and genetically heterogeneous group of hematologic malignancies. Familial forms of MDS/AL have been increasingly recognized in recent years and can occur as a primary event or secondary to genetic syndromes, such as inherited bone marrow failure syndromes (IBMFS). It is critical to confirm a genetic diagnosis in patients with hereditary predisposition to hematologic malignancies in order to provide prognostic information and cancer risk assessment, and to aid in identifying at-risk or affected family members. In addition, a molecular diagnosis can help tailor medical management, including informing the selection of family members as allogeneic stem cell transplantation donors. Until recently, clinical testing options for this diverse group of hematologic malignancy predisposition genes were limited to the evaluation of single genes by Sanger sequencing, a time-consuming and expensive process. To improve the diagnosis of hereditary predisposition to hematologic malignancies, our CLIA-licensed laboratory has recently developed Next-Generation Sequencing (NGS) panel-based testing for these genes. Methods: Thirty-six patients with a personal and/or family history of aplastic anemia, MDS, or AL were referred for clinical diagnostic testing. DNA from the referred patients was obtained from cultured skin fibroblasts or peripheral blood and was used to prepare libraries with the SureSelectXT Enrichment System. Libraries were sequenced on an Illumina MiSeq instrument, and the NGS data were analyzed with a custom bioinformatic pipeline targeting a panel of 76 genes associated with IBMFS and/or familial MDS/AL. Results: Pathogenic and highly likely pathogenic variants were identified in 7 of the 36 patients analyzed, a positive molecular diagnostic rate of approximately 20%. Overall, 6 of the 7 pathogenic changes identified were novel.
In 2 unrelated patients with MDS, heterozygous pathogenic sequence changes were identified in the GATA2 gene. Heterozygous pathogenic changes in the following autosomal dominant genes were each identified in a single patient: RPS26 (Diamond-Blackfan anemia 10), RUNX1 (familial platelet disorder with propensity to myeloid malignancy), TERT (dyskeratosis congenita 4), and TINF2 (dyskeratosis congenita 3). In addition, one novel heterozygous sequence change (c.826+5_826+9del, p.?) was identified in the Fanconi anemia-associated gene FANCA. RNA analysis demonstrated that this variant causes skipping of exon 9 and results in a premature stop codon in exon 10. Further review of the NGS data provided evidence of an additional large heterozygous multi-exon deletion in FANCA in the same patient, which was confirmed by array-CGH (comparative genomic hybridization). Conclusions: This study demonstrates the effectiveness of NGS technology for identifying patients with a hereditary predisposition to hematologic malignancies. Because many of the genes associated with hereditary predisposition to hematologic malignancies have similar or overlapping clinical presentations, analysis of a diverse panel of genes is an efficient and cost-effective approach to molecular diagnostics for these disorders. Unlike Sanger sequencing, NGS technology also has the potential to identify large exonic deletions and duplications. In addition, RNA splicing assays have proven helpful in clarifying the pathogenicity of variants suspected to affect splicing. This approach will also allow identification of a molecular defect in patients who may have an atypical presentation of disease. Disclosures: No relevant conflicts of interest to declare.
