neoepiscope improves neoepitope prediction with multivariant phasing

Mary A Wood; Austin Nguyen; Adam J Struck; Kyle Ellrott; Abhinav Nellore; Reid F Thompson

doi:10.1093/bioinformatics/btz653

Bioinformatics ◽

10.1093/bioinformatics/btz653 ◽

2019 ◽

Vol 36 (3) ◽

pp. 713-720 ◽

Cited By ~ 5

Author(s):

Mary A Wood ◽

Austin Nguyen ◽

Adam J Struck ◽

Kyle Ellrott ◽

Abhinav Nellore ◽

...

Keyword(s):

False Negative ◽

Supplementary Information ◽

Supplementary File ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Somatic Variant ◽

Negative Results ◽

Multiple Datasets ◽

False Negative Results

Abstract

Motivation: The vast majority of tools for neoepitope prediction from DNA sequencing of complementary tumor and normal patient samples do not consider germline context or the potential for the co-occurrence of two or more somatic variants on the same mRNA transcript. Without consideration of these phenomena, existing approaches are likely to produce both false-positive and false-negative results, resulting in an inaccurate and incomplete picture of the cancer neoepitope landscape. We developed neoepiscope chiefly to address this issue for single nucleotide variants (SNVs) and insertions/deletions (indels).

Results: Herein, we illustrate how germline and somatic variant phasing affects neoepitope prediction across multiple datasets. We estimate that up to ∼5% of neoepitopes arising from SNVs and indels may require variant phasing for their accurate assessment. neoepiscope is performant, flexible and supports several major histocompatibility complex binding affinity prediction tools.

Availability and implementation: neoepiscope is available on GitHub at https://github.com/pdxgx/neoepiscope under the MIT license. Scripts for reproducing results described in the text are available at https://github.com/pdxgx/neoepiscope-paper under the MIT license. Additional data from this study, including summaries of variant phasing incidence and benchmarking wallclock times, are available in Supplementary Files 1, 2 and 3. Supplementary File 1 contains Supplementary Table 1, Supplementary Figures 1 and 2, and descriptions of Supplementary Tables 2–8. Supplementary File 2 contains Supplementary Tables 2–6 and 8. Supplementary File 3 contains Supplementary Table 7. Raw sequencing data used for the analyses in this manuscript are available from the Sequence Read Archive under accessions PRJNA278450, PRJNA312948, PRJNA307199, PRJNA343789, PRJNA357321, PRJNA293912, PRJNA369259, PRJNA305077, PRJNA306070, PRJNA82745 and PRJNA324705; from the European Genome-phenome Archive under accessions EGAD00001004352 and EGAD00001002731; and by direct request to the authors.

Supplementary information: Supplementary data are available at Bioinformatics online.
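
To make the phasing argument concrete, the following is a minimal Python sketch of why two somatic variants on the same transcript can produce both false-negative and false-positive peptides when each variant is handled in isolation. It is not neoepiscope's implementation: the protein sequence, the two substitutions, and the helper names (`kmers`, `apply_substitutions`, `candidate_peptides`) are hypothetical, and the sketch works directly on amino acid substitutions rather than on phased DNA variants.

```python
def kmers(seq, k):
    """Return the set of all length-k windows of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def apply_substitutions(protein, subs):
    """Apply {0-based position: new residue} substitutions to a protein."""
    residues = list(protein)
    for pos, alt in subs.items():
        residues[pos] = alt
    return "".join(residues)


def candidate_peptides(reference, subs, k=9):
    """Peptides of length k present in the mutant but absent from the reference."""
    mutant = apply_substitutions(reference, subs)
    return kmers(mutant, k) - kmers(reference, k)


# Hypothetical reference protein and two nearby somatic substitutions
# that are assumed to sit on the same haplotype.
reference = "MKTAYIAKQRQISFVKSHFSRQ"
somatic_a = {8: "L"}   # Q -> L at position 8
somatic_b = {11: "T"}  # I -> T at position 11

# Unphased view: each variant is applied to the reference on its own.
unphased = candidate_peptides(reference, somatic_a) | candidate_peptides(reference, somatic_b)

# Phased view: both variants are applied to the same transcript.
phased = candidate_peptides(reference, {**somatic_a, **somatic_b})

missed = phased - unphased     # peptides the unphased view never reports
spurious = unphased - phased   # peptides the unphased view reports in error

print(len(missed), len(spurious))
```

In this toy case, every 9-mer that spans both positions differs between the two views: the unphased view misses the peptides carrying both substitutions and reports peptides carrying only one of them.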

neoepiscope improves neoepitope prediction with multi-variant phasing

10.1101/418129 ◽

2018 ◽

Cited By ~ 2

Author(s):

Mary A. Wood ◽

Austin Nguyen ◽

Adam Struck ◽

Kyle Ellrott ◽

Abhinav Nellore ◽

...

Keyword(s):

False Negative ◽

List Type ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Somatic Variant ◽

Prediction Tools ◽

Negative Results ◽

Multiple Datasets ◽

False Negative Results ◽

Key Points

Abstract

The vast majority of tools for neoepitope prediction from DNA sequencing of complementary tumor and normal patient samples do not consider germline context or the potential for co-occurrence of two or more somatic variants on the same mRNA transcript. Without consideration of these phenomena, existing approaches are likely to produce both false positive and false negative results, resulting in an inaccurate and incomplete picture of the cancer neoepitope landscape. We developed neoepiscope chiefly to address this issue for single nucleotide variants (SNVs) and insertions/deletions (indels), and herein illustrate how germline and somatic variant phasing affects neoepitope prediction across multiple datasets. We estimate that up to ∼5% of neoepitopes arising from SNVs and indels may require variant phasing for their accurate assessment. neoepiscope is performant, flexible, and supports several major histocompatibility complex binding affinity prediction tools. We have released neoepiscope as open-source software (MIT license, https://github.com/pdxgx/neoepiscope) for broad use.

Key points

Germline context and somatic variant phasing are important for neoepitope prediction.

Many popular neoepitope prediction tools have issues of performance and reproducibility.

We describe and provide performant software for accurate neoepitope prediction from DNA-seq data.

Misannotation of multiple-nucleotide variants risks misdiagnosis

Wellcome Open Research ◽

10.12688/wellcomeopenres.15420.1 ◽

2019 ◽

Vol 4 ◽

pp. 145

Author(s):

Matthew N. Wakeling ◽

Thomas W. Laver ◽

Kevin Colclough ◽

Andrew Parish ◽

Sian Ellard ◽

...

Keyword(s):

Best Practices ◽

False Negative ◽

Simulated Data ◽

Sequencing Analysis ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Public Resources ◽

Next Generation Sequencing Analysis ◽

Optimal Approach

Multiple nucleotide variants (MNVs) are miscalled by the most widely utilised next generation sequencing (NGS) analysis pipelines, presenting the potential for missing diagnoses that would previously have been made by standard Sanger (dideoxy) sequencing. These variants, which should be treated as a single insertion-deletion mutation event, are commonly called as separate single nucleotide variants (SNVs). This can result in misannotation, incorrect amino acid predictions and potentially false positive and false negative diagnostic results. This risk will increase as confirmatory Sanger sequencing of SNVs ceases to be standard practice. Using simulated data and re-analysis of sequencing data from a diagnostic targeted gene panel, we demonstrate that the widely adopted pipeline, GATK best practices, results in miscalling of MNVs and that alternative tools can call these variants correctly. The adoption of calling methods that annotate MNVs correctly would present a solution for individual laboratories; however, GATK best practices are the basis for important public resources such as the gnomAD database. We suggest that integrating a solution into these guidelines would be the optimal approach.
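
The misannotation described above can be illustrated with a small Python sketch that translates one codon under two adjacent substitutions, first annotated separately and then as a single event. The codon and the two substitutions are hypothetical and were chosen only for illustration; the sketch uses the standard genetic code and does not reproduce any pipeline evaluated in the study.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def mutate(codon, changes):
    """Apply (offset, alternate base) substitutions to a codon."""
    bases = list(codon)
    for offset, alt in changes:
        bases[offset] = alt
    return "".join(bases)


# Hypothetical reference codon with two adjacent substitutions on one allele.
codon = "CAG"      # glutamine (Q)
snv_1 = (0, "T")   # C>T at the first codon position
snv_2 = (1, "G")   # A>G at the second codon position

# Annotated as two independent SNVs: each is translated against the reference.
separate = [CODON_TABLE[mutate(codon, [snv_1])], CODON_TABLE[mutate(codon, [snv_2])]]

# Annotated as one multiple nucleotide variant: both changes in the same codon.
combined = CODON_TABLE[mutate(codon, [snv_1, snv_2])]

print(CODON_TABLE[codon], separate, combined)
```

Here the separate annotations suggest a stop-gain (TAG) and a glutamine-to-arginine change (CGG), whereas the codon actually present on the allele (TGG) encodes tryptophan.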

SomatoSim: precision simulation of somatic single nucleotide variants

BMC Bioinformatics ◽

10.1186/s12859-021-04024-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Marwan A. Hawari ◽

Celine S. Hong ◽

Leslie G. Biesecker

Keyword(s):

High Throughput Sequencing ◽

Variant Calling ◽

Simulated Data ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Somatic Variant ◽

Simulation Tools ◽

Gold Standard Dataset ◽

High Level

Abstract

Background: Somatic single nucleotide variants have gained increased attention because of their role in cancer development and the widespread use of high-throughput sequencing techniques. The necessity to accurately identify these variants in sequencing data has led to a proliferation of somatic variant calling tools. Additionally, the use of simulated data to assess the performance of these tools has become common practice, as there is no gold standard dataset for benchmarking performance. However, many existing somatic variant simulation tools are limited because they rely on generating entirely synthetic reads derived from a reference genome or because they do not allow for the precise customizability that would enable a more focused understanding of single nucleotide variant calling performance.

Results: SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control of the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, where candidate positions are selected for simulation; variant simulation, where reads are selected and mutated; and variant evaluation, where SomatoSim summarizes the simulation results.

Conclusions: SomatoSim is a user-friendly tool that offers a high level of customizability for simulating somatic single nucleotide variants. SomatoSim is available at https://github.com/BieseckerLab/SomatoSim.

Bivartect: accurate and memory-saving breakpoint detection by direct read comparison

Bioinformatics ◽

10.1093/bioinformatics/btaa059 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2725-2730

Author(s):

Keisuke Shimmura ◽

Yuki Kato ◽

Yukio Kawahara

Keyword(s):

Genome Editing ◽

High Throughput Sequencing ◽

Variant Calling ◽

Simulated Data ◽

Supplementary Information ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Node ◽

Single Nucleotide ◽

Target Sites

Abstract

Motivation: Genetic variant calling with high-throughput sequencing data has been recognized as a useful tool for better understanding of disease mechanism and detection of potential off-target sites in genome editing. Since most of the variant calling algorithms rely on initial mapping onto a reference genome and tend to predict many variant candidates, variant calling remains challenging in terms of predicting variants with low false positives.

Results: Here we present Bivartect, a simple yet versatile variant caller based on direct comparison of short sequence reads between normal and mutated samples. Bivartect can detect not only single nucleotide variants but also insertions/deletions, inversions and their complexes. Bivartect achieves high predictive performance with an elaborate memory-saving mechanism, which allows Bivartect to run on a computer with a single node for analyzing small omics data. Tests with simulated benchmark and real genome-editing data indicate that Bivartect was comparable to state-of-the-art variant callers in positive predictive value for detection of single nucleotide variants, even though it yielded a substantially small number of candidates. These results suggest that Bivartect, a reference-free approach, will contribute to the identification of germline mutations as well as off-target sites introduced during genome editing with high accuracy.

Availability and implementation: Bivartect is implemented in C++ and available along with in silico simulated data at https://github.com/ykat0/bivartect.

Supplementary information: Supplementary data are available at Bioinformatics online.

Robust clinical detection of SARS-CoV-2 variants by RT-PCR/MALDI-TOF multi-target approach

10.1101/2021.09.09.21263348 ◽

2021 ◽

Author(s):

Matthew M. Hernandez ◽

Radhika Banu ◽

Ana S. Gonzalez-Reiche ◽

Adriana van de Guchte ◽

Zenab Khan ◽

...

Keyword(s):

New York ◽

Rapid Development ◽

False Negative ◽

Sequencing Data ◽

Rt Pcr ◽

Maldi Tof ◽

Negative Results ◽

False Negative Results ◽

Clinical Detection ◽

Target Design

The COVID-19 pandemic sparked rapid development of SARS-CoV-2 diagnostics. However, emerging variants pose a risk of target dropout and false-negative results secondary to primer/probe binding site (PBS) mismatches. The Agena MassARRAY SARS-CoV-2 Panel combines RT-PCR and MALDI-TOF mass spectrometry to probe for five targets across the N and ORF1ab genes, which provides a robust platform to accommodate PBS mismatches in divergent viruses. Herein, we utilize a deidentified dataset of 1,262 SARS-CoV-2-positive specimens from Mount Sinai Health System (New York City) from December 2020 through April 2021 to evaluate target results and corresponding sequencing data. Overall, the level of PBS mismatches was greater in specimens with target dropout. Of specimens with N3 target dropout, 57% harbored an A28095T substitution that is highly specific for the alpha (B.1.1.7) variant of concern. These data highlight the benefit of redundancy in target design and the potential for target performance to illuminate the dynamics of circulating SARS-CoV-2 variants.

VISOR: a versatile haplotype-aware structural variant simulator for short- and long-read sequencing

Bioinformatics ◽

10.1093/bioinformatics/btz719 ◽

2019 ◽

Author(s):

Davide Bolognini ◽

Ashley Sanders ◽

Jan O Korbel ◽

Alberto Magi ◽

Vladimir Benes ◽

...

Keyword(s):

Single Cell ◽

Supplementary Information ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Cancer Heterogeneity ◽

Long Reads ◽

Long Read ◽

Complex Structural ◽

Error Profiles

Abstract

Summary: VISOR is a tool for haplotype-specific simulations of simple and complex structural variants (SVs). The method is applicable to haploid, diploid or higher ploidy simulations for bulk or single-cell sequencing data. SVs are implanted into FASTA haplotypes at single-basepair resolution, optionally with nearby single-nucleotide variants. Short or long reads are drawn at random from these haplotypes using standard error profiles. Double- or single-stranded data can be simulated and VISOR supports the generation of haplotype-tagged BAM files. The tool further includes methods to interactively visualize simulated variants in single-stranded data. The versatility of VISOR is unmet by comparable tools and it lays the foundation to simulate haplotype-resolved cancer heterogeneity data in bulk or at single-cell resolution.

Availability and implementation: VISOR is implemented in Python 3.6, open-source and freely available at https://github.com/davidebolo1993/VISOR. Documentation is available at https://davidebolo1993.github.io/visordoc/.

Supplementary information: Supplementary data are available at Bioinformatics online.

RT-PCR/MALDI-TOF diagnostic target performance reflects circulating SARS-CoV-2 variant diversity in New York City

10.1101/2021.12.04.21267265 ◽

2021 ◽

Author(s):

Matthew M. Hernandez ◽

Radhika Banu ◽

Ana S. Gonzalez-Reiche ◽

Brandon Gray ◽

Paras Shrestha ◽

...

Keyword(s):

New York ◽

False Negative ◽

Sequence Diversity ◽

Sequencing Data ◽

Negative Results ◽

Virus Diversity ◽

False Negative Results ◽

Target Performance ◽

Polymerase Chain ◽

Target Design

Abstract

As severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to circulate, multiple variants of concern (VOC) have emerged. New variants pose challenges for diagnostic platforms since sequence diversity can alter primer/probe binding sites (PBS), causing false-negative results. The Agena MassARRAY® SARS-CoV-2 Panel utilizes reverse-transcription polymerase chain reaction and mass-spectrometry to detect five multiplex targets across N and ORF1ab genes. Herein, we utilize a dataset of 256 SARS-CoV-2-positive specimens collected between April 11, 2021 and August 28, 2021 to evaluate target performance with paired sequencing data. During this timeframe, two targets in the N gene (N2, N3) were subject to the greatest sequence diversity. In specimens with N3 dropout, 69% harbored the Alpha-specific A28095U polymorphism that introduces a 3’-mismatch to the N3 forward PBS and increases risk of target dropout relative to specimens with 28095A (relative risk (RR): 20.02; p<0.0001; 95% Confidence Interval (CI): 11.36-35.72). Furthermore, among specimens with N2 dropout, 90% harbored the Delta-specific G28916U polymorphism that creates a 3’-mismatch to the N2 probe PBS and increases target dropout risk (RR: 11.92; p<0.0001; 95% CI: 8.17-14.06). These findings highlight the robust capability of Agena MassARRAY® SARS-CoV-2 Panel target results to reveal circulating virus diversity and underscore the power of multi-target design to capture VOC.
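
For readers unfamiliar with the statistic, a relative risk and an approximate 95% confidence interval can be computed from a 2x2 table as in the Python sketch below. The counts are hypothetical and are not taken from the study, and the log-scale (Katz) interval used here is an assumption; the abstract does not state which method the authors applied.

```python
import math


def relative_risk(exposed_events, exposed_total, control_events, control_total, z=1.96):
    """Relative risk with an approximate confidence interval on the log scale."""
    risk_exposed = exposed_events / exposed_total
    risk_control = control_events / control_total
    rr = risk_exposed / risk_control
    # Standard error of ln(RR) under the Katz log method (an assumed choice).
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / control_events - 1 / control_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: target dropout in 20 of 30 specimens carrying a
# polymorphism versus 5 of 100 specimens without it.
rr, lower, upper = relative_risk(20, 30, 5, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 indicates that dropout risk differs between the two groups at the chosen confidence level.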

The nitroblue tetrazolium test. An evaluation of the false-positive and false-negative results

Archives of Internal Medicine ◽

10.1001/archinte.133.3.432 ◽

1974 ◽

Vol 133 (3) ◽

pp. 432-436 ◽

Cited By ~ 1

Author(s):

M. E. Campos

Keyword(s):

False Positive ◽

False Negative ◽

Nitroblue Tetrazolium ◽

Negative Results ◽

False Negative Results ◽

Nitroblue Tetrazolium Test ◽

Tetrazolium Test

False negative results of real time strain elastography in thyroid nodular disease

Ultraschall in der Medizin - European Journal of Ultrasound ◽

10.1055/s-0036-1587848 ◽

2016 ◽

Vol 37 (S 01) ◽

Author(s):

D Stoian ◽

M Craciunescu ◽

M Craina ◽

S Pantea ◽

F Varcus

Keyword(s):

Real Time ◽

False Negative ◽

Strain Elastography ◽

Negative Results ◽

False Negative Results

Critical Evaluation of Impedance Phlebography for the Diagnosis of Deep Vein Thrombosis

Thrombosis and Haemostasis ◽

10.1055/s-0038-1649161 ◽

1974 ◽

Vol 31 (02) ◽

pp. 273-278

Author(s):

Kenneth K Wu ◽

John C Hoak ◽

Robert W Barnes ◽

Stuart L Frankel

Keyword(s):

Deep Vein Thrombosis ◽

False Positive ◽

False Negative ◽

Critical Evaluation ◽

Original Method ◽

Vein Thrombosis ◽

Negative Results ◽

False Negative Results ◽

Deep Vein ◽

Positive Results

Summary

In order to evaluate its daily variability and reliability, impedance phlebography was performed daily or on alternate days on 61 patients with deep vein thrombosis, of whom 47 also had 125I-fibrinogen uptake tests and 22 had radiographic venography. The results showed that impedance phlebography was highly variable and poorly reliable. False positive results were noted in 8 limbs (18%) and false negative results in 3 limbs (7%). Despite its being simple, rapid and noninvasive, its clinical usefulness is doubtful when performed according to the original method.