Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing

Abstract. BACKGROUND: Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. We present a machine learning–based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by next-generation sequencing from nonformalin-fixed paraffin-embedded tumor specimens. METHODS: A cohort of 11278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of clinical laboratory workflow. A 3-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned with the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label “uncertain” variants. RESULTS: The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5587 SNVs of the test set. Overall, 1252 of 1341 true-positive variants were identified as real, 4143 of 4246 false-positive calls were deemed artifacts, whereas only 192 (3.4%) SNVs were labeled as “uncertain,” with zero misclassification between the true positives and artifacts in the test set. CONCLUSIONS: We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received definitive labels and thus were exempt from manual review. This framework could improve quality and efficiency of the variant review process in clinical laboratories.
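The test-set counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures reported above; the variable names are illustrative, and it assumes (as the abstract states) that no real variant was called an artifact or vice versa, so every variant without a definitive label falls into the "uncertain" class.

```python
# Counts reported in the abstract for the test set.
true_real, called_real = 1341, 1252          # manually confirmed real SNVs, and how many the model labeled real
true_artifact, called_artifact = 4246, 4143  # manually confirmed artifacts, and how many the model labeled artifact

total = true_real + true_artifact            # size of the test set

# With zero real/artifact misclassification, the remainder of each class is "uncertain".
uncertain_real = true_real - called_real
uncertain_artifact = true_artifact - called_artifact
uncertain = uncertain_real + uncertain_artifact

uncertain_pct = round(100 * uncertain / total, 1)
definitive_pct = round(100 * (total - uncertain) / total, 1)

print(f"test set size: {total}")
print(f"uncertain: {uncertain} ({uncertain_pct}%)")
print(f"definitive labels: {definitive_pct}%")
```

The totals agree with the abstract: the two classes sum to the 5587 test-set SNVs, and the residuals from each class add up to the 192 variants labeled "uncertain".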


Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing

10.1101/670687 ◽

2019 ◽

Author(s):

Chao Wu ◽

Xiaonan Zhao ◽

Mark Welsh ◽

Kellianne Costello ◽

Kajia Cao ◽

...

Keyword(s):

Machine Learning ◽

Clinical Laboratory ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Test Set ◽

Genome Complexity ◽

Uncertain Model ◽

Bona Fide ◽

Test Sets ◽

Validation Set

Abstract. Background: Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. Here we present a machine learning-based method to distinguish artifacts from bona fide single nucleotide variants (SNVs) detected by NGS from tumor specimens. Methods: A cohort of 11,278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of clinical laboratory workflow. A three-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned using the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label “uncertain” variants. Results: The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5,587 SNVs of the test set. Overall, 1,252 of 1,341 true positive variants were identified as real and 4,143 of 4,246 false positive calls were deemed artifacts, while only 192 (3.4%) SNVs were labeled as “uncertain,” with zero misclassification between the true positives and artifacts in the test set. Conclusions: We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received a definitive label and thus were exempt from manual review. This framework could improve the quality and efficiency of the variant review process in clinical labs.
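The abstract does not spell out how prediction intervals were turned into the third, "uncertain" class. One plausible reading, offered here only as a hypothetical sketch and not as the authors' method, is to call a variant real or artifact only when its whole interval lies on one side of a decision threshold. The `Prediction` type, the `label_variant` function, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """Hypothetical prediction interval for P(variant is real)."""
    lower: float
    upper: float


def label_variant(pred: Prediction, threshold: float = 0.5) -> str:
    """Assign one of three labels from an interval (illustrative rule only).

    The threshold value is an assumption, not a figure from the abstract.
    """
    if pred.lower > threshold:
        return "real"        # entire interval above the threshold
    if pred.upper < threshold:
        return "artifact"    # entire interval below the threshold
    return "uncertain"       # interval straddles the threshold: send to manual review


# Made-up example intervals, not data from the study.
examples = [Prediction(0.80, 0.95), Prediction(0.02, 0.10), Prediction(0.40, 0.70)]
labels = [label_variant(p) for p in examples]
print(labels)
```

Under a rule of this kind, widening the intervals trades a larger manual-review queue for fewer definitive calls that turn out to be wrong, which is the trade-off the abstract describes.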


A Model Study of In Silico Proficiency Testing for Clinical Next-Generation Sequencing

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2016-0194-cp ◽

2016 ◽

Vol 140 (10) ◽

pp. 1085-1091 ◽

Cited By ~ 21

Author(s):

Eric J. Duncavage ◽

Haley J. Abel ◽

Jason D. Merker ◽

John B. Bodner ◽

Qin Zhao ◽

...

Keyword(s):

Next Generation Sequencing ◽

Proficiency Testing ◽

In Silico ◽

Absolute Difference ◽

Ion Torrent ◽

Next Generation ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Clinical Laboratories ◽

Generation Sequencing

Context.—Most current proficiency testing challenges for next-generation sequencing assays are methods-based proficiency testing surveys that use DNA from characterized reference samples to test both the wet-bench and bioinformatics/dry-bench aspects of the tests. Methods-based proficiency testing surveys are limited by the number and types of mutations that either are naturally present or can be introduced into a single DNA sample. Objective.—To address these limitations by exploring a model of in silico proficiency testing in which sequence data from a single well-characterized specimen are manipulated electronically. Design.—DNA from the College of American Pathologists reference genome was enriched using the Illumina TruSeq and Life Technologies AmpliSeq panels and sequenced on the MiSeq and Ion Torrent platforms, respectively. The resulting data were mutagenized in silico and 26 variants, including single-nucleotide variants, deletions, and dinucleotide substitutions, were added at variant allele fractions (VAFs) from 10% to 50%. Participating clinical laboratories downloaded these files and analyzed them using their clinical bioinformatics pipelines. Results.—Laboratories using the AmpliSeq/Ion Torrent and/or the TruSeq/MiSeq participated in the 2 surveys. On average, laboratories identified 24.6 of 26 variants (95%) overall and 21.4 of 22 variants (97%) with VAFs greater than 15%. No false-positive calls were reported. The most frequently missed variants were single-nucleotide variants with VAFs less than 15%. Across both challenges, reported VAF concordance was excellent, with less than 1% median absolute difference between the simulated VAF and mean reported VAF. Conclusions.—The results indicate that in silico proficiency testing is a feasible approach for methods-based proficiency testing, and demonstrate that the sensitivity and specificity of current next-generation sequencing bioinformatics across clinical laboratories are high.


Analytical Validation and Performance of a 7-gene Next-generation Sequencing Panel in Uveal Melanoma

10.21203/rs.3.rs-86927/v1 ◽

2020 ◽

Author(s):

Katherina Maria Alsina ◽

Lauren M Sholl ◽

Kyle R Covington ◽

Suzette M Arnal ◽

Michael M Durante ◽

...

Keyword(s):

Next Generation Sequencing ◽

Uveal Melanoma ◽

Clinical Laboratory ◽

Limit Of Detection ◽

Gene Panel ◽

Biopsy Sample ◽

Next Generation ◽

Single Nucleotide Variants ◽

Mutational Profiling ◽

Generation Sequencing

Abstract. Background: A 15-gene expression profiling (GEP) test is widely used for prognostication of metastatic risk in uveal melanoma (UM) patients. Because the amount of tumor tissue that can be safely obtained by biopsy from UM is limited, it is critical to obtain as much individualized genomic information as possible from each biopsy sample. Mutational profiling of UM tumors using next generation sequencing (NGS) in combination with GEP allows for analysis of both DNA and RNA from a single tumor sample, offers additional prognostic value, and can potentially inform therapy selection. This study evaluated the analytical performance of a targeted custom NGS panel for mutational profiling of the seven genes known to be commonly mutated in primary UM. Methods: 105 primary UM samples were analyzed, including 37 formalin-fixed paraffin embedded (FFPE) specimens and 68 fine needle aspiration biopsy (FNAB) specimens obtained with a 25- or 27-gauge needle. Sequencing was performed on the Ion GeneStudio S5 platform to an average read depth of greater than 500X per region of interest in a clinical laboratory accredited by the College of American Pathologists (CAP) and certified under the Clinical Laboratory Improvement Amendments (CLIA). Results: The 7-gene panel assay achieved a positive percent agreement (PPA) of 100% for detection of both single nucleotide variants (SNVs) and insertions/deletions (INDELs), with a technical positive predictive value (TPPV) of 99.4% and 100%, respectively. Intra-assay and inter-assay concordance studies confirmed the reproducibility and repeatability of the assay. The limit of detection was determined to be 5% variant allele frequency (VAF) for both SNVs and INDELs, with a minimum DNA input requirement of 1.5 ng for FNAB and 5 ng for FFPE samples. Conclusions: The 7-gene panel is a robust, highly accurate NGS test that can be successfully performed, along with GEP, from a single small gauge needle biopsy sample.


CCMG practice guideline: laboratory guidelines for next-generation sequencing

Journal of Medical Genetics ◽

10.1136/jmedgenet-2019-106152 ◽

2019 ◽

Vol 56 (12) ◽

pp. 792-800 ◽

Cited By ~ 5

Author(s):

Stacey Hume ◽

Tanya N Nelson ◽

Marsha Speevak ◽

Elizabeth McCready ◽

Ron Agatep ◽

...

Keyword(s):

Next Generation Sequencing ◽

Clinical Laboratory ◽

Massively Parallel Sequencing ◽

Clinical Genetics ◽

Next Generation ◽

Clinical Genetic ◽

Inherited Disorders ◽

Laboratory Service ◽

Clinical Laboratories ◽

Generation Sequencing

Purpose: The purpose of this document is to provide guidance for the use of next-generation sequencing (NGS, also known as massively parallel sequencing or MPS) in Canadian clinical genetic laboratories for detection of genetic variants in genomic DNA and mitochondrial DNA for inherited disorders, as well as somatic variants in tumour DNA for acquired cancers. It is intended for Canadian clinical laboratories engaged in developing, validating and using NGS methods. Methods of statement development: The document was drafted by the Canadian College of Medical Geneticists (CCMG) Ad Hoc Working Group on NGS Guidelines to make recommendations relevant to NGS. The statement was circulated for comment to the CCMG Laboratory Practice and Clinical Practice committees, and to the CCMG membership. Following incorporation of feedback, the document was approved by the CCMG Board of Directors. Disclaimer: The CCMG is a Canadian organisation responsible for certifying medical geneticists and clinical laboratory geneticists, and for establishing professional and ethical standards for clinical genetics services in Canada. The current CCMG Practice Guidelines were developed as a resource for clinical laboratories in Canada and should not be considered to be inclusive of all information laboratories should consider in the validation and use of NGS for a clinical laboratory service.


Targeted Next-Generation Sequencing for Human Leukocyte Antigen Typing in a Clinical Laboratory: Metrics of Relevance and Considerations for Its Successful Implementation

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2016-0537-ra ◽

2017 ◽

Vol 141 (6) ◽

pp. 806-812 ◽

Cited By ~ 19

Author(s):

Manish J. Gandhi ◽

Deborah Ferriola ◽

Yanping Huang ◽

Jamie L. Duke ◽

Dimitri Monos

Keyword(s):

Next Generation Sequencing ◽

Clinical Laboratory ◽

Human Leukocyte ◽

Hla Typing ◽

Successful Implementation ◽

Next Generation ◽

Human Leukocyte Antigen Typing ◽

Clinical Laboratories ◽

Exon 1 ◽

Generation Sequencing

Context.— Numerous feasibility studies to type human leukocyte antigens (HLAs) by next-generation sequencing (NGS) have led to the development of vendor-supported kits for HLA typing by NGS. Some clinical laboratories have introduced HLA-NGS, and many are investigating the introduction. Standards from accrediting agencies form the regulatory framework for introducing this test into clinical laboratories. Objectives.— To provide an assessment of metrics and considerations relevant to the successful implementation of clinical HLA-NGS typing, and to provide as a reference a validated HLA-NGS protocol used clinically since December 2013 at the Children's Hospital of Philadelphia (Philadelphia, Pennsylvania). Data Sources.— The HLA-NGS has been performed on 2532 samples. The initial 1046 and all homozygous samples were also typed by an alternate method. The HLA-NGS demonstrated 99.7% concordance with the alternate method. Ambiguous results were most common at the DPB1 locus because of a lack of phasing between exons 2 and 3 or the unsequenced exon 1 (533 of 2954 alleles; 18.04%) and the DRB1 locus because of not sequencing exon 1 (75 of 3972 alleles; 1.89%). No ambiguities were detected among the other loci. Except for 2 false homozygous samples, all homozygous samples (1891) demonstrated concordance with the alternate method. The article is organized to address the critical elements in the preanalytic, analytic, and postanalytic phases of introducing this assay into the clinical laboratory. Conclusions.— The results demonstrate that HLA typing by NGS is a highly accurate, reproducible, efficient method that provides more-complete sequencing information for the length of the HLA gene and can be the single methodology for HLA typing in clinical immunogenetics laboratories.


Next-Generation Sequencing Informatics: Challenges and Strategies for Implementation in a Clinical Environment

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2015-0507-ra ◽

2016 ◽

Vol 140 (9) ◽

pp. 958-975 ◽

Cited By ~ 42

Author(s):

Somak Roy ◽

William A. LaFramboise ◽

Yuri E. Nikiforov ◽

Marina N. Nikiforova ◽

Mark J. Routbort ◽

...

Keyword(s):

Next Generation Sequencing ◽

Clinical Laboratory ◽

Cost Effective ◽

Small Degree ◽

Clinical Informatics ◽

Next Generation ◽

Clinical Environment ◽

Next Generation Sequencing Technology ◽

Clinical Laboratories ◽

Generation Sequencing

Context.—Next-generation sequencing (NGS) is revolutionizing the discipline of laboratory medicine, with a deep and direct impact on patient care. Although it empowers clinical laboratories with unprecedented genomic sequencing capability, NGS has brought along obvious and obtrusive informatics challenges. Bioinformatics and clinical informatics are separate disciplines with typically a small degree of overlap, but they have been brought together by the enthusiastic adoption of NGS in clinical laboratories. The result has been a collaborative environment for the development of novel informatics solutions. Sustaining NGS-based testing in a regulated clinical environment requires institutional support to build and maintain a practical, robust, scalable, secure, and cost-effective informatics infrastructure. Objective.—To discuss the novel NGS informatics challenges facing pathology laboratories today and offer solutions and future developments to address these obstacles. Data Sources.—The published literature pertaining to NGS informatics was reviewed. The coauthors, experts in the fields of molecular pathology, precision medicine, and pathology informatics, also contributed their experiences. Conclusions.—The boundary between bioinformatics and clinical informatics has significantly blurred with the introduction of NGS into clinical molecular laboratories. Next-generation sequencing technology and the data derived from these tests, if managed well in the clinical laboratory, will redefine the practice of medicine. In order to sustain this progress, adoption of smart computing technology will be essential. Computational pathologists will be expected to play a major role in rendering diagnostic and theranostic services by leveraging “Big Data” and modern computing tools.


Point-Counterpoint: Should We Be Performing Metagenomic Next-Generation Sequencing for Infectious Disease Diagnosis in the Clinical Laboratory?

Journal of Clinical Microbiology ◽

10.1128/jcm.01739-19 ◽

2019 ◽

Vol 58 (3) ◽

Cited By ~ 10

Author(s):

Steve Miller ◽

Charles Chiu ◽

Kyle G. Rodino ◽

Melissa B. Miller

Keyword(s):

Next Generation Sequencing ◽

Clinical Laboratory ◽

Disease Diagnosis ◽

Metagenomic Sequencing ◽

Infectious Agents ◽

Next Generation ◽

Sequencing Data ◽

Clinical Laboratories ◽

Generation Sequencing ◽

Selection Of

INTRODUCTION With established applications of next-generation sequencing in inherited diseases and oncology, clinical laboratories are evaluating the use of metagenomics for identification of infectious agents directly from patient samples, to aid in the diagnosis of infections. Metagenomic next-generation sequencing for infectious diseases promises an unbiased approach to detection of microbes that does not depend on growth in culture or the targeting of specific pathogens. However, the issues of contamination, interpretation of results, selection of databases used for analysis, and prediction of antimicrobial susceptibilities from sequencing data remain challenges. In this Point-Counterpoint, Steve Miller and Charles Chiu discuss the pros of using direct metagenomic sequencing, while Kyle Rodino and Melissa Miller argue for the use of caution.


A Window Into Clinical Next-Generation Sequencing–Based Oncology Testing Practices

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2016-0542-cp ◽

2017 ◽

Vol 141 (12) ◽

pp. 1679-1685 ◽

Cited By ~ 11

Author(s):

Rakesh Nagarajan ◽

Angela N. Bartley ◽

Julia A. Bridge ◽

Lawrence J. Jennings ◽

Suzanne Kamel-Reid ◽

...

Keyword(s):

Next Generation Sequencing ◽

Proficiency Testing ◽

Clinical Laboratory ◽

Precision Oncology ◽

Next Generation ◽

Single Nucleotide Variants ◽

Molecular Oncology ◽

Survey Results ◽

Testing Practices ◽

Generation Sequencing

Context.— Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. Objective.— To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing–based oncology testing practices. Design.— College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing–based oncology testing. Results.— These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing–based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing–based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. Conclusions.— This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing–based oncology testing, and precision oncology efforts in a data-driven manner.


Influence of Next-Generation Sequencing on Advancements in the Diagnosis of Major Psychiatric Diseases - A Review

Pakistan Journal of Medicine and Dentistry ◽

10.36283/pjmd9-2/022 ◽

2020 ◽

Author(s):

Maheen Nisar

Keyword(s):

Next Generation Sequencing ◽

Mental Disorders ◽

Attention Deficit Disorder ◽

Autism Spectrum ◽

Next Generation ◽

Single Nucleotide Variants ◽

New Genes ◽

Depth Analysis ◽

Pubmed Search ◽

Generation Sequencing

Rapid progress is being made in the development of next-generation sequencing (NGS) technologies, allowing repeated findings of new genes and a more in-depth analysis of the genetic polymorphisms behind the pathogenesis of a disease. In a field such as psychiatry, which is characterized by vague and highly variable somatic manifestations, these technologies have brought great advances in diagnosing various psychiatric and mental disorders, identifying high-risk individuals, and providing more effective corresponding treatment. Psychiatry has the difficult task of diagnosing and treating mental disorders without being able to invariably and definitively establish the properties of the illness. This calls for diagnostic technologies that go beyond the traditional ways of gene manipulation to more advanced methods, NGS among them, that focus mainly on the discovery of new gene polymorphisms. NGS enables the identification of hundreds of common and rare genetic variations contributing to behavioral and psychological conditions. Clinical NGS has been useful for detecting copy number and single nucleotide variants and for identifying structural rearrangements that have been challenging for standard bioinformatics algorithms. The main objective of this article is to review the recent applications of NGS in the diagnosis of major psychiatric disorders, and hence gauge the extent of its impact in the field. A comprehensive PubMed search was conducted, and papers published from 2013 to 2018 were included, using the keywords “schizophrenia” or “bipolar disorder” or “depressive disorder” or “attention deficit disorder” or “autism spectrum disorder” and “next-generation sequencing”.
