Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing

2019 ◽  
Vol 66 (1) ◽  
pp. 239-246 ◽  
Author(s):  
Chao Wu ◽  
Xiaonan Zhao ◽  
Mark Welsh ◽  
Kellianne Costello ◽  
Kajia Cao ◽  
...  

Abstract BACKGROUND Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. We present a machine learning–based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by next-generation sequencing from nonformalin-fixed paraffin-embedded tumor specimens. METHODS A cohort of 11278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of clinical laboratory workflow. A 3-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned with the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label “uncertain” variants. RESULTS The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5587 SNVs of the test set. Overall, 1252 of 1341 true-positive variants were identified as real, 4143 of 4246 false-positive calls were deemed artifacts, whereas only 192 (3.4%) SNVs were labeled as “uncertain,” with zero misclassification between the true positives and artifacts in the test set. CONCLUSIONS We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received definitive labels and thus were exempt from manual review. This framework could improve quality and efficiency of the variant review process in clinical laboratories.
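The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the four counts come from the abstract, while the `triage` function, its 0.5 cutoff, and the idea of comparing interval bounds against a cutoff are assumptions made here to show one plausible way a prediction interval could yield an "uncertain" label. The abstract does not describe the authors' actual rule.

```python
# Counts as reported in the abstract for the 5587-SNV test set.
TRUE_REAL = 1341                 # manually confirmed real variants
TRUE_ARTIFACT = 4246             # manually confirmed artifacts
REAL_CALLED_REAL = 1252          # real variants the classifier labeled "real"
ARTIFACT_CALLED_ARTIFACT = 4143  # artifacts the classifier labeled "artifact"


def summarize_test_set():
    """Derive the uncertain and definitive fractions from the reported counts."""
    total = TRUE_REAL + TRUE_ARTIFACT
    definitive = REAL_CALLED_REAL + ARTIFACT_CALLED_ARTIFACT
    # The abstract reports zero real/artifact confusions, so every SNV
    # without a definitive label must have been labeled "uncertain".
    uncertain = total - definitive
    return {
        "total": total,
        "uncertain": uncertain,
        "uncertain_pct": round(100 * uncertain / total, 1),
        "definitive_pct": round(100 * definitive / total, 1),
    }


def triage(lower, upper, cutoff=0.5):
    """Hypothetical three-class rule (an assumption, not the authors' method).

    `lower` and `upper` bound the predicted probability that an SNV is real.
    A definitive label is given only when the whole interval lies on one
    side of the cutoff; otherwise the SNV is sent to manual review.
    """
    if lower > cutoff:
        return "real"
    if upper < cutoff:
        return "artifact"
    return "uncertain"


if __name__ == "__main__":
    print(summarize_test_set())
    print(triage(0.80, 0.95), triage(0.02, 0.10), triage(0.40, 0.70))
```

Under these counts, the 192 uncertain SNVs and the 96.6% definitive-label rate quoted in the abstract are mutually consistent.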

2019 ◽  
Author(s):  
Chao Wu ◽  
Xiaonan Zhao ◽  
Mark Welsh ◽  
Kellianne Costello ◽  
Kajia Cao ◽  
...  

Abstract Background: Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. Here we present a machine learning-based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by NGS from tumor specimens. Methods: A cohort of 11,278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of clinical laboratory workflow. A three-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned using the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label “uncertain” variants. Results: The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5,587 SNVs of the test set. Overall, 1,252 of 1,341 true-positive variants were identified as real, 4,143 of 4,246 false-positive calls were deemed artifacts, and only 192 (3.4%) SNVs were labeled as “uncertain,” with zero misclassification between the true positives and artifacts in the test set. Conclusions: We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received a definitive label and thus were exempt from manual review. This framework could improve the quality and efficiency of the variant review process in clinical labs.


2016 ◽  
Vol 140 (10) ◽  
pp. 1085-1091 ◽  
Author(s):  
Eric J. Duncavage ◽  
Haley J. Abel ◽  
Jason D. Merker ◽  
John B. Bodner ◽  
Qin Zhao ◽  
...  

Context.—Most current proficiency testing challenges for next-generation sequencing assays are methods-based proficiency testing surveys that use DNA from characterized reference samples to test both the wet-bench and bioinformatics/dry-bench aspects of the tests. Methods-based proficiency testing surveys are limited by the number and types of mutations that either are naturally present or can be introduced into a single DNA sample. Objective.—To address these limitations by exploring a model of in silico proficiency testing in which sequence data from a single well-characterized specimen are manipulated electronically. Design.—DNA from the College of American Pathologists reference genome was enriched using the Illumina TruSeq and Life Technologies AmpliSeq panels and sequenced on the MiSeq and Ion Torrent platforms, respectively. The resulting data were mutagenized in silico and 26 variants, including single-nucleotide variants, deletions, and dinucleotide substitutions, were added at variant allele fractions (VAFs) from 10% to 50%. Participating clinical laboratories downloaded these files and analyzed them using their clinical bioinformatics pipelines. Results.—Laboratories using the AmpliSeq/Ion Torrent and/or the TruSeq/MiSeq participated in the 2 surveys. On average, laboratories identified 24.6 of 26 variants (95%) overall and 21.4 of 22 variants (97%) with VAFs greater than 15%. No false-positive calls were reported. The most frequently missed variants were single-nucleotide variants with VAFs less than 15%. Across both challenges, reported VAF concordance was excellent, with less than 1% median absolute difference between the simulated VAF and mean reported VAF. Conclusions.—The results indicate that in silico proficiency testing is a feasible approach for methods-based proficiency testing, and demonstrate that the sensitivity and specificity of current next-generation sequencing bioinformatics across clinical laboratories are high.
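The detection rates and the VAF concordance metric in this abstract reduce to simple arithmetic, sketched below. The averages (24.6 of 26, 21.4 of 22) are taken from the abstract. The low-VAF figure is derived here on the assumption that both averages cover the same set of laboratories, and the VAF values passed to `median_abs_difference` are made-up illustrative numbers, not survey data.

```python
from statistics import median

# Averages reported in the abstract.
ALL_VARIANTS, ALL_DETECTED = 26, 24.6
HIGH_VAF_VARIANTS, HIGH_VAF_DETECTED = 22, 21.4  # VAF greater than 15%


def percent(detected, total):
    """Detection rate as a whole-number percentage."""
    return round(100 * detected / total)


def low_vaf_summary():
    """Derived, not reported: variants outside the >15% VAF group.

    Assumes both reported averages were taken over the same laboratories.
    """
    n_low = ALL_VARIANTS - HIGH_VAF_VARIANTS
    detected_low = round(ALL_DETECTED - HIGH_VAF_DETECTED, 1)
    return n_low, detected_low


def median_abs_difference(simulated, reported):
    """Median absolute difference between simulated and mean reported VAFs."""
    return median(abs(s - r) for s, r in zip(simulated, reported))


if __name__ == "__main__":
    print(percent(ALL_DETECTED, ALL_VARIANTS))            # overall rate
    print(percent(HIGH_VAF_DETECTED, HIGH_VAF_VARIANTS))  # VAF > 15% rate
    print(low_vaf_summary())
    # Hypothetical VAFs (in percent) purely for illustration.
    print(median_abs_difference([10, 25, 50], [10.5, 24.2, 50.3]))
```

If the same-laboratories assumption holds, the missed calls concentrate in the four remaining low-VAF variants, which matches the abstract's statement that low-VAF single-nucleotide variants were missed most often.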


2020 ◽  
Author(s):  
Katherina Maria Alsina ◽  
Lauren M Sholl ◽  
Kyle R Covington ◽  
Suzette M Arnal ◽  
Michael M Durante ◽  
...  

Abstract Background: A 15-gene expression profiling (GEP) test is widely used for prognostication of metastatic risk in uveal melanoma (UM) patients. Because the amount of tumor tissue that can be safely obtained by biopsy from UM is limited, it is critical to obtain as much individualized genomic information as possible from each biopsy sample. Mutational profiling of UM tumors using next-generation sequencing (NGS) in combination with GEP allows for analysis of both DNA and RNA from a single tumor sample, offers additional prognostic value, and can potentially inform therapy selection. This study evaluated the analytical performance of a targeted custom NGS panel for mutational profiling of the seven genes known to be commonly mutated in primary UM. Methods: 105 primary UM samples were analyzed, including 37 formalin-fixed paraffin-embedded (FFPE) specimens and 68 fine needle aspiration biopsy (FNAB) specimens obtained with a 25- or 27-gauge needle. Sequencing was performed on the Ion GeneStudio S5 platform to an average read depth of greater than 500X per region of interest in a clinical laboratory accredited by the College of American Pathologists (CAP) and certified under the Clinical Laboratory Improvement Amendments (CLIA). Results: The 7-gene panel assay achieved a positive percent agreement (PPA) of 100% for detection of both single-nucleotide variants (SNVs) and insertions/deletions (INDELs), with a technical positive predictive value (TPPV) of 99.4% and 100%, respectively. Intra-assay and inter-assay concordance studies confirmed the reproducibility and repeatability of the assay. The limit of detection was determined to be 5% variant allele frequency (VAF) for both SNVs and INDELs, with a minimum DNA input requirement of 1.5 ng for FNAB and 5 ng for FFPE samples. Conclusions: The 7-gene panel is a robust, highly accurate NGS test that can be successfully performed, along with GEP, from a single small-gauge needle biopsy sample.
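For readers unfamiliar with the two agreement metrics named above, the sketch below shows how they are conventionally computed from comparison counts against a reference method. The formulas are the standard definitions and are assumed, not confirmed, to match the study's usage. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
def positive_percent_agreement(true_pos, false_neg):
    """PPA: share of reference-positive variants that the assay also detects."""
    return 100 * true_pos / (true_pos + false_neg)


def technical_ppv(true_pos, false_pos):
    """TPPV: share of assay-positive calls confirmed by the reference method."""
    return 100 * true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the abstract).
    tp, fn, fp = 90, 0, 10
    print(positive_percent_agreement(tp, fn))
    print(technical_ppv(tp, fp))
```

A PPA of 100% therefore means no reference-confirmed variant was missed, while a TPPV below 100% means at least one assay call was not confirmed.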


2019 ◽  
Vol 56 (12) ◽  
pp. 792-800 ◽  
Author(s):  
Stacey Hume ◽  
Tanya N Nelson ◽  
Marsha Speevak ◽  
Elizabeth McCready ◽  
Ron Agatep ◽  
...  

Purpose: The purpose of this document is to provide guidance for the use of next-generation sequencing (NGS, also known as massively parallel sequencing or MPS) in Canadian clinical genetic laboratories for detection of genetic variants in genomic DNA and mitochondrial DNA for inherited disorders, as well as somatic variants in tumour DNA for acquired cancers. It is intended for Canadian clinical laboratories engaged in developing, validating and using NGS methods. Methods of statement development: The document was drafted by the Canadian College of Medical Geneticists (CCMG) Ad Hoc Working Group on NGS Guidelines to make recommendations relevant to NGS. The statement was circulated for comment to the CCMG Laboratory Practice and Clinical Practice committees, and to the CCMG membership. Following incorporation of feedback, the document was approved by the CCMG Board of Directors. Disclaimer: The CCMG is a Canadian organisation responsible for certifying medical geneticists and clinical laboratory geneticists, and for establishing professional and ethical standards for clinical genetics services in Canada. The current CCMG Practice Guidelines were developed as a resource for clinical laboratories in Canada and should not be considered to be inclusive of all information laboratories should consider in the validation and use of NGS for a clinical laboratory service.


2017 ◽  
Vol 141 (6) ◽  
pp. 806-812 ◽  
Author(s):  
Manish J. Gandhi ◽  
Deborah Ferriola ◽  
Yanping Huang ◽  
Jamie L. Duke ◽  
Dimitri Monos

Context.— Numerous feasibility studies to type human leukocyte antigens (HLAs) by next-generation sequencing (NGS) have led to the development of vendor-supported kits for HLA typing by NGS. Some clinical laboratories have introduced HLA-NGS, and many are investigating the introduction. Standards from accrediting agencies form the regulatory framework for introducing this test into clinical laboratories. Objectives.— To provide an assessment of metrics and considerations relevant to the successful implementation of clinical HLA-NGS typing, and to provide as a reference a validated HLA-NGS protocol used clinically since December 2013 at the Children's Hospital of Philadelphia (Philadelphia, Pennsylvania). Data Sources.— The HLA-NGS has been performed on 2532 samples. The initial 1046 and all homozygous samples were also typed by an alternate method. The HLA-NGS demonstrated 99.7% concordance with the alternate method. Ambiguous results were most common at the DPB1 locus because of a lack of phasing between exons 2 and 3 or the unsequenced exon 1 (533 of 2954 alleles; 18.04%) and the DRB1 locus because of not sequencing exon 1 (75 of 3972 alleles; 1.89%). No ambiguities were detected among the other loci. Except for 2 false homozygous samples, all homozygous samples (1891) demonstrated concordance with the alternate method. The article is organized to address the critical elements in the preanalytic, analytic, and postanalytic phases of introducing this assay into the clinical laboratory. Conclusions.— The results demonstrate that HLA typing by NGS is a highly accurate, reproducible, efficient method that provides more-complete sequencing information for the length of the HLA gene and can be the single methodology for HLA typing in clinical immunogenetics laboratories.


2016 ◽  
Vol 140 (9) ◽  
pp. 958-975 ◽  
Author(s):  
Somak Roy ◽  
William A. LaFramboise ◽  
Yuri E. Nikiforov ◽  
Marina N. Nikiforova ◽  
Mark J. Routbort ◽  
...  

Context.—Next-generation sequencing (NGS) is revolutionizing the discipline of laboratory medicine, with a deep and direct impact on patient care. Although it empowers clinical laboratories with unprecedented genomic sequencing capability, NGS has brought along obvious and obtrusive informatics challenges. Bioinformatics and clinical informatics are separate disciplines with typically a small degree of overlap, but they have been brought together by the enthusiastic adoption of NGS in clinical laboratories. The result has been a collaborative environment for the development of novel informatics solutions. Sustaining NGS-based testing in a regulated clinical environment requires institutional support to build and maintain a practical, robust, scalable, secure, and cost-effective informatics infrastructure. Objective.—To discuss the novel NGS informatics challenges facing pathology laboratories today and offer solutions and future developments to address these obstacles. Data Sources.—The published literature pertaining to NGS informatics was reviewed. The coauthors, experts in the fields of molecular pathology, precision medicine, and pathology informatics, also contributed their experiences. Conclusions.—The boundary between bioinformatics and clinical informatics has significantly blurred with the introduction of NGS into clinical molecular laboratories. Next-generation sequencing technology and the data derived from these tests, if managed well in the clinical laboratory, will redefine the practice of medicine. In order to sustain this progress, adoption of smart computing technology will be essential. Computational pathologists will be expected to play a major role in rendering diagnostic and theranostic services by leveraging “Big Data” and modern computing tools.


2019 ◽  
Vol 58 (3) ◽  
Author(s):  
Steve Miller ◽  
Charles Chiu ◽  
Kyle G. Rodino ◽  
Melissa B. Miller

INTRODUCTION With established applications of next-generation sequencing in inherited diseases and oncology, clinical laboratories are evaluating the use of metagenomics for identification of infectious agents directly from patient samples, to aid in the diagnosis of infections. Metagenomic next-generation sequencing for infectious diseases promises an unbiased approach to detection of microbes that does not depend on growth in culture or the targeting of specific pathogens. However, the issues of contamination, interpretation of results, selection of databases used for analysis, and prediction of antimicrobial susceptibilities from sequencing data remain challenges. In this Point-Counterpoint, Steve Miller and Charles Chiu discuss the pros of using direct metagenomic sequencing, while Kyle Rodino and Melissa Miller argue for the use of caution.


2017 ◽  
Vol 141 (12) ◽  
pp. 1679-1685 ◽  
Author(s):  
Rakesh Nagarajan ◽  
Angela N. Bartley ◽  
Julia A. Bridge ◽  
Lawrence J. Jennings ◽  
Suzanne Kamel-Reid ◽  
...  

Context.— Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. Objective.— To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing–based oncology testing practices. Design.— College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing–based oncology testing. Results.— These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing–based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing–based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. Conclusions.— This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing–based oncology testing, and precision oncology efforts in a data-driven manner.


Author(s):  
Maheen Nisar

Rapid progress is being made in the development of next-generation sequencing (NGS) technologies, allowing the repeated discovery of new genes and a more in-depth analysis of the genetic polymorphisms behind the pathogenesis of disease. In a field such as psychiatry, which is characterized by vague and highly variable somatic manifestations, these technologies have brought great advances in diagnosing various psychiatric and mental disorders, identifying high-risk individuals, and selecting more effective treatment. Psychiatry has the difficult task of diagnosing and treating mental disorders without always being able to definitively establish the properties of the illness. This calls for diagnostic technologies that go beyond traditional gene manipulation to more advanced methods focused on discovering new gene polymorphisms, one of them being NGS. NGS enables the identification of hundreds of common and rare genetic variations contributing to behavioral and psychological conditions. Clinical NGS has been useful for detecting copy number and single-nucleotide variants and for identifying structural rearrangements that have been challenging for standard bioinformatics algorithms. The main objective of this article is to review the recent applications of NGS in the diagnosis of major psychiatric disorders, and hence gauge the extent of its impact in the field. A comprehensive PubMed search was conducted, and papers published from 2013 to 2018 were included, using the keywords “schizophrenia” or “bipolar disorder” or “depressive disorder” or “attention deficit disorder” or “autism spectrum disorder” and “next-generation sequencing.”

