Use of Targeted High-throughput Sequencing for Genetic Classification of Patients with Bleeding Diathesis and Suspected Platelet Disorder

2019 ◽  
Author(s):  
O. Andres ◽  
E.-M. König ◽  
E. Klopocki ◽  
H. Schulze ◽  
TH Open ◽  
2018 ◽  
Vol 02 (04) ◽  
pp. e445-e454 ◽  
Author(s):  
Oliver Andres ◽  
Eva-Maria König ◽  
Karina Althaus ◽  
Tamam Bakchoul ◽  
Peter Bugert ◽  
...  

Abstract: Inherited platelet disorders (IPD) form a rare and heterogeneous disease entity that is present in about 8% of patients with non-acquired bleeding diathesis. Identification of the defective cellular pathway is an important criterion for stratifying the patient's individual risk profile and for choosing personalized therapeutic options. As the costs of high-throughput sequencing technologies have declined rapidly over the last decade, molecular genetic diagnosis of bleeding and platelet disorders has become increasingly suitable for inclusion in diagnostic algorithms. In this study, we developed, verified, and evaluated a targeted, panel-based next-generation sequencing approach comprising 59 genes associated with IPD for a cohort of 38 patients with a history of recurrent bleeding episodes and functionally suspected, but so far genetically undefined, IPD. DNA samples from five patients with genetically defined IPD with disease-causing variants in WAS, RBM8A, FERMT3, P2RY12, and MYH9 served as controls during the validation process. In 40% of 35 patients analyzed, we detected 15 variants, eight of which were novel, in 11 genes (ACTN1, AP3B1, GFI1B, HPS1, HPS4, HPS6, MPL, MYH9, TBXA2R, TPM4, and TUBB1) and classified them according to current guidelines. Apart from seven variants of uncertain significance in 11% of patients, nine variants were classified as likely pathogenic or pathogenic, providing a molecular diagnosis for 26% of patients. This report also highlights the potential and pitfalls of this tool and proposes its rational implementation within the diagnostic algorithms of IPD.


Author(s):  
М.А. Спектор ◽  
Л.А. Ясько ◽  
А.Е. Друй

Active clinical implementation of high-throughput DNA sequencing requires a common approach to the interpretation of detected genetic variants, including variants with somatic status. In 2017, the Association for Molecular Pathology (AMP), the American College of Medical Genetics and Genomics (ACMG), the American Society of Clinical Oncology (ASCO), and the College of American Pathologists (CAP) published guidelines for interpreting and reporting somatic genetic variants identified by high-throughput sequencing of tumor DNA. This review focuses on the specific application of the AMP/ACMG/ASCO/CAP guidelines to genetic studies of paediatric solid tumors. In particular, the review presents the criteria on which the classification of somatic genetic variants is based, discusses the problems of evaluating the clinical significance of genetic findings in paediatric tumors, and provides examples of the classification of genetic variants identified in various types of childhood solid malignancies.


mSystems ◽  
2020 ◽  
Vol 5 (2) ◽  
Author(s):  
Thomas P. Quinn ◽  
Ionas Erb

ABSTRACT Since the turn of the century, technological advances have made it possible to obtain the molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundances of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional -omics data remains an area of active research. However, few approaches explicitly model the relative nature of these data; most instead rely on cumbersome normalizations. This report (i) emphasizes the relative nature of health biomarkers, (ii) discusses the literature surrounding the classification of relative data, and (iii) benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types. We show how an interpretable set of log contrasts, called balances, can prepare data for classification. We propose a simple procedure, called discriminative balance analysis, to select groups of 2 and 3 bacteria that can together discriminate between experimental conditions. Discriminative balance analysis is a fast, accurate, and interpretable alternative to data normalization.

IMPORTANCE High-throughput sequencing provides an easy and cost-effective way to measure the relative abundance of bacteria in any environmental or biological sample. When these samples come from humans, the microbiome signatures can act as biomarkers for disease prediction. However, because bacterial abundance is measured as a composition, the data have unique properties that make conventional analyses inappropriate. To overcome this, analysts often use cumbersome normalizations. This article proposes an alternative method that identifies pairs and trios of bacteria whose stoichiometric presence can differentiate between diseased and nondiseased samples. By using interpretable log contrasts called balances, we developed an entirely normalization-free classification procedure that reduces the feature space and improves interpretability without sacrificing classifier performance.
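To make the notion of a balance concrete, the following minimal sketch computes one log contrast between two groups of taxa, using the standard compositional-data definition (a scaled log ratio of the groups' geometric means). The function name `balance`, the taxon names, and the counts are hypothetical illustrations; this is not the authors' discriminative balance analysis, which additionally selects which pairs and trios to use.

```python
import math


def balance(sample, numerator, denominator):
    """Log contrast between the geometric means of two groups of parts.

    `sample` maps part names to strictly positive abundances.
    Zeros must be handled beforehand because log(0) is undefined.
    """
    r, s = len(numerator), len(denominator)
    mean_log_num = sum(math.log(sample[name]) for name in numerator) / r
    mean_log_den = sum(math.log(sample[name]) for name in denominator) / s
    # The sqrt(rs / (r + s)) factor is the usual normalising constant.
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)


# Hypothetical read counts for three taxa in one sample.
counts = {"taxon_a": 120, "taxon_b": 30, "taxon_c": 850}

# The same sample expressed as proportions (a typical normalization).
total = sum(counts.values())
proportions = {name: value / total for name, value in counts.items()}

b_counts = balance(counts, ["taxon_a"], ["taxon_b"])
b_props = balance(proportions, ["taxon_a"], ["taxon_b"])

# The balance depends only on ratios between parts, so both agree.
print(math.isclose(b_counts, b_props))
```

Because the value is unchanged when the sample is rescaled, a classifier trained on such balances needs no prior normalization of the counts, which is the property the abstract relies on.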


2017 ◽  
Author(s):  
André Corvelo ◽  
Wayne E. Clarke ◽  
Nicolas Robine ◽  
Michael C. Zody

Abstract: High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive, and fully scalable taxonomic classification tool, capable of delivering classification accuracy comparable to that of BLASTn, but at up to three orders of magnitude lower computational cost. taxMaps is freely available for academic and non-commercial research purposes at https://github.com/nygenome/taxmaps.


2020 ◽  
Author(s):  
Md. Nafis Ul Alam ◽  
Umar Faruq Chowdhury

Abstract: High-throughput sequencing technologies have greatly enabled the study of genomics, transcriptomics, and metagenomics. Automated annotation and classification of the vast amounts of generated sequence data have become paramount for the biological sciences. Genomes of viruses can differ radically from those of all other life forms, in terms of both molecular structure and primary sequence. Alignment-based and profile-based searches are commonly employed to characterize assembled viral contigs from high-throughput sequencing data. Recent attempts have highlighted the use of machine learning models for the task, but these models rely entirely on DNA genomes, and owing to the intrinsic genomic complexity of viruses, RNA viruses have been completely overlooked. Here, we present a novel short k-mer based sequence scoring method that generates robust sequence information for training machine learning classifiers. We trained 18 classifiers for the task of distinguishing viral RNA from human transcripts. We challenged our models with very stringent testing protocols across different species and evaluated performance against BLASTn, BLASTx, and HMMER3 searches. For clean sequence data retrieved from curated databases, our models display near-perfect accuracy, outperforming all similar attempts previously reported. On de novo assemblies of raw RNA-Seq data from cells exposed to Ebola virus, the area under the ROC curve varied from 0.6 to 0.86 depending on the software used for assembly. Our classifier was able to properly classify the majority of the false hits generated by BLAST and HMMER3 searches on the same data. The outstanding performance metrics of our model lay the groundwork for robust machine learning methods for the automated annotation of sequence data.

Author Summary: In this age of high-throughput sequencing, proper classification of copious amounts of sequence data remains a daunting challenge. Presently, sequence alignment methods are the default choice for the task. Owing to the selection forces of nature, there is considerable homology even between the sequences of different species, which introduces ambiguity into the results of alignment-based searches. Machine learning methods are becoming more reliable for characterizing sequence data, but virus genomes are more variable than those of other forms of life, and viruses with RNA-based genomes have been overlooked in previous machine learning attempts. We designed a novel short k-mer based scoring criterion whereby a large number of highly robust numerical feature sets can be derived from sequence data. These features were able to accurately distinguish virus RNA from human transcripts with performance scores better than all previous reports. Our models were able to generalize well to distant species of viruses and to mouse transcripts. The model correctly classifies the majority of false hits generated by current standard alignment tools. These findings strongly imply that this k-mer score based computational pipeline yields a highly informative, rich set of numerical machine learning features and that similar pipelines can greatly advance the field of computational biology.
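The abstract does not specify how the k-mer scores are computed, so the sketch below shows only the generic idea of turning a sequence into a fixed-length numerical feature vector from short k-mers. It uses plain normalized k-mer frequencies; the function name `kmer_frequencies`, the choice of k, and the example sequence are assumptions for illustration and should not be read as the authors' scoring scheme.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Return the relative frequency of every possible k-mer.

    Windows containing symbols outside `alphabet` (for example N)
    are ignored, so the frequencies sum to 1 over valid windows.
    """
    sequence = sequence.upper()
    windows = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    # Enumerate all k-mers so every sequence yields the same features.
    kmers = ["".join(letters) for letters in product(alphabet, repeat=k)]
    total = sum(windows[kmer] for kmer in kmers)
    return {
        kmer: (windows[kmer] / total if total else 0.0) for kmer in kmers
    }


# Hypothetical toy sequence; real inputs would be assembled contigs.
features = kmer_frequencies("ACGTACGT", k=2)
print(len(features))   # one feature per possible 2-mer
print(features["AC"])  # share of windows equal to "AC"
```

Vectors of this kind have the same length for every input sequence, which is what allows them to be passed to standard machine learning classifiers regardless of contig length.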

