Machine Learning Models for Accurate Prioritization of Variants of Uncertain Significance

Author(s):  
Daniel Mahecha ◽  
Haydemar Nuñez ◽  
Maria Lattig ◽  
Jorge Duitama

The growing use of next-generation sequencing technologies in genetic diagnosis has produced an exponential increase in the number of Variants of Uncertain Significance (VUS). In this manuscript we compare three machine learning methods to classify VUS as pathogenic or non-pathogenic: a Random Forest (RF), a Support Vector Machine (SVM), and a Multilayer Perceptron (MLP). To train the models, we extracted 82,463 high-quality variants from ClinVar, using 9 conservation scores, the loss of function tool and allele frequencies. For the RF and SVM models, hyperparameters were tuned using cross-validation with a grid search. The three models were tested on a set of 5,537 variants that had been classified as VUS at some point during the previous three years but had been reclassified in August 2020. All three models were more accurate on this set than the benchmarked tools. The RF-based model performed best across different variant types and was used to create VusPrize, an open-source software tool for prioritization of variants of uncertain significance. We believe that our model can improve the process of genetic diagnosis in research and clinical settings.
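The tuning step mentioned in the abstract (cross-validation with a grid search) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: a small k-nearest-neighbours classifier stands in for the RF and SVM models so that the example needs only the Python standard library, the data are synthetic, and every name (`k_fold_indices`, `knn_predict`, `cv_accuracy`, `grid_search`, `make_toy_data`) is hypothetical.

```python
import random
from itertools import product


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices once and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_X, train_y, x, n_neighbors):
    """Stand-in classifier: majority vote among the nearest training points."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in by_distance[:n_neighbors]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def cv_accuracy(X, y, params, folds):
    """Mean held-out accuracy of one hyperparameter setting across all folds."""
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        correct = sum(
            knn_predict(train_X, train_y, X[i], **params) == y[i]
            for i in held_out
        )
        fold_scores.append(correct / len(held_out))
    return sum(fold_scores) / len(fold_scores)


def grid_search(X, y, grid, n_folds=5):
    """Try every combination in the grid; return (best_params, best_score)."""
    folds = k_fold_indices(len(X), n_folds)
    best_params, best_score = None, -1.0
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = cv_accuracy(X, y, params, folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def make_toy_data(n_per_class=40, seed=1):
    """Synthetic two-class data (label 1 = 'pathogenic'); purely illustrative."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(grid_search(X, y, {"n_neighbors": [1, 3, 5]}))
```

Every candidate setting is scored on the same folds, so the comparison between settings is not confounded by different train/test splits. The reclassified VUS set described in the abstract plays the role of a separate test set that is never used during tuning.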

2014 ◽  
Vol 1051 ◽  
pp. 1009-1015 ◽  
Author(s):  
Ya Li Ning ◽  
Xin You Wang ◽  
Xi Ping He

The Support Vector Machine (SVM), a new-generation learning method based on advances in statistical learning theory, is characterized by the use of many standard machine learning techniques such as the maximal margin hyperplane, Mercer kernels and quadratic programming. Because it achieves the best performance in many currently challenging applications, SVM has attracted wide and sustained attention and has become a standard tool of machine learning and data mining. As a developing technology, however, SVM still has some problems and its applications are limited. This paper discusses SVM and its applications to chaotic time series, including the prediction of chaotic time series, with a focus on comparing regression types and on comparing kernel types within the same regression machine type.
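For orientation, the two choices compared in this abstract (regression type and kernel type) can be made concrete with the standard ε-insensitive formulation of support vector regression. This is textbook notation rather than anything taken from the paper; the symbols C, ε, γ and φ are introduced here only for illustration.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad
  & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \\
\text{s.t.} \quad
  & y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i, \\
  & \langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1,\dots,n,
\end{aligned}
\qquad
f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b,
\qquad
K_{\mathrm{RBF}}(x, x') = \exp\!\left(-\gamma\lVert x - x'\rVert^{2}\right).
```

Changing the regression type changes the objective and constraints, while changing the kernel type changes only K, for example from the Gaussian (RBF) kernel shown here to a polynomial kernel.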


Biomedicines ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 106
Author(s):  
Estefanía Martínez-Barrios ◽  
Sergi Cesar ◽  
José Cruzalegui ◽  
Clara Hernandez ◽  
Elena Arbelo ◽  
...  

Sudden death is a rare event in the pediatric population, but it has a strong social impact because it often presents as the first symptom in previously healthy children. Comprehensive autopsy in pediatric cases identifies an inconclusive cause in 40–50% of cases. In such cases, a diagnosis of sudden arrhythmic death syndrome is suggested as the main potential cause of death. Molecular autopsy identifies nearly 30% of cases under 16 years of age carrying a pathogenic/potentially pathogenic alteration in genes associated with any inherited arrhythmogenic disease. In the last few years, despite the increasing rate of post-mortem genetic diagnosis, many families still remain without a conclusive genetic cause of the unexpected death. Current challenges in genetic diagnosis are the establishment of a correct genotype–phenotype association between genes and inherited arrhythmogenic disease, as well as the classification of variants of uncertain significance. In this review, we provide an update on the state of the art in the genetic diagnosis of inherited arrhythmogenic disease in the pediatric population. We focus on emerging publications on gene curation for genotype–phenotype associations, cases of genetic overlap and advances in the classification of variants of uncertain significance. Our goal is to facilitate the translation of genetic diagnosis to the clinical area, helping risk stratification, treatment and the genetic counselling of families.


2021 ◽  
Author(s):  
Karen I Lange ◽  
Sunayna Best ◽  
Sofia Tsiropoulou ◽  
Ian Berry ◽  
Colin A Johnson ◽  
...  

Purpose: A molecular genetic diagnosis is essential for accurate counselling and management of patients with ciliopathies. Uncharacterized missense alleles are often classified as variants of uncertain significance (VUS) and are not clinically useful. In this study, we explore the use of a tractable animal model (C. elegans) for in vivo interpretation of missense VUS alleles of TMEM67, a gene frequently mutated as a cause of ciliopathies. Methods: CRISPR/Cas9 gene editing was used to generate homozygous worm strains carrying TMEM67 patient variants. Quantitative phenotypic assays (dye filling, roaming, chemotaxis) assessed cilia structure and function. Results were validated by genetic complementation assays in a human TMEM67 knock-out hTERT-RPE1 cell line. Results: Quantitative assays in C. elegans distinguished between known benign (Asp359Glu, Thr360Ala) and pathogenic (Glu361Ter, Gln376Pro) variants. Analysis of seven missense VUS alleles predicted two benign (Cys173Arg, Thr176Ile) and four pathogenic variants (Cys170Tyr, His782Arg, Gly786Glu, His790Arg). Results from one VUS (Gly979Arg) were inconclusive in worms, but additional in vitro validation suggested it was likely benign. Conclusion: Efficient genome editing and quantitative functional assays in C. elegans make it a tractable in vivo animal model that allows stratification and rapid, cost-effective interpretation of ciliopathy-associated missense VUS alleles.


2021 ◽  
Vol 22 (16) ◽  
pp. 8627
Author(s):  
Jane H. Frederiksen ◽  
Sara B. Jensen ◽  
Zeynep Tümer ◽  
Thomas v. O. Hansen

Lynch syndrome (LS) is one of the most common hereditary cancer predisposition syndromes worldwide. Individuals with LS have a high risk of developing colorectal or endometrial cancer, as well as several other cancers. LS is caused by autosomal dominant pathogenic variants in one of the DNA mismatch repair (MMR) genes MLH1, MSH2, PMS2 or MSH6; these typically include truncating variants, such as frameshift, nonsense or splicing variants. However, a significant number of missense, intronic, or silent variants, or small in-frame insertions/deletions, are detected during genetic screening of the MMR genes. The clinical effects of these variants are often more difficult to predict, and a large fraction of these variants are classified as variants of uncertain significance (VUS). It is pivotal for the clinical management of LS patients to have a clear genetic diagnosis, since patients benefit widely from screening, preventive and personal therapeutic measures. Moreover, in families where a pathogenic variant is identified, testing can be offered to family members, where non-carriers can be spared frequent surveillance, while carriers can be included in cancer surveillance programs. It is therefore important to reclassify VUSs, and, in this regard, functional assays can provide insight into the effect of a variant on the protein or mRNA level. Here, we briefly describe the disorders that are related to MMR deficiency, as well as the structure and function of MSH6. Moreover, we review the functional assays that are used to examine VUS identified in MSH6 and discuss the results obtained in relation to the ACMG/AMP PS3/BS3 criterion. We also provide a compiled list of the MSH6 variants examined by these assays. Finally, we provide a future perspective on high-throughput functional analyses with specific emphasis on the MMR genes.


2020 ◽  
Author(s):  
Yuqian Zhang ◽  
He Wang ◽  
Yifei Yao ◽  
Jianren Liu ◽  
Xuhong Sun ◽  
...  

Background: Benign paroxysmal positional vertigo (BPPV) is one of the most common peripheral vestibular disorders leading to balance difficulties and increased fall risks. This study aims to investigate the walking stability of BPPV patients in clinical settings and propose a machine-learning-based classification method for determining the severity of gait disturbances of BPPV. Methods: Twenty-seven BPPV outpatients and twenty-seven healthy subjects completed level walking trials at self-preferred speed in clinical settings while wearing one accelerometer on the head and one on the lower trunk. Temporo-spatial variables and six walking stability related variables (root mean square (RMS), harmonic ratio (HR), gait variability, step/stride regularity, and gait symmetry) derived from the acceleration signals were analyzed. A support vector machine (SVM) model based on the gait variables of BPPV patients was developed to classify the severity of gait disturbances in BPPV. Results: BPPV patients employed a conservative gait and showed significantly reduced walking stability compared to the healthy controls. Significantly different mediolateral HR at the lower trunk and anteroposterior step regularity at the head were found in BPPV patients among mild, moderate, and severe DHI (dizziness handicap inventory) subgroups. SVM classification achieved promising accuracies, with area under the curve (AUC) = 0.87, 0.80, and 0.95 respectively for classifying the three stages of DHI subgroups. Conclusions: The results suggest that the proposed gait analysis, based on coupling wearable accelerometers with machine learning, provides an objective approach for assessing gait disturbances and the handicapping effects of dizziness imposed by BPPV. Trial registration: The trial was registered in the Chinese Clinical Trial Registry (http://www.chictr.org.cn) on March 29, 2018.
Registration number: ChiCTR1800015432 (http://www.chictr.org.cn/showproj.aspx?proj=25587).
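Two of the quantities named in this abstract have simple definitions that a short sketch can make concrete: the RMS of an acceleration signal and the AUC reported for each DHI subgroup. The code below is an illustration under stated assumptions, not the study's processing chain. It assumes the signal has already been detrended, computes AUC by direct pairwise comparison, and treats a three-class problem one subgroup at a time (one-vs-rest). The function names are hypothetical.

```python
import math


def rms(signal):
    """Root mean square of a 1-D acceleration signal (assumed detrended)."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def auc(scores, labels):
    """Area under the ROC curve.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def one_vs_rest_labels(labels, target):
    """Binary labels for one subgroup against all others."""
    return [1 if label == target else 0 for label in labels]


if __name__ == "__main__":
    subgroup = ["mild", "mild", "moderate", "severe", "severe"]
    severe_score = [0.1, 0.3, 0.4, 0.8, 0.7]  # made-up classifier scores
    print(auc(severe_score, one_vs_rest_labels(subgroup, "severe")))
```

Reporting one AUC per subgroup, as the abstract does, is consistent with this kind of one-vs-rest evaluation, although the abstract does not state which scheme was used.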


2019 ◽  
Vol 26 (6) ◽  
pp. 561-576 ◽  
Author(s):  
Zhijun Yin ◽  
Lina M Sulieman ◽  
Bradley A Malin

Objective: User-generated content (UGC) in online environments provides opportunities to learn an individual’s health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. Materials and Methods: We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. Results: We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. Conclusions: The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and model interpretability.


2021 ◽  
Author(s):  
Jeffrey D Calhoun ◽  
Miriam C Aziz ◽  
Hannah C Happ ◽  
Jonathan Gunti ◽  
Colleen Gleason ◽  
...  

Biallelic pathogenic variants in SZT2 result in a neurodevelopmental disorder with shared features, including early-onset epilepsy, developmental delay, macrocephaly, and corpus callosum abnormalities. SZT2 is a critical scaffolding protein in the amino acid sensing arm of the mTOR signaling pathway. Due to its large size (3432 amino acids), lack of crystal structure, and absence of functional domains, it is difficult to determine the pathogenicity of SZT2 missense and in-frame deletions. We report a cohort of twelve individuals with biallelic SZT2 variants and phenotypes consistent with SZT2-related neurodevelopmental disorder. The majority of this cohort contained one or more SZT2 variants of uncertain significance (VUS). We developed a novel individualized platform to functionally characterize SZT2 VUSs. We identified a recurrent in-frame deletion (SZT2 p.Val1984del) which was determined to be a loss-of-function variant and therefore likely pathogenic. Haplotype analysis determined this single in-frame deletion is a founder variant in those of Ashkenazi Jewish ancestry. Overall, we present a FACS-based rapid assay to distinguish pathogenic variants from VUSs in SZT2, using an approach that is widely applicable to other mTORopathies including the most common causes of the focal genetic epilepsies, DEPDC5, TSC1/2, MTOR and NPRL2/3.


2021 ◽  
Author(s):  
Kathryn McCormick ◽  
Trisha Brock ◽  
Matthew Wood ◽  
Lan Guo ◽  
Kolt McBride ◽  
...  

Purpose: Functional evidence is a pillar of variant interpretation according to ACMG guidelines. Functional evidence can be obtained in a variety of models and assay systems, including patient-derived tissues and iPSCs, in vitro cellular assays, and in vivo assays. Here we evaluate the reliability and practicality of variant interpretation in the small animal model, C. elegans, through a series of experiments evaluating the function of syntaxin binding protein, STXBP1, a well-known causative gene for Early infantile epileptic encephalopathy 1 (EIEE1). Methods: Using CRISPR, we replaced the coding sequence for unc-18 with the coding sequence for the human ortholog STXBP1. Next, we used CRISPR to introduce precise point mutations in the human STXBP1 coding sequence, reflecting three clinical categories (benign, pathogenic, and variants of uncertain significance (VUS)). We quantified 26 features of the resulting worms' movement to train Random Forest (RF) and Support Vector Machine (SVM) classifiers on known pathogenic and benign variants. We characterized the classifiers, and then used the behavioral data from the VUS-expressing animals to predict the categorization of the VUS. Results: Whereas knock-out worms without unc-18 are severely impaired in motor function, worms expressing STXBP1 in its place have restored motor function. We produced worms with STXBP1 variants previously classified by ACMG criteria, including 25 benign variants, 32 pathogenic variants, and 24 variants of uncertain significance (VUS). Using either SVM or RF classifiers, we were able to obtain a sensitivity of 0.84-0.97 on known benign and pathogenic strains. By comparing multiple ML classification methods, we were able to classify 9 of the VUS as functionally abnormal, suggesting that these VUS are likely to be pathogenic.
Conclusions: We demonstrate that automated analysis of a small animal system is an effective, scalable, and fast way to understand functional consequences of variants in STXBP1, one of the most common causes of genetic epilepsies and neurodevelopmental disorders. Keywords: STXBP1, C. elegans, CRISPR, Unc-18
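The sensitivity figure quoted in the Results is a ratio of confusion-matrix counts, which the following sketch spells out. It is a generic illustration with made-up labels, not the study's data or code. The label strings and function names are hypothetical, and "pathogenic" is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive="pathogenic"):
    """Return (tp, fp, tn, fn) for a binary problem with one positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity(y_true, y_pred, positive="pathogenic"):
    """True positive rate: share of truly positive strains that were flagged."""
    tp, _, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn)


def specificity(y_true, y_pred, positive="pathogenic"):
    """True negative rate: share of truly negative strains that were cleared."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred, positive)
    return tn / (tn + fp)


if __name__ == "__main__":
    truth = ["pathogenic"] * 4 + ["benign"] * 2
    predicted = ["pathogenic"] * 3 + ["benign", "benign", "pathogenic"]
    print(sensitivity(truth, predicted), specificity(truth, predicted))
```

Sensitivity alone does not show how often benign strains are misclassified, so it is usually read together with specificity.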


Energies ◽  
2021 ◽  
Vol 14 (23) ◽  
pp. 7949
Author(s):  
Michele Zanoni ◽  
Riccardo Chiumeo ◽  
Liliana Tenti ◽  
Massimo Volta

This paper presents the integration of advanced machine learning techniques in the medium voltage distributed monitoring system QuEEN. This system is intended to monitor voltage dips in the Italian distribution network, mainly for survey and research purposes. For each recorded event, it automatically evaluates the residual voltage and duration from the corresponding voltage rms values, and it provides the event's “validity” (invalidating any false events caused by voltage transformer saturation) and “origin” (upstream or downstream from the measurement point) using dedicated procedures and algorithms (the current techniques). In recent years, RSE has proposed new solutions to improve the assessment of the validity and origin of an event: the DELFI classifier (DEep Learning for False voltage dips Identification) and the FExWaveS + SVM classifier (Features Extraction from Waveform Segmentation + Support Vector Machine classifier). These advanced functionalities have recently been integrated in the monitoring system through an automated software tool called QuEEN PyService. In this work, these advanced techniques have been used intensively for the first time on a significant number of monitored sites (150), starting from data recorded from 2018 to 2021. In addition, the results of the innovative techniques (validity and origin of severe voltage dips) have been compared with those of the current ones at the macro-regional level. The new techniques are shown to have a non-negligible impact on the number of severe voltage dips and confirm a non-homogeneous condition among the Italian macro-regional areas.
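The first step described above, deriving residual voltage and duration from rms values, can be sketched as below. This is an illustrative reading of the usual definitions, not the QuEEN implementation. It assumes rms values expressed per unit of the reference voltage, a fixed sampling interval, and a dip threshold of 0.9 per unit (a common convention, taken here as an assumption). The function name and return keys are hypothetical, and validity and origin classification are outside its scope.

```python
def characterize_dip(rms_pu, dt_s, threshold_pu=0.9):
    """Characterize the first voltage dip in a series of rms values.

    rms_pu       -- rms voltage samples, per unit of the reference voltage
    dt_s         -- time between consecutive rms samples, in seconds
    threshold_pu -- dip threshold (assumed 0.9 pu here)

    Returns None if the voltage never drops below the threshold, else a
    dict with the residual voltage (lowest rms value during the dip) and
    the duration (time spent continuously below the threshold).
    """
    start = next(
        (i for i, v in enumerate(rms_pu) if v < threshold_pu), None
    )
    if start is None:
        return None
    end = start
    while end + 1 < len(rms_pu) and rms_pu[end + 1] < threshold_pu:
        end += 1
    segment = rms_pu[start:end + 1]
    return {
        "residual_voltage_pu": min(segment),
        "duration_s": len(segment) * dt_s,
    }


if __name__ == "__main__":
    samples = [1.0, 0.98, 0.6, 0.4, 0.5, 0.95, 1.0]  # made-up rms trace
    print(characterize_dip(samples, dt_s=0.01))
```

The classifiers named in the abstract address a different question, namely whether such an event is genuine and where it originated, which cannot be read from these two numbers alone.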

