MvPPT: a highly efficient and sensitive pathogenicity prediction tool for missense variants

2022 ◽  
Author(s):  
Shiyuan Tong ◽  
Ke Fan ◽  
Zai-Wei Zhou ◽  
Lin-Yun Liu ◽  
Shu-Qing Zhang ◽  
...  

Next-generation sequencing technologies both boost the discovery of variants in the human genome and exacerbate the challenges of pathogenic variant identification. In this study, we developed mvPPT (Pathogenicity Prediction Tool for missense variants), a highly sensitive and accurate missense variant classifier based on gradient boosting. MvPPT adopts high-confidence training sets with a wide spectrum of variant profiles, and extracts three categories of features: scores from existing prediction tools; allele, amino acid and genotype frequencies; and genomic context. Compared with established predictors, mvPPT achieved superior performance in all test sets, regardless of data source. In addition, our study provides guidance on training set and feature selection strategies and reveals highly relevant features, which may offer further biological insight into variant pathogenicity.

2018 ◽  
Author(s):  
Perry Evans ◽  
Chao Wu ◽  
Amanda Lindy ◽  
Dianalee A. McKnight ◽  
Matthew Lebo ◽  
...  

Abstract. Recent advances in DNA sequencing technologies have expanded our understanding of the molecular underpinnings for several genetic disorders, and increased the utilization of genomic tests by clinicians. Given the paucity of evidence to assess each variant, and the difficulty of experimentally evaluating a variant’s clinical significance, many of the thousands of variants that can be generated by clinical tests are reported as variants of unknown clinical significance. However, the creation of population-scale variant databases can significantly improve clinical variant interpretation. Specifically, pathogenicity prediction for novel missense variants can now utilize features describing regional variant constraint. Constrained genomic regions are those that have an unusually low variant count in the general population. Several computational methods have been introduced to capture these regions and incorporate them into pathogenicity classifiers, but these methods have yet to be compared on an independent clinical variant dataset. Here we introduce one variant dataset derived from clinical sequencing panels, and use it to compare the ability of different genomic constraint metrics to determine missense variant pathogenicity. This dataset is compiled from 17,071 patients surveyed with clinical genomic sequencing for cardiomyopathy, epilepsy, or RASopathies. We further utilize this dataset to demonstrate the necessity of disease-specific classifiers, and to train PathoPredictor, a disease-specific ensemble classifier of pathogenicity based on regional constraint and variant-level features. PathoPredictor achieves an average precision greater than 90% for variants from all 99 tested disease genes while approaching 100% accuracy for some genes. Accumulation of larger clinical variant datasets and their utilization to train existing pathogenicity metrics can significantly enhance their performance in a disease- and gene-specific manner.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Hongjian Qi ◽  
Haicang Zhang ◽  
Yige Zhao ◽  
Chen Chen ◽  
John J. Long ◽  
...  

Abstract. Accurate pathogenicity prediction of missense variants is critically important in genetic studies and clinical diagnosis. Previously published prediction methods have facilitated the interpretation of missense variants but have limited performance. Here, we describe MVP (Missense Variant Pathogenicity prediction), a new prediction method that uses a deep residual network to leverage large training data sets and many correlated predictors. We train the model separately in genes that are intolerant of loss-of-function variants and in those that are tolerant, in order to take account of potentially different genetic effect sizes and modes of action. We compile cancer mutation hotspots and de novo variants from developmental disorders for benchmarking. Overall, MVP achieves better performance in prioritizing pathogenic missense variants than previous methods, especially in genes tolerant of loss-of-function variants. Finally, using MVP, we estimate that de novo coding variants contribute to 7.8% of isolated congenital heart disease, nearly doubling previous estimates.


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii203-ii203
Author(s):  
Alexander Hulsbergen ◽  
Yu Tung Lo ◽  
Vasileios Kavouridis ◽  
John Phillips ◽  
Timothy Smith ◽  
...  

Abstract. Introduction: Survival prediction in brain metastases (BMs) remains challenging. Current prognostic models have been created and validated almost completely with data from patients receiving radiotherapy only, leaving uncertainty about surgical patients. Therefore, the aim of this study was to build and validate a model predicting 6-month survival after BM resection using different machine learning (ML) algorithms. Methods: An institutional database of 1062 patients who underwent resection for BM was split into an 80:20 training and testing set. Seven different ML algorithms were trained and assessed for performance. Moreover, an ensemble model was created incorporating random forest, adaptive boosting, gradient boosting, and logistic regression algorithms. Five-fold cross validation was used for hyperparameter tuning. Model performance was assessed using area under the receiver-operating curve (AUC) and calibration and was compared against the diagnosis-specific graded prognostic assessment (ds-GPA), the most established prognostic model in BMs. Results: The ensemble model showed superior performance with an AUC of 0.81 in the hold-out test set, a calibration slope of 1.14, and a calibration intercept of -0.08, outperforming the ds-GPA (AUC 0.68). Patients were stratified into high-, medium- and low-risk groups for death at 6 months; these strata strongly predicted both 6-month and longitudinal overall survival (p < 0.001). Conclusions: We developed and internally validated an ensemble ML model that accurately predicts 6-month survival after neurosurgical resection for BM, outperforms the most established model in the literature, and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.
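
The evaluation described in this abstract (an 80:20 hold-out split scored by AUC) can be illustrated with a minimal, standard-library-only sketch. The function names `split_80_20` and `roc_auc` are hypothetical and are not taken from the study; the abstract does not say how the split was drawn (for example, whether it was stratified), so the plain random shuffle below is an assumption.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of the records and cut it into 80% train / 20% test.

    Assumption: a simple random split; the study may have stratified by outcome.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

With 1062 records, this split yields 849 training and 213 test cases. The pairwise AUC is quadratic in the number of cases, which is acceptable for a hold-out set of this size; a rank-based formulation would be preferable for large datasets.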


2018 ◽  
Vol 35 (16) ◽  
pp. 2757-2765 ◽  
Author(s):  
Balachandran Manavalan ◽  
Shaherin Basith ◽  
Tae Hwan Shin ◽  
Leyi Wei ◽  
Gwang Lee

Abstract. Motivation: Cardiovascular disease is the primary cause of death globally, accounting for approximately 17.7 million deaths per year. One of the risk factors linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there has been no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied to antihypertensive peptide (AHTP) model construction. Results: In this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM), using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. Because ERT-based trained models performed consistently better than other algorithms regardless of feature descriptor, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML algorithms (ERT, GB, RF and SVM), and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of the four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets. Availability and implementation: The user-friendly online prediction tool, mAHTPred, is freely accessible at http://thegleelab.org/mAHTPred. Supplementary information: Supplementary data are available at Bioinformatics online.
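
The two-layer design described in this abstract (baseline predictors whose probabilities feed meta-predictors, which are then combined) can be sketched roughly as follows. This is not the mAHTPred implementation: `stack_predict` and the toy models are hypothetical, and the abstract does not state how the four meta-predictors are integrated, so the unweighted mean used here is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# A model is any callable mapping a feature vector to P(positive class).
Model = Callable[[Sequence[float]], float]


def stack_predict(
    x: Sequence[float],
    base_models: List[Model],
    meta_models: List[Model],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Two-layer stacked prediction for a single sample."""
    # Layer 1: each baseline model turns the raw descriptors into one probability.
    meta_features = [model(x) for model in base_models]
    # Layer 2: each meta-predictor scores the vector of baseline probabilities.
    meta_probs = [model(meta_features) for model in meta_models]
    # Integration: unweighted mean of the meta-predictor outputs (an assumption).
    mean_p = sum(meta_probs) / len(meta_probs)
    return mean_p, mean_p >= threshold


# Toy stand-ins for trained models; real ones would be fitted classifiers.
base = [lambda x: 0.8, lambda x: 0.6]
meta = [lambda f: sum(f) / len(f), lambda f: max(f)]
probability, is_positive = stack_predict([1.0, 2.0], base, meta)
```

In practice the baseline probabilities used to train the meta-predictors would come from cross-validated (out-of-fold) predictions, so that the second layer is not trained on outputs the first layer has already memorized.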


2018 ◽  
Author(s):  
Gabrielle Wheway ◽  
Liliya Nazlamova ◽  
Nervine Meshad ◽  
Samantha Hunt ◽  
Nicola Jackson ◽  
...  

Abstract. At least six different proteins of the spliceosome, including PRPF3, PRPF4, PRPF6, PRPF8, PRPF31 and SNRNP200, are mutated in autosomal dominant retinitis pigmentosa (adRP). These proteins have recently been shown to localise to the base of the connecting cilium of the retinal photoreceptor cells, elucidating this form of RP as a retinal ciliopathy. In the case of loss-of-function variants in these genes, pathogenicity can easily be ascribed. In the case of missense variants, this is more challenging. Furthermore, the exact molecular mechanism of disease in this form of RP remains poorly understood. In this paper we take advantage of the recently published cryo-EM-resolved structure of the entire human spliceosome to predict the effect of a novel missense variant in one component of the spliceosome, PRPF31, found in a patient attending the genetics eye clinic at Bristol Eye Hospital. Monoallelic variants in PRPF31 are a common cause of adRP with incomplete penetrance. We use in vitro studies to confirm pathogenicity of this novel variant, PRPF31 c.341T>A, p.Ile114Asn. This work demonstrates how in silico modelling of structural effects of missense variants on cryo-EM-resolved protein complexes can contribute to predicting pathogenicity of novel variants, in combination with in vitro and clinical studies. It is currently a considerable challenge to assign pathogenic status to missense variants in these proteins.


Author(s):  
Vishal Babu Siramshetty ◽  
Dac-Trung Nguyen ◽  
Natalia J. Martinez ◽  
Anton Simeonov ◽  
Noel T. Southall ◽  
...  

The rise of novel artificial intelligence methods necessitates a comparison of this wave of new approaches with classical machine learning for a typical drug discovery project. Inhibition of the potassium ion channel, whose alpha subunit is encoded by human Ether-à-go-go-Related Gene (hERG), leads to prolonged QT interval of the cardiac action potential and is a significant safety pharmacology target for the development of new medicines. Several computational approaches have been employed to develop prediction models for assessment of hERG liabilities of small molecules including recent work using deep learning methods. Here we perform a comprehensive comparison of prediction models based on classical (random forests and gradient boosting) and modern (deep neural networks and recurrent neural networks) artificial intelligence methods. The training set (~9000 compounds) was compiled by integrating hERG bioactivity data from ChEMBL database with experimental data generated from an in-house, high-throughput thallium flux assay. We utilized different molecular descriptors including the latent descriptors, which are real-valued continuous vectors derived from chemical autoencoders trained on a large chemical space (> 1.5 million compounds). The models were prospectively validated on ~840 in-house compounds screened in the same thallium flux assay. The deep neural networks performed significantly better than the classical methods with the latent descriptors. The recurrent neural networks that operate on SMILES provided highest model sensitivity. The best models were merged into a consensus model that offered superior performance compared to reference models from academic and commercial domains. Further, we shed light on the potential of artificial intelligence methods to exploit the chemistry big data and generate novel chemical representations useful in predictive modeling and tailoring new chemical space.


2021 ◽  
pp. jmedgenet-2020-107462
Author(s):  
Natalie B Tan ◽  
Alistair T Pagnamenta ◽  
Matteo P Ferla ◽  
Jonathan Gadian ◽  
Brian HY Chung ◽  
...  

Purpose: Binding proteins (G-proteins) mediate signalling pathways involved in diverse cellular functions and comprise Gα and Gβγ units. Human diseases have been reported for all five Gβ proteins. A de novo missense variant in GNB2 was recently reported in one individual with developmental delay/intellectual disability (DD/ID) and dysmorphism. We aim to confirm GNB2 as a neurodevelopmental disease gene, and elucidate the GNB2-associated neurodevelopmental phenotype in a patient cohort. Methods: We discovered a GNB2 variant in the index case via exome sequencing and sought individuals with GNB2 variants via international data-sharing initiatives. In silico modelling of the variants was assessed, along with multiple lines of evidence in keeping with American College of Medical Genetics and Genomics guidelines for interpretation of sequence variants. Results: We identified 12 unrelated individuals with five de novo missense variants in GNB2, four of which are recurrent: p.(Ala73Thr), p.(Gly77Arg), p.(Lys89Glu) and p.(Lys89Thr). All individuals have DD/ID with variable dysmorphism and extraneurologic features. The variants are located at the universally conserved shared interface with the Gα subunit, and modelling suggests that they weaken this interaction. Conclusion: Missense variants in GNB2 cause a congenital neurodevelopmental disorder with variable syndromic features, broadening the spectrum of multisystem phenotypes associated with variants in genes encoding G-proteins.


Materials ◽  
2020 ◽  
Vol 13 (21) ◽  
pp. 4952
Author(s):  
Mahdi S. Alajmi ◽  
Abdullah M. Almeshal

Tool wear negatively impacts the quality of workpieces produced by the drilling process. Accurate prediction of tool wear enables the operator to maintain the machine at the required level of performance. This research presents a novel hybrid machine learning approach for predicting the tool wear in a drilling process. The proposed approach is based on optimizing the extreme gradient boosting algorithm’s hyperparameters by a spiral dynamic optimization algorithm (XGBoost-SDA). Simulations were carried out on copper and cast-iron datasets with a high degree of accuracy. Further comparative analyses were performed with support vector machines (SVM) and multilayer perceptron artificial neural networks (MLP-ANN), where XGBoost-SDA showed superior performance with regard to the method. Simulations revealed that XGBoost-SDA results in the accurate prediction of flank wear in the drilling process with mean absolute error (MAE) = 4.67%, MAE = 5.32%, and coefficient of determination R2 = 0.9973 for the copper workpiece. Similarly, for the cast iron workpiece, XGBoost-SDA resulted in surface roughness predictions with MAE = 5.25%, root mean square error (RMSE) = 6.49%, and R2 = 0.975, which closely agree with the measured values. Performance comparisons between SVM, MLP-ANN, and XGBoost-SDA show that XGBoost-SDA is an effective method that can ensure high predictive accuracy about flank wear values in a drilling process.
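
The three error metrics quoted in this abstract can be written out as a short sketch. The function names are illustrative and not taken from the paper. The abstract reports MAE and RMSE as percentages without stating the normalization, so the version below works in the raw units of the measurements, which is an assumption.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |measured - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy measured vs. predicted wear values (made-up numbers, not from the paper).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(mae(measured, predicted), rmse(measured, predicted), r_squared(measured, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both, as the abstract does for the cast-iron workpiece, gives a sense of whether the error is spread evenly or dominated by a few poor predictions.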


2018 ◽  
Vol 56 (4) ◽  
pp. 220-227 ◽  
Author(s):  
Elyssa Cannaerts ◽  
Marlies Kempers ◽  
Alessandra Maugeri ◽  
Carlo Marcelis ◽  
Thatjana Gardeitchik ◽  
...  

Background: Missense variants in SMAD2, encoding a key transcriptional regulator of transforming growth factor beta signalling, were recently reported to cause arterial aneurysmal disease. Objectives: The aims of the study were to identify the genetic disease cause in families with aortic/arterial aneurysmal disease and to further define SMAD2 genotype–phenotype correlations. Methods and results: Using gene panel sequencing, we identified a SMAD2 nonsense variant and four SMAD2 missense variants, all affecting highly conserved amino acids in the MH2 domain. The premature stop codon (c.612dup; p.(Asn205*)) was identified in a marfanoid patient with aortic root dilatation and in his affected father. A p.(Asn318Lys) missense variant was found in a Marfan syndrome (MFS)-like case who presented with aortic root aneurysm and in her affected daughter with marfanoid features and mild aortic dilatation. In a man clinically diagnosed with Loeys-Dietz syndrome (LDS) who presented with aortic root dilatation and marked tortuosity of the neck vessels, another missense variant, p.(Ser397Tyr), was identified. This variant was also found in his affected daughter with hypertelorism and arterial tortuosity, as well as his affected mother. The third missense variant, p.(Asn361Thr), was discovered in a man presenting with coronary artery dissection. Variant genotyping in three unaffected family members confirmed its absence. The last missense variant, p.(Ser467Leu), was identified in a man with significant cardiovascular and connective tissue involvement. Conclusion: Taken together, our data suggest that heterozygous loss-of-function SMAD2 variants can cause a wide spectrum of autosomal dominant aortic and arterial aneurysmal disease, combined with connective tissue findings reminiscent of MFS and LDS.


Cancers ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 522 ◽  
Author(s):  
Volha A. Golubeva ◽  
Thales C. Nepomuceno ◽  
Alvaro N. A. Monteiro

Genetic testing allows for the identification of germline DNA variations, which are associated with a significant increase in the risk of developing breast cancer (BC) and ovarian cancer (OC). Detection of a BRCA1 or BRCA2 pathogenic variant triggers several clinical management actions, which may include increased surveillance and prophylactic surgery for healthy carriers or treatment with the PARP inhibitor therapy for carriers diagnosed with cancer. Thus, standardized validated criteria for the annotation of BRCA1 and BRCA2 variants according to their pathogenicity are necessary to support clinical decision-making and ensure improved outcomes. Upon detection, variants whose pathogenicity can be inferred by the genetic code are typically classified as pathogenic, likely pathogenic, likely benign, or benign. Variants whose impact on function cannot be directly inferred by the genetic code are labeled as variants of uncertain clinical significance (VUS) and are evaluated by multifactorial likelihood models that use personal and family history of cancer, segregation data, prediction tools, and co-occurrence with a pathogenic BRCA variant. Missense variants, coding alterations that replace a single amino acid residue with another, are a class of variants for which determination of clinical relevance is particularly challenging. Here, we discuss current issues in the missense variant classification by following a typical life cycle of a BRCA1 missense variant through detection, annotation and information dissemination. Advances in massively parallel sequencing have led to a substantial increase in VUS findings. Although the comprehensive assessment and classification of missense variants according to their pathogenicity remains the bottleneck, new developments in functional analysis, high throughput assays, data sharing, and statistical models are rapidly changing this scenario.

