MetaRNN: Differentiating Rare Pathogenic and Rare Benign Missense SNVs and InDels Using Deep Learning

2021 ◽  
Author(s):  
Chang Li ◽  
Degui Zhi ◽  
Kai Wang ◽  
Xiaoming Liu

We present the pathogenicity prediction models MetaRNN and MetaRNN-indel to help identify and prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertion/deletions (nfINDELs) using deep learning and context annotations. Employing independent test datasets, we demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. MetaRNN executables and precomputed scores are available at http://www.liulab.science/MetaRNN.

2020 ◽  
Author(s):  
Prashant Gupta ◽  
Aashi Jindal ◽  
Jayadeva ◽  
Debarka Sengupta

ABSTRACT The exclusivity of a vast majority of cancer mutations remains poorly understood, despite the availability of large amounts of whole genome and exome sequencing data. In clinical settings, this markedly hinders the identification of previously uncharacterized deleterious mutations due to the unavailability of matched normal samples. We employed state-of-the-art deep learning algorithms for cross-exome learning of mutational embeddings and demonstrated their utility in sequence-based detection of cancer-specific Single Nucleotide Variants (SNVs).


Author(s):  
Kexin Huang ◽  
Tianfan Fu ◽  
Lucas M Glass ◽  
Marinka Zitnik ◽  
Cao Xiao ◽  
...  

Abstract Summary Accurate prediction of drug–target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. Availability and implementation https://github.com/kexinhuang12345/DeepPurpose. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 34 (08) ◽  
pp. 13294-13299
Author(s):  
Hangzhi Guo ◽  
Alexander Woodruff ◽  
Amulya Yadav

Farmer suicides have become an urgent social problem which governments around the world are trying hard to solve. Most farmers are driven to suicide by an inability to sell their produce at desired profit levels, which is caused by the widespread uncertainty/fluctuation in produce prices resulting from varying market conditions. To prevent farmer suicides, this paper takes the first step towards resolving the issue of produce price uncertainty by presenting PECAD, a deep learning algorithm for accurate prediction of future produce prices based on past pricing and volume patterns. While previous work presents machine learning algorithms for prediction of produce prices, these approaches suffer from two limitations: (i) they do not explicitly consider the spatio-temporal dependence of future prices on past data; and as a result, (ii) they rely on classical ML prediction models which often perform poorly when applied to spatio-temporal datasets. PECAD addresses these limitations via three major contributions: (i) we gather real-world daily price and (produced) volume data of different crops over a period of 11 years from an official Indian government administered website; (ii) we pre-process this raw dataset via state-of-the-art imputation techniques to account for missing data entries; and (iii) PECAD proposes a novel wide and deep neural network architecture which consists of two separate convolutional neural network models (trained for pricing and volume data respectively). Our simulation results show that PECAD outperforms existing state-of-the-art baseline methods by achieving significantly lower root mean squared error (RMSE): PECAD achieves ∼25% lower coefficient of variance than state-of-the-art baselines. Our work is done in collaboration with a non-profit agency that works on preventing farmer suicides in the Indian state of Jharkhand, and PECAD is currently being reviewed by them for potential deployment.
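
As a rough illustration of the error metric named in this abstract, the following Python sketch computes RMSE for a forecast against observed prices. The function name and the price values are hypothetical; they are not taken from the PECAD paper or its dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily prices (illustrative only, not data from the paper).
observed = [100.0, 102.0, 98.0, 105.0]
forecast = [101.0, 100.0, 99.0, 103.0]
print(rmse(observed, forecast))
```

A lower RMSE means the forecast tracks the observed series more closely, which is the sense in which the abstract compares PECAD with its baselines.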


2016 ◽  
Author(s):  
Feng Liu ◽  
Hao Li ◽  
Chao Ren ◽  
Xiaochen Bo ◽  
Wenjie Shu

Abstract Transcriptional enhancers are non-coding segments of DNA that play a central role in the spatiotemporal regulation of gene expression programs. However, systematically and precisely predicting enhancers remains a major challenge. Although existing methods have achieved some success in enhancer prediction, they still suffer from many issues. We developed a deep learning-based algorithmic framework named PEDLA (https://github.com/wenjiegroup/PEDLA), which can directly learn an enhancer predictor from massively heterogeneous data and generalize in ways that are mostly consistent across various cell types/tissues. We first trained PEDLA with 1,114-dimensional heterogeneous features in H1 cells, and we demonstrated that our PEDLA framework integrates diverse heterogeneous features and gives state-of-the-art performance relative to five existing methods for enhancer prediction. We further extended PEDLA to iteratively learn from 22 training cell types/tissues. Our results showed that PEDLA manifested superior performance consistency in both training and independent test sets. On average, PEDLA achieved 95.0% accuracy and a 96.8% geometric mean (GM) across 22 training cell types/tissues, as well as 95.7% accuracy and a 96.8% GM across 20 independent test cell types/tissues. Together, our work illustrates the power of harnessing state-of-the-art deep learning techniques to consistently identify regulatory elements at a genome-wide scale from massively heterogeneous data across diverse cell types/tissues.


2020 ◽  
Author(s):  
Yanhua Gao ◽  
Yuan Zhu ◽  
Bo Liu ◽  
Yue Hu ◽  
Youmin Guo

Objective In transthoracic echocardiographic (TTE) examination, it is essential to identify the cardiac views accurately. Computer-aided recognition is expected to improve the accuracy of the TTE examination. Methods This paper proposes a new method for automatic recognition of cardiac views based on deep learning, including three strategies. First, a spatial transform network is applied to learn cardiac shape changes during the cardiac cycle, which reduces intra-class variability. Second, a channel attention mechanism is introduced to adaptively recalibrate channel-wise feature responses. Finally, unlike conventional deep learning methods, which learn from each input image individually, structured signals are applied through a graph of similarities among images. These signals are transformed into a graph-based image embedding, which acts as an unsupervised regularization constraint to improve the generalization accuracy. Results The proposed method was trained and tested on 171792 cardiac images from 584 subjects. Compared with the known state-of-the-art result, the overall accuracy of the proposed method on cardiac image classification is 99.10% vs. 91.7%, and the mean AUC is 99.36%. Moreover, the overall accuracy is 98.15%, and the mean AUC is 98.96% on an independent test set with 34211 images from 100 subjects. Conclusion The method of this paper achieved state-of-the-art results and is expected to serve as an automated tool for cardiac view recognition. The work confirms the potential of deep learning in ultrasound medicine.


Author(s):  
Himanshu Gupta ◽  
Hirdesh Varshney ◽  
Tarun Kumar Sharma ◽  
Nikhil Pachauri ◽  
Om Prakash Verma

Abstract Background Diabetes, the fastest growing health emergency, has created several life-threatening challenges to public health globally. It is a metabolic disorder and triggers many other chronic diseases such as heart attack, diabetic nephropathy, brain strokes, etc. The prime objective of this work is to develop a prognosis tool based on the PIMA Indian Diabetes dataset that will help medical practitioners in reducing the lethality associated with diabetes. Methods Based on the features present in the dataset, two prediction models have been proposed by employing deep learning (DL) and quantum machine learning (QML) techniques. Accuracy has been used to evaluate the prediction capability of these models. Outlier rejection, filling of missing values, and normalization have been used to improve the discriminatory performance of these models. Also, the performance of these models has been compared against state-of-the-art models. Results The performance measures precision, accuracy, recall, F1 score, specificity, balanced accuracy, false detection rate, missed detection rate, and diagnostic odds ratio were 0.90, 0.95, 0.95, 0.93, 0.95, 0.95, 0.03, 0.02, and 399.00 for the DL model, respectively. For the QML model, these measures were 0.74, 0.86, 0.85, 0.79, 0.86, 0.86, 0.11, 0.05, and 35.89, respectively. Conclusion The proposed DL model has a high diabetes prediction accuracy compared with the developed QML model and existing state-of-the-art models. It also improves the performance by 1.06% compared to reported work. However, the performance of the QML model has been found to be satisfactory and comparable with existing literature.
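
To make the measures listed in this abstract concrete, the sketch below derives several of them from confusion-matrix counts using their standard textbook definitions. The counts are hypothetical and do not reproduce the paper's results. The false detection rate and missed detection rate are left out because the abstract does not define them, and the function assumes that no denominator is zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Common binary-classification measures from confusion-matrix counts.

    Assumes every denominator is non-zero (no empty class or prediction set).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        # Diagnostic odds ratio: (TP * TN) / (FP * FN).
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts chosen for illustration; not the paper's data.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the diagnostic odds ratio divides by the product of the two error counts, it grows very quickly as errors approach zero, which is one reason values in the hundreds can appear alongside accuracies near 0.95.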


2020 ◽  
Vol 36 (20) ◽  
pp. 4977-4983 ◽  
Author(s):  
Jing-Bo Zhou ◽  
Yao Xiong ◽  
Ke An ◽  
Zhi-Qiang Ye ◽  
Yun-Dong Wu

Abstract Motivation Despite the lack of folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in various biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs resulting from the wide application of NGS has driven the development of disease-association prediction methods for decades. However, their performance on nsSNVs in IDRs remains inferior, possibly due to the domination of nsSNVs from structured regions in training data. Therefore, it is highly desirable to build a disease-association predictor specifically for nsSNVs in IDRs with better performance. Results We present IDRMutPred, a machine learning-based tool specifically for predicting disease-associated germline nsSNVs in IDRs. Based on 17 selected optimal features that are extracted from sequence alignments, protein annotations, hydrophobicity indices and disorder scores, IDRMutPred was trained using three ensemble learning algorithms on a training dataset containing only IDR nsSNVs. The evaluation on the two testing datasets shows that all three prediction models significantly outperform 17 other popular general predictors, achieving an ACC between 0.856 and 0.868 and an MCC between 0.713 and 0.737. IDRMutPred will prioritize disease-associated IDR germline nsSNVs more reliably than general predictors. Availability and implementation The software is freely available at http://www.wdspdb.com/IDRMutPred. Supplementary information Supplementary data are available at Bioinformatics online.
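
For readers unfamiliar with the MCC reported in this abstract, here is a minimal Python sketch of the standard Matthews correlation coefficient formula. The confusion-matrix counts are hypothetical and are not drawn from the IDRMutPred evaluation.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical balanced test set with 10% errors in each class.
print(mcc(tp=90, fp=10, tn=90, fn=10))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it is less flattering than accuracy when one class dominates.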


Author(s):  
Victor Zharavin ◽  
James Balmford ◽  
Patrick Metzger ◽  
Melanie Boerries ◽  
Harald Binder ◽  
...  

Pathogenicity is unknown for the majority of human gene variants. For prioritization of sequenced somatic and germline mutation variants, in silico approaches can be utilized. In this study, 84 million non-synonymous Single Nucleotide Variants (SNVs) in the human coding genome were annotated using the consensus Variant Effect Prediction (cVEP) method. An algorithm, implemented as a stacked ensemble of supervised learners, combined the 39 functional and conservation mutation impact scores from dbNSFP4.0. Adding a gene indispensability score, which accounts for differences in the pathogenicity of variants in essential and mutation-tolerant genes, improved the predictions. For each SNV the consensus combination gives either a continuous-value pathogenicity score, or a categorical score in five classes: pathogenic, likely pathogenic, uncertain significance, likely benign, benign. The provided class database is intended for direct use in clinical practice. The trained prediction models were 5-fold cross-validated on the evidence-based categorical annotations from the ClinVar database. Rankings of the scores based on their ability to predict pathogenicity were obtained. A two-step strategy using the rankings, scores and class annotations is suggested for filtering and prioritization of human exome mutations in clinical and biological applications of NGS technology.


2020 ◽  
Vol 35 (Supplement_3) ◽  
Author(s):  
GUSTAVO GRELONI ◽  
Federico Varela ◽  
Griselda Bratti ◽  
Guillermo Rosa Diez ◽  
Celia Dos Santos ◽  
...  

Abstract Background and Aims Advances in the past 2 decades have shown atypical hemolytic uremic syndrome (aHUS) to be a disorder of the alternative pathway of complement. Most aHUS cases involve sequence variations in genes encoding complement proteins. The term pregnancy-associated aHUS (P-aHUS) refers to the thrombotic microangiopathy (TMA) that results from uncontrolled complement activation during pregnancy or the postpartum period. P-aHUS is a devastating systemic disease, with high maternal mortality and morbidity rates in the pre-eculizumab era. The term ‘C3 glomerulopathy’ (C3GN) encompasses a heterogeneous spectrum of immune-mediated nephropathies that share a common pathological feature, glomerular deposition of C3. This entity may progress to advanced stages of chronic kidney disease and shares common genetic risk factors with aHUS. Moreover, some authors suggest that C3GN and aHUS represent two forms of a disease spectrum with a common pathogenic principle. Here, we report this rare association and describe the family genetic variants that could cause it. Method We describe the clinical and laboratory data of a patient with the association of aHUS and C3 glomerulopathy. A genetic study was performed and her available relatives were also screened for mutations/polymorphisms in aHUS-associated complement genes. After extracting gDNA from whole blood (Wizard Genomic DNA Purification Kit, Promega), PCR products of coding sequences and intronic flanking regions of complement genes were sequenced on an ABI PRISM 310 Genetic Analyzer (Applied Biosystems). In silico analysis for pathogenicity was completed with Polyphen2-HDIV, PhyloP/Phastcons (MutationTaster), SIFT and PANTHER.
All the participants provided informed written consent. Results In 3/2012 a 27-year-old patient, with no family history, developed her current illness one month after her first natural birth, with acute renal failure and microangiopathic hemolytic anemia; a renal biopsy demonstrated severe TMA. Laboratory results showed low C3 serum levels, but C4 was normal, haptoglobin was undetectably low, and all ADAMTS13 parameters were normal. Her urine tests also showed glomerular hematuria and proteinuria in the nephrotic range. She was treated with plasmapheresis and fresh frozen plasma with hematological improvement, but hemodialysis was required for more than 3 months. Despite partial recovery of renal function, six months later anemia reappeared and she developed severe arterial hypertension, congestive heart failure and progressive renal insufficiency. A diagnosis of aHUS was made and treatment with eculizumab was started, with progressive recovery of renal function in the following months. Nevertheless, the C3 serum levels remained low and the proteinuria and hematuria did not change even after long-term treatment with eculizumab. Retrospectively, her urinalysis before the pregnancy showed proteinuria and hematuria, and review of a renal biopsy revealed dominant C3 deposits on immunofluorescence and electron-dense deposits on electron microscopy, suggesting the diagnosis of C3GN. In a genetic study two novel single nucleotide variants were found (CFH c.575G>A, p.C192Y (exon 5) (NM_000186), predicted to be pathogenic by 4 of 5 available pathogenicity prediction programs; and CFI c.1189G>T, p.V397L (exon 11) (NM_000204), predicted pathogenic by 0 of 6 available pathogenicity prediction programs). (Figure 1) Conclusion We present here the family genetic basis of a patient who developed C3GN and aHUS with a different response to treatment with eculizumab.
In this case we identified two novel genetic variants in the CFH and CFI genes in a patient with aHUS, who inherited one variant from each parent. Although the CFI variant is predicted to be benign, the CFH variant is predicted to be damaging. It is located in exon 5, which encodes a portion of the factor H protein implicated in binding to C3b.


2015 ◽  
Vol 16 (5) ◽  
pp. 769-779 ◽  
Author(s):  
Sabine C. Mueller ◽  
Christina Backes ◽  
Jan Haas ◽  
Hugo A. Katus ◽  
Benjamin Meder ◽  
...  
