Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Asieh Amousoltani Arani ◽  
Mohammadreza Sehhati ◽  
Mohammad Amin Tabatabaiefar

Abstract Among the many types of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and ultimately lead to human disease. Various machine learning methods have been proposed to distinguish deleterious from benign missense variants using a large number of features, including structure, sequence, interaction networks, gene-disease associations, and phenotypes. However, a reliable and accurate algorithm for merging heterogeneous information is still highly needed, as it could capture the complex interactions in the networks in which genes participate. In this study, we propose a new method based on the non-negative matrix tri-factorization clustering method. We outline two versions of the proposed method: a two-source and a three-source algorithm. The two-source algorithm aggregates individual deleteriousness prediction methods and the protein-protein interaction (PPI) network, and the three-source algorithm additionally incorporates gene-disease associations. Four benchmark datasets were employed for internal and external validation of both algorithms. The results on all datasets confirmed that our method outperforms most state-of-the-art variant prediction tools. Two key features of our variant effect prediction method are worth mentioning. First, although incorporating gene-disease information in the three-source algorithm can improve prediction performance compared with the two-source algorithm, our method is not hindered by type 2 circularity error, unlike some recent ensemble-based prediction methods. Type 2 circularity error occurs when a predictor annotates variants on the basis of the genes on which they are located. Second, our predictor performs better than other ensemble-based methods for variants located on genes for which little pathogenicity information is available.
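As a rough illustration of the factorization this abstract builds on, the sketch below runs plain, unsupervised non-negative matrix tri-factorization, approximating a non-negative matrix X by F S G^T with standard multiplicative updates. It is not the authors' integrative supervised algorithm: the function names, the toy matrix, the ranks, and the iteration count are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def multiplicative_step(m, numer, denom, eps=1e-12):
    """Element-wise m * numer / denom; eps guards against division by zero."""
    return [[v * n / (d + eps) for v, n, d in zip(rv, rn, rd)]
            for rv, rn, rd in zip(m, numer, denom)]


def frobenius_error(x, f, s, g):
    """Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((a - b) ** 2
               for ra, rb in zip(x, approx) for a, b in zip(ra, rb)) ** 0.5


def random_matrix(rng, rows, cols):
    """Strictly positive random matrix, so no entry starts stuck at zero."""
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]


def nmtf(x, k, l, iterations=200, seed=0):
    """Factor non-negative X (n x m) into F (n x k), S (k x l), G (m x l)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    f = random_matrix(rng, n, k)
    s = random_matrix(rng, k, l)
    g = random_matrix(rng, m, l)
    xt = transpose(x)
    for _ in range(iterations):
        # Update F with S and G fixed: F <- F * (X G S^T) / (F S G^T G S^T)
        b = matmul(s, transpose(g))
        bt = transpose(b)
        f = multiplicative_step(f, matmul(x, bt), matmul(f, matmul(b, bt)))
        # Update G with F and S fixed: G <- G * (X^T F S) / (G S^T F^T F S)
        a = matmul(f, s)
        g = multiplicative_step(g, matmul(xt, a),
                                matmul(g, matmul(transpose(a), a)))
        # Update S with F and G fixed: S <- S * (F^T X G) / (F^T F S G^T G)
        ft, gt = transpose(f), transpose(g)
        numer = matmul(matmul(ft, x), g)
        denom = matmul(matmul(matmul(ft, f), s), matmul(gt, g))
        s = multiplicative_step(s, numer, denom)
    return f, s, g


# Toy block-structured matrix (hypothetical data, two obvious row/column groups).
toy = [[5.0, 5.0, 0.0, 0.0],
       [4.0, 4.0, 0.0, 0.0],
       [0.0, 0.0, 3.0, 3.0],
       [0.0, 0.0, 2.0, 2.0]]

f0, s0, g0 = nmtf(toy, 2, 2, iterations=0)   # random initialization only
f1, s1, g1 = nmtf(toy, 2, 2, iterations=200)
print("error before:", frobenius_error(toy, f0, s0, g0))
print("error after: ", frobenius_error(toy, f1, s1, g1))
```

In a clustering reading of the factors, the largest entry in each row of F or G would indicate a row or column group. How the paper adds supervision and integrates several data sources is not specified in the abstract, so none of that is modeled here.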

2021 ◽  
Vol 16 ◽  
Author(s):  
Yayan Zhang ◽  
Guihua Duan ◽  
Cheng Yan ◽  
Haolun Yi ◽  
Fang-Xiang Wu ◽  
...  

Background: Increasing evidence has indicated that miRNA-disease association prediction plays a critical role in the study of clinical drugs. Researchers have proposed many computational models for miRNA-disease prediction. However, there is no unified platform to compare and analyze the pros and cons or share the code and data of these models. Objective: In this study, we develop an easy-to-use platform (MDAPlatform) to construct and assess miRNA-disease association prediction methods. Methods: MDAPlatform integrates the relevant miRNA, disease, and miRNA-disease association data used in previous miRNA-disease association prediction studies. Based on the componentized model, it develops different components of previous computational methods. Results: Users can conduct cross-validation experiments and compare their methods with other methods, and visualized comparison results are also provided. Conclusion: Based on the componentized model, MDAPlatform provides easy-to-operate interfaces for constructing miRNA-disease association methods, which will help in developing new miRNA-disease association prediction methods in the future.


Author(s):  
Yongxian Fan ◽  
Meijun Chen ◽  
Xiaoyong Pan

Abstract Long noncoding RNAs (lncRNAs) play important roles in various biological regulatory processes, and are closely related to the occurrence and development of diseases. Identifying lncRNA-disease associations is valuable for revealing the molecular mechanism of diseases and exploring treatment strategies. Thus, it is necessary to computationally predict lncRNA-disease associations as a complementary method for biological experiments. In this study, we proposed a novel prediction method GCRFLDA based on the graph convolutional matrix completion. GCRFLDA first constructed a graph using the available lncRNA-disease association information. Then, it constructed an encoder consisting of conditional random field and attention mechanism to learn efficient embeddings of nodes, and a decoder layer to score lncRNA-disease associations. In GCRFLDA, the Gaussian interaction profile kernels similarity and cosine similarity were fused as side information of lncRNA and disease nodes. Experimental results on four benchmark datasets show that GCRFLDA is superior to other existing methods. Moreover, we conducted case studies on four diseases and observed that 70 of 80 predicted associated lncRNAs were confirmed by the literature.
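The two similarity measures named in this abstract can be sketched as follows. The Gaussian interaction profile (GIP) kernel is computed from binary association profiles with the usual bandwidth normalization, and cosine similarity from the same profiles. The abstract does not say how GCRFLDA fuses the two, so the element-wise average in `fuse_average` is purely a hypothetical placeholder, as are the function names, the `gamma_prime` default, and the toy profiles.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over rows of an association matrix.

    K[i][j] = exp(-gamma * ||p_i - p_j||^2), where gamma is gamma_prime divided
    by the mean squared norm of the profiles.
    """
    norms = [sum(v * v for v in p) for p in profiles]
    mean_norm = sum(norms) / len(norms)
    if mean_norm == 0:
        raise ValueError("all profiles are zero; the bandwidth is undefined")
    gamma = gamma_prime / mean_norm
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in profiles] for p in profiles]


def cosine_similarity(p, q):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def cosine_matrix(profiles):
    """Pairwise cosine similarity for all profiles."""
    return [[cosine_similarity(p, q) for q in profiles] for p in profiles]


def fuse_average(sim_a, sim_b):
    """Hypothetical fusion: element-wise mean (an assumption, not the paper's rule)."""
    return [[(a + b) / 2.0 for a, b in zip(ra, rb)] for ra, rb in zip(sim_a, sim_b)]


# Toy lncRNA-by-disease association profiles (hypothetical data):
# rows are lncRNAs, columns are diseases, 1 marks a known association.
lnc_profiles = [[1, 0, 1],
                [1, 0, 1],
                [0, 1, 0]]

gip = gip_kernel(lnc_profiles)
cos = cosine_matrix(lnc_profiles)
fused = fuse_average(gip, cos)
for row in fused:
    print([round(v, 3) for v in row])
```

The same functions apply to disease nodes by passing the columns of the association matrix as profiles.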


2011 ◽  
Vol 96 (2) ◽  
pp. E394-E403 ◽  
Author(s):  
Neeraj K. Sharma ◽  
Kurt A. Langberg ◽  
Ashis K. Mondal ◽  
Steven C. Elbein ◽  
Swapan K. Das

Abstract Context: Genome-wide association scans (GWAS) have identified novel single nucleotide polymorphisms (SNPs) that increase T2D susceptibility and indicated the role of nearby genes in T2D pathogenesis. Objective: We hypothesized that T2D-associated SNPs act as cis-regulators of nearby genes in human tissues and that expression of these transcripts may correlate with metabolic traits, including insulin sensitivity (SI). Design, Settings, and Patients: Association of SNPs with the expression of their nearest transcripts was tested in adipose and muscle from 168 healthy individuals who spanned a broad range of SI and body mass index (BMI) and in transformed lymphocytes (TLs). We tested correlations between the expression of these transcripts in adipose and muscle with metabolic traits. Utilizing allelic expression imbalance (AEI) analysis, we examined the presence of other cis-regulators for those transcripts in TLs. Results: SNP rs9472138 was significantly (P = 0.037) associated with the expression of VEGFA in TLs, while rs6698181 was detected as a cis-regulator for PKN2 in muscle (P = 0.00027) and adipose (P = 0.018). Significant association was also observed for rs17036101 (P = 0.001) with expression of SYN2 in adipose of Caucasians. Among 19 GWAS-implicated transcripts, expression of VEGFA in adipose was correlated with BMI (r = −0.305) and SI (r = 0.230). Although only a minority of the T2D-associated SNPs were validated as cis-eQTLs for nearby transcripts, AEI analysis indicated the presence of other cis-regulatory polymorphisms in 54% of these transcripts. Conclusions: Our study suggests that a small subset of GWAS-identified SNPs may increase T2D susceptibility by modulating expression of nearby transcripts in adipose or muscle.


2022 ◽  
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Motivation: Protein functions are often described using the Gene Ontology (GO), which is an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have only a few or no experimental annotations. Results: We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted. Availability: http://github.com/bio-ontology-research-group/deepgozero


2020 ◽  
Vol 8 (1) ◽  
pp. 42
Author(s):  
Firyal Baktir ◽  
Dwi Prijatmoko ◽  
Masniari Novita

There are several methods of analyzing tooth size discrepancy in orthodontics, including prediction methods for mixed dentition. The prediction methods of Moyers and Sitepu are the most commonly used, although they were derived from two different races, Caucasian and Deutero-Malay. The Yemeni are one of the ethnic groups settled in Indonesia and are descended from the Caucasian race. The aim of the study was to determine the most suitable prediction table for the Yemeni ethnic group. It was an observational analytic study of 40 samples with a cross-sectional design. The results showed only slight differences for the Moyers prediction in the maxilla (1.02) and the Sitepu prediction in the mandible (0.11). In conclusion, the most suitable prediction method for the Yemeni ethnic group is Moyers's method for the maxilla and Sitepu's method for the mandible.   Key words: mesiodistal width of permanent teeth, Moyers method, Sitepu method, Yemeni ethnicity


2018 ◽  
Author(s):  
Md Habibur Rahman ◽  
Silong Peng ◽  
Chen Chen ◽  
Pietro Lio’ ◽  
Mohammad Ali Moni

Neurological diseases (NDs) are progressive disorders that often advance with age and with comorbidities of Type 2 diabetes (T2D). Epidemiological, clinical and neuropathological evidence suggests that patients with T2D are at an increased risk of developing NDs. However, very little is known about how T2D affects the risk and severity of NDs. To tackle these problems, we employed a transcriptional analysis of affected tissues using agnostic approaches to identify overlapping cellular functions. In this study, we examined human gene expression microarray datasets from control and disease-affected individuals. Differentially expressed genes (DEGs) were identified for both T2D and NDs, including Alzheimer Disease (AD), Parkinson Disease (PD), Amyotrophic Lateral Sclerosis (ALS), Epilepsy Disease (ED), Huntington Disease (HD), Cerebral Palsy (CP) and Multiple Sclerosis Disease (MSD). We developed genetic association and diseasome networks of T2D and NDs based on neighborhood-based benchmarking and multilayer network topology approaches. Overlapping DEG sets were subjected to protein-protein interaction and gene enrichment analyses using pathway analysis and gene ontology methods, identifying numerous candidate common genes and pathways. Gene expression analysis platforms have been extensively used to investigate altered pathways and to identify potential biomarkers and drug targets. Finally, we validated our identified biomarkers using gold benchmark datasets, which identified corresponding relations of T2D and NDs. Therapeutic targets aimed at attenuating the identified altered pathways could ameliorate neurological dysfunction in T2D patients.


1993 ◽  
Vol 30 (04) ◽  
pp. 297-307
Author(s):  
John M. Almeter

There are literally dozens of different ways that the boat designer can predict the resistance of planing hulls. The term planing hull is used generically to describe the majority of hard chine boats being built today. No single prediction method is good for all types of planing hulls. Some methods can be relied on to give good predictions for certain boats and other methods can't be relied upon at all. This paper is meant as a reference for designers in selecting resistance prediction methods for planing hulls. It describes numerous resistance prediction methods and gives their variable ranges and the type of planing hulls they are based on or are intended for. Inherent problems or limitations of the methods are stated. The concept of hull shape, which is often neglected in resistance prediction, and its important role are discussed.


2020 ◽  
Vol 10 (6) ◽  
pp. 1942
Author(s):  
You Xianhui ◽  
Wu Zhaoqi ◽  
Chen Zehao

Grouted connections are commonly used in marine engineering, especially on oil platforms, cross-sea bridges, and offshore wind power turbines. The prediction methods for axial carrying capacity of grouted connections with shear keys and their application ranges in current codes were analyzed in this paper. The results calculated using different codes were compared based on a practical grouted connection between steel piles and the jacket foundation of a wind turbine. The research team conducted axial compression tests on seven specimens, collected a wide range of experimental results to establish a database, and finally compared the standard calculation results with the experimental results. The study indicates that the axial strength of grouted connections predicted by different methods differs. The calculation formula of the British Health and Safety Executive (HSE, 2002) has obvious limitations; specifically, as the number of shear keys increases, strength is overestimated, resulting in unsafe structural designs. The results calculated by the Norwegian Det Norske Veritas (DNV, 2013) are generally consistent with the experimental results, in which the reduction effect of multiple shear keys was considered. The prediction method of the American Petroleum Institute (API, 2007), which undervalues the bearing performance of connections, is excessively conservative. The method of the combined Norwegian and German Det Norske Veritas–Germanischer Lloyd (DNV-GL, 2016) has wider applicability and is safe, reliable, and economical.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yubin Xiao ◽  
Zheng Xiao ◽  
Xiang Feng ◽  
Zhiping Chen ◽  
Linai Kuang ◽  
...  

Abstract Background Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and identifying the relationships between lncRNAs and diseases is useful for the diagnosis and treatment of diseases. Due to the high cost and time requirements of traditional biological experiments, in recent years researchers have proposed more and more computational methods to infer potential lncRNA-disease associations. However, these state-of-the-art prediction methods still have various limitations. Results In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. The major novelty of FVTLDA lies in the integration of direct and indirect features related to lncRNA-disease associations, such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related lncRNAs and lncRNAs without known related diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease associations nor requires any negative samples, which guarantees that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single-model prediction techniques, we combine FVTLDA with Multiple Linear Regression (MLR) and an Artificial Neural Network (ANN), respectively, for data analysis. Simulation experiment results show that FVTLDA with MLR achieves reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-fold cross validation (fivefold CV), 10-fold cross validation (tenfold CV) and leave-one-out cross validation (LOOCV), respectively, while FVTLDA with ANN achieves reliable AUCs of 0.8766, 0.8830 and 0.8807 in fivefold CV, tenfold CV, and LOOCV, respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, 8, 8 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 of the top 10 candidate lncRNAs predicted by FVTLDA with ANN, have been verified by recent literature. Compared with the representative prediction model KATZLDA, FVTLDA with MLR and FVTLDA with ANN achieve average case study contrast scores of 0.8429 and 0.8515, respectively, which are both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion The simulation results show that FVTLDA has good prediction performance and is a good supplement to future bioinformatics research.
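The AUC figures quoted above come from k-fold and leave-one-out cross validation. The sketch below shows, under simplifying assumptions, how such an evaluation is typically organized: a rank-based AUC (the fraction of positive-negative pairs that are ordered correctly, with ties counted as half) and a shuffled split of sample indices into k folds. The function names, the seed, and the toy labels and scores are hypothetical; FVTLDA's own scoring and sampling procedure is not described in enough detail in the abstract to reproduce.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds.

    Setting k equal to n_samples gives leave-one-out cross validation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Toy example (hypothetical labels and predicted association scores).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print("AUC:", auc(labels, scores))

# Each fold is held out once as the test set; the rest would be used for training.
for fold_id, test_idx in enumerate(k_fold_indices(10, 5)):
    print("fold", fold_id, "test indices:", sorted(test_idx))
```

In a full experiment, a model would be refit on the training portion of each split and the held-out scores pooled or averaged before computing the AUC.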

