Benchmarking network propagation methods for disease gene identification

2018 ◽  
Author(s):  
Sergio Picart-Armada ◽  
Steven J. Barrett ◽  
David R. Willé ◽  
Alexandre Perera-Lluna ◽  
Alex Gutteridge ◽  
...  

Abstract Background In-silico identification of potential disease genes has become an essential aspect of drug target discovery. Recent studies suggest that one powerful way to identify successful targets is through the use of genetic and genomic information. Given a known disease gene, leveraging intermolecular connections via networks and pathways seems a natural way to identify other genes and proteins that are involved in similar biological processes, and that can therefore be analysed as additional targets. Results Here, we systematically tested the ability of 12 varied network-based algorithms to identify target genes and cross-validated these using gene-disease data from Open Targets on 22 common diseases. We considered two biological networks, six performance metrics and compared two types of input gene-disease association scores. We also compared several cross-validation schemes and showed that different choices had a remarkable impact on the performance estimates. When seeding biological networks with known drug targets, we found that machine learning and diffusion-based methods are able to find novel targets, showing around 2-4 true hits in the top 20 suggestions. Seeding the networks with genes associated with the disease through genetics resulted in poorer performance, with fewer than 1 true hit on average. We also observed that the use of a larger network, although noisier, improved overall performance. Conclusions We conclude that machine learning and diffusion-based prioritisers are suited to drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large effect of several factors on prediction performance, especially the validation strategy, input biological network, and definition of seed disease genes.
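
The diffusion-based idea summarised above (seeding a network with known genes and letting scores spread to neighbours) can be sketched as a random walk with restart (RWR). This is a minimal sketch under stated assumptions, not any of the 12 benchmarked implementations: the function names, the restart probability of 0.3 and the toy network are hypothetical. The `hits_in_top_k` helper mirrors the "true hits in the top 20 suggestions" metric by counting held-out genes among the top-ranked non-seed genes.

```python
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (gene, gene) edge pairs."""
    adjacency: Dict[str, Set[str]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def random_walk_with_restart(
    adjacency: Dict[str, Set[str]],
    seeds: Set[str],
    restart: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    W is the column-normalised adjacency matrix (each node splits its score
    equally among its neighbours) and p0 is uniform over the seed genes that
    are present in the network. Nodes without neighbours do not pass their
    score on, so total mass is conserved only if every node has a neighbour.
    """
    present = {s for s in seeds if s in adjacency}
    if not present:
        raise ValueError("no seed gene is present in the network")
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(present) if n in present else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def hits_in_top_k(
    scores: Dict[str, float], seeds: Set[str], held_out: Set[str], k: int = 20
) -> int:
    """Count held-out disease genes among the top-k ranked non-seed genes."""
    candidates = [n for n in scores if n not in seeds]
    ranked = sorted(candidates, key=lambda n: (-scores[n], n))
    return sum(1 for n in ranked[:k] if n in held_out)


if __name__ == "__main__":
    # Toy path network A-B-C-D seeded at A (hypothetical genes).
    net = build_adjacency([("A", "B"), ("B", "C"), ("C", "D")])
    scores = random_walk_with_restart(net, seeds={"A"})
    print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
    print(hits_in_top_k(scores, seeds={"A"}, held_out={"B"}, k=1))  # 1
```

Because the restart term pulls mass back to the seeds at every step, genes close to the seeds score highest; in a cross-validation setting the held-out set would be known disease genes withheld from the seeds.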

2019 ◽  
Author(s):  
Zied Gaieb ◽  
Conor Parks ◽
Rommie Amaro

Nonlinearities in biological networks present ample opportunities for synergistic combinations of protein targets. Yet, to date, our ability to design multi-target inhibitors and predict polypharmacology binding profiles remains limited. Herein, we present a systematic benchmark of protein pocket comparison algorithms from the literature, as well as novel machine learning models developed to predict whether two proteins will bind the same ligand. The results demonstrate that previously reported performance metrics in the literature could be inflated by a bias towards proteins of similar folds when identifying proteins capable of binding the same ligand. This observation motivated a more in-depth evaluation of the methods on two subsets: same-fold and cross-fold protein comparisons. In a head-to-head comparison on the cross-fold subset, we found that the proteometric machine learning models were the best-performing models overall.


2020 ◽  
Vol 21 (10) ◽  
pp. 790-803 ◽  
Author(s):  
Dongrui Gao ◽  
Qingyuan Chen ◽  
Yuanqi Zeng ◽  
Meng Jiang ◽  
Yongqing Zhang

Drug target discovery is a critical step in drug development. It is the basis of modern drug development because it determines, in advance, the target molecules related to specific diseases. Predicting drug targets computationally saves considerable financial and material resources compared with in vitro experiments, and several computational methods for drug target discovery have therefore been designed. Recently, machine learning (ML) methods in biomedicine have developed rapidly. In this paper, we present an overview of drug target discovery methods based on machine learning. Because some machine learning methods integrate network analysis to predict drug targets, network-based methods are also introduced. Finally, the challenges and future outlook of drug target discovery are discussed.


2018 ◽  
Vol 18 (13) ◽  
pp. 1053-1061 ◽  
Author(s):  
Bhushan Jain ◽  
Utkarsh Raj ◽  
Pritish Kumar Varadwaj

Screening and identifying a novel disease-specific drug target is the first step in rational drug design. With the advent of high-throughput data generation techniques, the protein search space now exceeds 24,500 human protein-coding genes, which encode approximately 1804 proteins. This work aims to uncover the relationships among target proteins, drugs, and disease genes through a network-based systems biology approach. A network of all FDA-approved drugs and their targets was used to construct the proposed Drug Target (DT) network. Experimental drugs were then mapped onto the DT network to infer functional relationships from the corresponding network attributes. Analogously to the DT network, a disease-gene network was built from the OMIM Gene Map and Morbid Map to link binary disorder-disease gene associations. In the proposed Human Interactome Network model, the shortest path length between a target protein and a disease gene was used to infer the correlation between 'Drug Targets' and 'Disease-Gene'. This network-based study will help researchers analyze, infer and identify novel disease-specific drug targets by harnessing graph-theoretic network attributes.
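
The shortest-path inference described above can be sketched with breadth-first search (BFS) on an unweighted interactome. This is an illustrative sketch under assumptions: the interactome is treated as undirected and unweighted, all node names are hypothetical placeholders, and `min_distance` (the closest distance from any of a drug's targets to any disease gene) is one plausible way to aggregate path lengths, not necessarily the one used in the study. A single target/gene pair is just the special case of one-element lists.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (protein, protein) edge pairs."""
    adjacency: Dict[str, Set[str]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def min_distance(
    adjacency: Dict[str, Set[str]],
    drug_targets: Iterable[str],
    disease_genes: Iterable[str],
) -> Optional[int]:
    """Multi-source BFS: fewest edges from any drug target to any disease gene.

    Returns None when no disease gene is reachable from the targets.
    """
    goals = set(disease_genes)
    sources = [t for t in drug_targets if t in adjacency]
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        # BFS pops nodes in non-decreasing distance order, so the first
        # disease gene popped is at the minimum distance.
        if u in goals:
            return dist[u]
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


if __name__ == "__main__":
    # Hypothetical toy interactome: T* are drug targets, G* are disease genes.
    net = build_adjacency(
        [("T1", "X"), ("X", "Y"), ("Y", "G1"), ("T2", "G2"), ("Z", "W")]
    )
    print(min_distance(net, ["T1"], ["G1"]))  # 3
    print(min_distance(net, ["T1", "T2"], ["G1", "G2"]))  # 1
    print(min_distance(net, ["T1"], ["W"]))  # None
```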


2020 ◽  
Author(s):  
Tiantian Liu ◽  
Pengli Xu ◽  
Shuishui Qi ◽  
Shaorui Ke ◽  
Qin Hu ◽  
...  

Abstract Background Idiopathic pulmonary fibrosis (IPF) is a chronic respiratory disease with high incidence, morbidity and mortality. Jinshui Huanxian formula (JHF) is an empirical formula targeting the pathogenesis of lung-kidney qi deficiency and phlegm-blood stasis in pulmonary fibrosis. The purpose of this study is to explore the pharmacological mechanism of JHF in IPF therapy by network interaction analysis. Methods The main active components of JHF and their corresponding target genes were predicted using various databases. Two sets of IPF disease genes were obtained from the DisGeNET and GEO databases. Two sets of drug targets for IPF treatment were collected, and the overlap between disease genes and drug targets was analyzed. The target genes of JHF were intersected with the differentially expressed genes of IPF to obtain the predicted targets of JHF acting on IPF. The functions and pathways of these predicted targets were analyzed using DAVID and the KEGG pathway database. Finally, the resulting drug-target mechanisms were validated in a rat model of pulmonary fibrosis. Results A total of 494 active compounds and 1304 corresponding targets were screened. Intersection analysis showed that 4 genes were common to the JHF targets, the IPF disease genes and the targets of anti-IPF drugs in the KEGG database, and each of these genes was targeted by several JHF compounds. Seventy-two JHF targets were closely related to IPF and were therefore considered therapeutically relevant. The screened targets participated in the regulation of IPF through 18 pathways. Their molecular functions included regulation of oxidoreductase activity, kinase regulator activity, phosphotransferase activity and transmembrane receptor protein kinase activity. In vivo experiments showed that JHF could alleviate pulmonary fibrosis, decreasing collagen deposition and epithelial-mesenchymal transition.
Conclusions This study explored the mechanisms of JHF from a systems perspective, aiming to identify the specific target pathways through which it acts on IPF. Network pharmacology analysis with in vivo validation explained the potential roles and mechanisms of JHF in IPF therapy.
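
The intersection analysis described in this abstract reduces to set operations on gene identifiers. The sketch below is a hypothetical illustration with placeholder gene sets (not data from the study): predicted targets are formula targets intersected with differentially expressed genes, and the core set is the genes shared by formula targets, disease genes and anti-disease drug targets. The function name `intersect_targets` is an assumption for illustration.

```python
from typing import Set, Tuple


def intersect_targets(
    formula_targets: Set[str],
    deg_genes: Set[str],
    disease_genes: Set[str],
    drug_targets: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (predicted targets, core genes shared by three sets)."""
    # Predicted targets: formula targets that are also differentially expressed.
    predicted = formula_targets & deg_genes
    # Core genes: formula targets that are also disease genes and drug targets.
    core = formula_targets & disease_genes & drug_targets
    return predicted, core


if __name__ == "__main__":
    formula = {"G1", "G2", "G3", "G4"}  # hypothetical formula targets
    degs = {"G2", "G3", "G9"}  # hypothetical differentially expressed genes
    disease = {"G1", "G3", "G7"}  # hypothetical disease genes
    drugs = {"G3", "G5"}  # hypothetical anti-disease drug targets
    predicted, core = intersect_targets(formula, degs, disease, drugs)
    print(sorted(predicted), sorted(core))  # ['G2', 'G3'] ['G3']
```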


2021 ◽  
pp. bi202101
Author(s):  
Peter Habib ◽  
Alsamman Alsamman ◽  
Sameh Hassanein ◽  
Aladdin Hamwieh

The future of therapeutics depends on understanding the interaction between the chemical structure of a drug and the target protein that contributes to the etiology of the disease, in order to improve drug discovery. Predicting the targets of investigational drugs from data on already characterized drugs is important not only for understanding drug-molecule interactions but also for developing new drugs. Using machine learning and published drug information, we designed an easy-to-use tool, TarDict, that predicts biological target proteins for drugs. TarDict is based on the Simplified Molecular-Input Line-Entry System (SMILES): it receives SMILES entries and returns a list of possibly similar drugs as well as possible drug targets. TarDict uses 20442 drug entries with well-known biological targets to construct a predictive computational model capable of predicting novel drug targets with an accuracy of 95%. We developed a machine learning approach to recommend target proteins for approved drugs. We have shown that the proposed method is highly predictive on a testing dataset consisting of 4088 targets and 102 manually entered drugs. The proposed computational model is an efficient and cost-effective tool for drug target discovery and prioritization. Such a tool could be used to enhance drug design, predict potential targets and identify combination therapy crossroads.


2020 ◽  
Vol 19 (5-6) ◽  
pp. 350-363
Author(s):  
Duc-Hau Le

Abstract Disease gene prediction is an essential issue in biomedical research. In the early days, annotation-based approaches were proposed for this problem. With the development of high-throughput technologies, interaction data between genes and proteins have grown rapidly and now cover almost the entire genome and proteome; consequently, network-based methods have become prominent. In parallel, machine learning techniques, which formulate the problem as a classification task, have also been proposed. Here, we first present a roadmap of machine learning-based methods for disease gene prediction. Initially, the problem was usually approached as binary classification, where the positive and negative training sets consist of disease genes and non-disease genes, respectively. Disease genes are those known to be associated with diseases, whereas non-disease genes were randomly selected from genes not yet known to be associated with diseases. However, the latter may contain unknown disease genes. To overcome this uncertainty in defining non-disease genes, more realistic formulations have been proposed, such as unary (one-class) and semi-supervised classification. More recently, advanced methods, including ensemble learning, matrix factorization and deep learning, have been proposed. Second, 12 representative machine learning-based methods for disease gene prediction were examined and compared in terms of prediction performance and running time. Finally, their advantages, disadvantages, interpretability and trustworthiness were analyzed and discussed.
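
The early binary-classification setup described above (known disease genes as positives, randomly drawn genes as negatives) can be sketched as a sampling step that precedes classifier training. This is a minimal illustration under assumptions: the function name `sample_training_set`, the 1:1 default negative-to-positive ratio and the toy gene identifiers are hypothetical, and any classifier would be trained on the returned pairs afterwards. As the abstract notes, the 0-labelled genes are really unlabeled and may include undiscovered disease genes, which is what motivates one-class and semi-supervised formulations.

```python
import random
from typing import Iterable, List, Tuple


def sample_training_set(
    all_genes: Iterable[str],
    disease_genes: Iterable[str],
    neg_ratio: float = 1.0,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Label known disease genes 1 and a random sample of the other genes 0.

    The 0-labelled genes are only 'not yet known' disease genes, so some of
    them may be false negatives.
    """
    universe = set(all_genes)
    positives = sorted(universe & set(disease_genes))
    unlabeled = sorted(universe - set(disease_genes))
    n_neg = min(len(unlabeled), round(neg_ratio * len(positives)))
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    negatives = rng.sample(unlabeled, n_neg)
    return [(g, 1) for g in positives] + [(g, 0) for g in negatives]


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(100)]  # hypothetical gene universe
    known = {"g1", "g2", "g3"}  # hypothetical known disease genes
    data = sample_training_set(genes, known, neg_ratio=2.0)
    print(len(data), sum(label for _, label in data))  # 9 3
```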


2020 ◽  
Vol 27 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Camila Rizzotto ◽  
Walter Filgueira de Azevedo Junior

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models that evaluate binding affinity and thermodynamic state functions. Machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. The program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease produced targeted scoring functions that predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of SAnDReS-developed models compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.


2019 ◽  
Vol 14 (3) ◽  
pp. 211-225 ◽  
Author(s):  
Ming Fang ◽  
Xiujuan Lei ◽  
Ling Guo

Background: Essential proteins play important roles in the survival or reproduction of an organism and support the stability of the system. Essential proteins are the minimum set of proteins absolutely required to maintain a living cell. Identifying them is important not only for better understanding the minimal requirements for cellular life, but also for discovering human disease genes and drug targets more efficiently. Experimental identification of essential proteins is complex and usually requires considerable time and expense. With the accumulation of high-throughput experimental data, many computational methods have been proposed as useful complements to experimental methods. In addition, the ability to identify essential proteins rapidly and precisely is of great significance for disease gene discovery and drug design, and has great potential for applications in basic and synthetic biology research. Objective: This paper reviews the identification of essential proteins and genes, focusing on current developments in different types of computational methods, pointing out the progress and limitations of existing methods, and discussing challenges and directions for further research.

