CADA: phenotype-driven gene prioritization based on a case-enriched knowledge graph

2021 ◽  
Vol 3 (3) ◽  
Author(s):  
Chengyao Peng ◽  
Simon Dieck ◽  
Alexander Schmid ◽  
Ashar Ahmad ◽  
Alexej Knaus ◽  
...  

Abstract Many rare syndromes can be well described and delineated from other disorders by a combination of characteristic symptoms. These phenotypic features are best documented with terms of the Human Phenotype Ontology (HPO), which are also increasingly used in electronic health records (EHRs). Many algorithms that perform HPO-based gene prioritization have been developed; however, the performance of many such tools suffers from an over-representation of atypical cases in the medical literature. This is certainly the case if the algorithm cannot handle features that occur with reduced frequency in a disorder. With CADA, we built a knowledge graph based on both case annotations and disorder annotations. Using network representation learning, we achieve gene prioritization by link prediction. Our results suggest that CADA exhibits superior performance, particularly for patients who present with the pathognomonic findings of a disease. Additionally, information about the frequency of occurrence of a feature can readily be incorporated when available. Crucial in the design of our approach is the use of the growing amount of phenotype–genotype information that diagnostic labs deposit in databases such as ClinVar. By this means, CADA is an ideal reference tool for differential diagnostics in rare disorders that can also be updated regularly.
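To make the idea of gene prioritization over a case-enriched graph concrete, here is a minimal sketch. It is not the authors' method: the abstract describes learned network representations, whereas this example swaps in a simple degree-normalized edge-weight score as a stand-in for link prediction. All term and gene labels (`HP:A`, `GENE1`, and so on) and both function names are hypothetical.

```python
from collections import defaultdict


def build_graph(disorder_annotations, case_annotations):
    """Merge both annotation sources into weighted phenotype-gene edges.

    Each source is a list of (gene, [phenotype terms]) records; every
    co-occurrence adds 1.0 to the corresponding edge weight.
    """
    weights = defaultdict(float)
    for source in (disorder_annotations, case_annotations):
        for gene, terms in source:
            for term in terms:
                weights[(term, gene)] += 1.0
    return weights


def prioritize(weights, patient_terms):
    """Rank genes by summed, degree-normalized edge weight to the patient's terms.

    This is a simple heuristic stand-in for a learned link-prediction score.
    """
    term_degree = defaultdict(float)
    for (term, _gene), weight in weights.items():
        term_degree[term] += weight

    scores = defaultdict(float)
    for (term, gene), weight in weights.items():
        if term in patient_terms:
            # Terms shared by many genes contribute less per gene.
            scores[gene] += weight / term_degree[term]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: labels are placeholders, not real HPO IDs or genes.
disorder_annotations = [("GENE1", ["HP:A", "HP:B"]), ("GENE2", ["HP:B", "HP:C"])]
case_annotations = [("GENE1", ["HP:A", "HP:C"]), ("GENE2", ["HP:C"])]

weights = build_graph(disorder_annotations, case_annotations)
ranking = prioritize(weights, {"HP:A", "HP:B"})
print(ranking)
```

Because case annotations add weight to the same edges as disorder annotations, newly deposited cases shift the ranking without any change to the scoring code, which mirrors the abstract's point that the resource can be updated regularly.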

2021 ◽  
Author(s):  
Chengyao Peng ◽  
Simon Dieck ◽  
Alexander Schmid ◽  
Ashar Ahmad ◽  
Alexej Knaus ◽  
...  

Abstract Many rare syndromes can be well described and delineated from other disorders by a combination of characteristic symptoms. These phenotypic features are best documented with terms of the Human Phenotype Ontology (HPO), which is also increasingly used in electronic health records (EHRs). Many algorithms that perform HPO-based gene prioritization have been developed; however, the performance of many such tools suffers from an overrepresentation of atypical cases in the medical literature. This is certainly the case if the algorithm cannot handle features that occur with reduced frequency in a disorder. With CADA, we built a knowledge graph that is based on case annotations and disorder annotations, and we show that CADA exhibits superior performance, particularly for patients who present with the pathognomonic findings of a disease. Crucial in the design of our approach is the use of the growing amount of phenotypic information that diagnostic labs deposit in databases such as ClinVar. By this means, CADA is an ideal reference tool for differential diagnostics in rare disorders that can also be updated regularly.


Author(s):  
Bo Wang ◽  
Tao Shen ◽  
Guodong Long ◽  
Tianyi Zhou ◽  
Ying Wang ◽  
...  

2019 ◽  
Vol 20 (S16) ◽  
Author(s):  
Da Zhang ◽  
Mansur Kabuka

Abstract Background Protein-protein interactions (PPIs) are involved in dynamic pathological and biological processes throughout life. Thus, it is crucial to understand PPIs thoroughly in order to illuminate disease occurrence, achieve the optimal drug-target therapeutic effect, and describe protein complex structures. However, compared to the protein sequences obtainable from various species and organisms, the number of revealed protein-protein interactions is relatively limited. To address this dilemma, many research efforts have been devoted to facilitating the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely only on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results In this paper, we propose a multi-modal deep representation learning structure that combines protein physicochemical features with the graph topological features from the PPI networks. Specifically, our method not only considers the protein sequence information but also learns the topological representations for each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study the PPI predictions. Following that, we use supervised deep neural networks to identify the PPIs and classify the protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining the PPI networks.
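As a rough illustration of how a CBOW model can consume walks over a network, the sketch below turns a single node sequence into (context, target) training pairs, which is the input format CBOW uses to predict a center token from its neighbors. The abstract does not specify the metapath schema, so the walk shown here (alternating hypothetical `P*` and `F*` node labels) and the function name `cbow_pairs` are assumptions made purely for illustration.

```python
def cbow_pairs(walk, window=2):
    """Build CBOW training pairs from one walk.

    For every position, the target is the node at that position and the
    context is the list of up to `window` nodes on each side of it.
    """
    pairs = []
    for idx, target in enumerate(walk):
        left = walk[max(0, idx - window):idx]
        right = walk[idx + 1:idx + 1 + window]
        context = left + right
        if context:  # skip degenerate single-node walks
            pairs.append((context, target))
    return pairs


# Hypothetical walk following an assumed alternating node-type pattern.
walk = ["P1", "F1", "P2", "F2", "P3"]
pairs = cbow_pairs(walk, window=1)
for context, target in pairs:
    print(context, "->", target)
```

In a full pipeline, pairs like these would be fed to an embedding model that learns one vector per node; that training step is omitted here.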


2020 ◽  
Vol 34 (01) ◽  
pp. 27-34 ◽  
Author(s):  
Lei Chen ◽  
Le Wu ◽  
Richang Hong ◽  
Kun Zhang ◽  
Meng Wang

Graph Convolutional Networks (GCNs) are state-of-the-art graph-based representation learning models that iteratively stack multiple layers of convolution aggregation operations and non-linear activation operations. Recently, in Collaborative Filtering (CF) based Recommender Systems (RS), some researchers have modeled higher-layer collaborative signals with GCNs by treating the user-item interaction behavior as a bipartite graph. These GCN-based recommender models show superior performance compared to traditional works. However, these models suffer from training difficulty with non-linear activations for large user-item graphs. Besides, most GCN-based models cannot model deeper layers due to the over-smoothing effect of the graph convolution operation. In this paper, we revisit GCN-based CF models from two aspects. First, we empirically show that removing non-linearities enhances recommendation performance, which is consistent with the theories in simple graph convolutional networks. Second, we propose a residual network structure that is specifically designed for CF with user-item interaction modeling, which alleviates the over-smoothing problem in the graph convolution aggregation operation with sparse user-item interaction data. The proposed model is linear; it is easy to train, scales to large datasets, and yields better efficiency and effectiveness on two real datasets. We publish the source code at https://github.com/newlei/LR-GCCF.
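The two ideas in this abstract, linear propagation without activations and a residual-style prediction, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published implementation (see the linked repository for that): it omits any learnable per-layer weight matrices, uses a symmetric normalization with self-loops, and sums the user-item dot products from every layer as the residual prediction. The function names and the toy embeddings are hypothetical.

```python
def normalized_adjacency(n_users, n_items, interactions):
    """Return D^-1/2 (A + I) D^-1/2 for the bipartite user-item graph.

    Users occupy node indices [0, n_users); item i is node n_users + i.
    """
    n = n_users + n_items
    adj = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for user, item in interactions:
        adj[user][n_users + item] = 1.0
        adj[n_users + item][user] = 1.0
    deg = [sum(row) for row in adj]
    return [
        [adj[r][c] / (deg[r] ** 0.5 * deg[c] ** 0.5) for c in range(n)]
        for r in range(n)
    ]


def propagate(norm_adj, emb):
    """One linear layer: neighborhood aggregation with no activation."""
    n, dim = len(emb), len(emb[0])
    return [
        [sum(norm_adj[r][k] * emb[k][d] for k in range(n)) for d in range(dim)]
        for r in range(n)
    ]


def residual_score(norm_adj, emb, user, item_node, n_layers):
    """Sum the user-item dot product over layers 0..n_layers."""
    score = 0.0
    layer = emb
    for _ in range(n_layers + 1):
        score += sum(a * b for a, b in zip(layer[user], layer[item_node]))
        layer = propagate(norm_adj, layer)
    return score


# Toy graph: 2 users, 2 items, hypothetical fixed embeddings.
norm_adj = normalized_adjacency(2, 2, [(0, 0), (1, 0), (1, 1)])
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
s0 = residual_score(norm_adj, emb, user=0, item_node=2, n_layers=0)
s2 = residual_score(norm_adj, emb, user=0, item_node=2, n_layers=2)
print(s0, s2)
```

Because each layer's contribution is added to the running score instead of replacing it, the shallow-layer signal is preserved even when deeper layers have smoothed the embeddings toward each other.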


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 32816-32825
Author(s):  
Seungmin Seo ◽  
Byungkook Oh ◽  
Kyong-Ho Lee

2020 ◽  
Vol 34 (01) ◽  
pp. 825-832 ◽  
Author(s):  
Liantao Ma ◽  
Junyi Gao ◽  
Yasha Wang ◽  
Chaohe Zhang ◽  
Jiangtao Wang ◽  
...  

Deep learning-based health status representation learning and clinical prediction have attracted much research interest in recent years. Existing models have shown superior performance, but several major issues have not been fully taken into consideration. First, the historical variation pattern of a biomarker across diverse time scales plays a vital role in indicating health status, but it has not been explicitly extracted by existing works. Second, the key factors that strongly indicate health risk differ among patients, and it remains challenging to adaptively make use of the features for patients in diverse conditions. Third, using prediction models as black boxes limits their reliability in clinical practice; however, none of the existing works provides satisfying interpretability while also achieving high prediction performance. In this work, we develop a general health status representation learning model, named AdaCare. It captures the long- and short-term variations of biomarkers as clinical features to depict health status at multiple time scales. It also models the correlation between clinical features to enhance those that strongly indicate health status, and thus maintains state-of-the-art prediction accuracy while providing qualitative interpretability. We conduct a health risk prediction experiment on two real-world datasets. Experimental results indicate that AdaCare outperforms state-of-the-art approaches and provides effective interpretability, which is verifiable by clinical experts.

