Abstract LB-009: An automated approach to identifying disease-gene associations from the medical literature to inform gene panel design

Author(s):  
Mark Kiel ◽  
Matthew Schu ◽  
Steve Schwartz ◽  
Victor Weigman

2020 ◽  
Vol 36 (9) ◽  
pp. 2649-2656 ◽  
Author(s):  
Van Dinh Tran ◽  
Alessandro Sperduti ◽  
Rolf Backofen ◽  
Fabrizio Costa

Abstract

Motivation: The identification of disease–gene associations is a task of fundamental importance in human health research. A typical approach consists of first encoding large gene/protein relational datasets as networks, since graphs naturally and intuitively represent relationships between objects, and then using graph-based techniques to prioritize genes for subsequent low-throughput validation assays. Because different types of interaction between genes yield distinct gene networks, heterogeneous sources need to be integrated to improve the reliability of prioritization systems.

Results: We propose an approach based on three phases: first, we merge all sources into a single network; then we partition the integrated network according to edge density, introducing a notion of edge type to distinguish the parts; finally, we employ a novel node kernel suitable for graphs with typed edges. We show how the node kernel can generate a large number of discriminative features that can be efficiently processed by linear regularized machine learning classifiers. We report state-of-the-art results on 12 disease–gene associations and on a time-stamped benchmark containing 42 newly discovered associations.

Availability and implementation: Source code: https://github.com/dinhinfotech/DiGI.git.

Supplementary information: Supplementary data are available at Bioinformatics online.
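The first phase, merging all sources into a single network, can be sketched as follows. This is a minimal illustration under stated assumptions, not the DiGI implementation: each source is assumed to be an undirected edge list, and the function names `merge_networks` and `adjacency` are hypothetical. The density-based partitioning into edge types and the node kernel are not shown.

```python
from collections import defaultdict


def merge_networks(sources):
    """Merge several undirected gene networks into one graph (hypothetical helper).

    `sources` maps a source name to a list of (gene_a, gene_b) edges.
    The result maps each undirected edge, stored as a sorted tuple,
    to the set of source names that report it.
    """
    merged = defaultdict(set)
    for source_name, edges in sources.items():
        for a, b in edges:
            if a == b:
                continue  # skip self-loops
            key = (a, b) if a <= b else (b, a)  # canonical order for undirected edges
            merged[key].add(source_name)
    return dict(merged)


def adjacency(merged):
    """Build an adjacency list (gene -> set of neighbours) from merged edges."""
    adj = defaultdict(set)
    for a, b in merged:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


if __name__ == "__main__":
    # Hypothetical toy sources; gene names are placeholders.
    sources = {
        "source_1": [("G1", "G2"), ("G2", "G3")],
        "source_2": [("G2", "G1"), ("G3", "G4")],
    }
    merged = merge_networks(sources)
    print(merged[("G1", "G2")])  # both source names support this edge
    print(adjacency(merged)["G2"])
```

Keeping the set of supporting sources on each edge preserves provenance for later steps, while the merged graph itself remains a plain adjacency structure.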


2011 ◽  
Vol 29 (1) ◽  
pp. 55-72 ◽  
Author(s):  
Kenneth Wysocki ◽  
Leslie Ritter

Using bioinformatics computational tools, researchers have developed network maps that integrate the complex interactions between genes and diseases. The purpose of this review is to introduce the reader to new approaches for understanding disease–gene associations using network maps, with an emphasis on how the human disease network (HDN) map, or diseasome, was constructed. A PubMed search covering the years 1999–2011 was conducted using the keywords diseasome, molecular interaction, interactome, protein–protein interaction, and gene. The material reviewed included review articles, open-source and web-based databases, and open-source computational tools.
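In the diseasome, diseases and genes form a bipartite map. The human disease network links two diseases that share at least one associated gene, and the companion disease gene network links two genes associated with the same disease. A minimal sketch of both projections, assuming the map is given as a dictionary from each disease to its set of genes; the function names and the toy data are hypothetical, and edge weights here simply count shared genes or shared diseases.

```python
from collections import defaultdict
from itertools import combinations


def disease_network(disease_genes):
    """Project a disease -> genes map onto diseases.

    Two diseases are linked if they share at least one gene;
    the edge weight is the number of shared genes.
    """
    weights = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = disease_genes[d1] & disease_genes[d2]
        if shared:
            weights[(d1, d2)] = len(shared)
    return weights


def gene_network(disease_genes):
    """Project the same map onto genes.

    Two genes are linked if they are associated with the same disease;
    the edge weight is the number of diseases they share.
    """
    weights = defaultdict(int)
    for genes in disease_genes.values():
        for g1, g2 in combinations(sorted(genes), 2):
            weights[(g1, g2)] += 1
    return dict(weights)


if __name__ == "__main__":
    # Hypothetical toy map with placeholder names.
    toy = {
        "disease_A": {"G1", "G2"},
        "disease_B": {"G2", "G3"},
        "disease_C": {"G4"},
    }
    print(disease_network(toy))  # {('disease_A', 'disease_B'): 1}
    print(gene_network(toy))     # {('G1', 'G2'): 1, ('G2', 'G3'): 1}
```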


2021 ◽  
Vol 17 (6) ◽  
pp. e1009048
Author(s):  
Zhong Li ◽  
Kaiyancheng Jiang ◽  
Shengwei Qin ◽  
Yijun Zhong ◽  
Arne Elofsson

Recently, an increasing number of studies have shown that miRNAs are involved in human diseases, indicating that miRNAs may be pathogenic factors in various diseases. Understanding the relationship between miRNAs and diseases is therefore critical not only for developing new drugs but also for individualized diagnosis and treatment. Because identifying miRNA-disease associations through biological experiments is expensive and time-consuming, computational methods are a valuable complement for revealing such associations. In this study, a novel prediction model integrating a graph convolutional network (GCN), a convolutional neural network (CNN), and Squeeze-and-Excitation Networks (GCSENet) was constructed to identify miRNA-disease associations. The model first extracted features with a GCN on a heterogeneous graph of diseases, genes, and miRNAs. Then, to account for the different effects of genes on each miRNA and disease, and for the different contributions of miRNA-gene and disease-gene relationships to miRNA-disease association, a feature weight was set, and a combination of miRNA-gene and disease-gene associations was added as feature input to the CNN's convolution operation. Furthermore, the squeeze-and-excitation blocks of SENet were applied to determine the importance of each feature channel and to enhance useful features through an attention mechanism, yielding satisfactory predictions of miRNA-disease associations. The proposed method was compared against other state-of-the-art methods. It achieved an AUROC of 95.02% and an AUPR of 95.55% in 10-fold cross-validation, outperforming these methods on most of the evaluation metrics.
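The squeeze-and-excitation step can be illustrated with the generic SE operation from SENet: global average pooling per channel (squeeze), a two-layer bottleneck with ReLU and sigmoid (excitation), then channel-wise rescaling. This is a plain-Python sketch, not GCSENet's actual configuration; the weight shapes, the omission of biases, and the name `se_block` are assumptions.

```python
import math


def se_block(feature_map, w1, w2):
    """Generic squeeze-and-excitation over a C x H x W feature map (nested lists).

    w1 has shape (C/r, C) and w2 has shape (C, C/r), where r is the
    reduction ratio. Biases are omitted for brevity.
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, first layer: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    # Excitation, second layer: expand back to C channels and apply a sigmoid.
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value in channel c by its weight s[c].
    scaled = [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, feature_map)]
    return scaled, s


if __name__ == "__main__":
    # Two channels, each a 1 x 2 map; reduction to a single hidden unit.
    fmap = [[[1.0, 3.0]], [[0.0, 0.0]]]
    out, weights = se_block(fmap, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
    print(weights)  # first channel is emphasized, second is suppressed
```

The sigmoid output lies in (0, 1) for every channel, so the block can only reweight channels relative to each other; it never changes the spatial layout of the feature map.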


2014 ◽  
Author(s):  
Sune Pletscher-Frankild ◽  
Albert Pallejà ◽  
Kalliopi Tsafou ◽  
Janos X Binder ◽  
Lars Juhl Jensen

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease–gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease–gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a user-friendly web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download.
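A toy version of the two components described above might look like the following: a dictionary-based tagger, plus a co-occurrence score that rewards mentions in the same sentence more than mentions elsewhere in the same abstract. The weights `W_DOCUMENT` and `W_SENTENCE`, the case-insensitive matching, the naive sentence splitter, and all function names are hypothetical; the published DISEASES scoring scheme is not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical weights for illustration only; not the published DISEASES values.
W_DOCUMENT = 1.0
W_SENTENCE = 2.0


def tag(text, dictionary):
    """Dictionary-based NER: return IDs whose synonyms occur as whole words.

    `dictionary` maps a synonym to an entity ID; matching is case-insensitive.
    """
    found = set()
    for synonym, entity_id in dictionary.items():
        pattern = r"(?<![\w-])" + re.escape(synonym) + r"(?![\w-])"
        if re.search(pattern, text, re.IGNORECASE):
            found.add(entity_id)
    return found


def cooccurrence_scores(abstracts, genes, diseases):
    """Score gene-disease pairs by co-mentions within and between sentences."""
    scores = defaultdict(float)
    for abstract in abstracts:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]
        doc_genes, doc_diseases, sent_pairs = set(), set(), set()
        for sentence in sentences:
            g, d = tag(sentence, genes), tag(sentence, diseases)
            doc_genes |= g
            doc_diseases |= d
            sent_pairs |= {(x, y) for x in g for y in d}
        doc_pairs = {(x, y) for x in doc_genes for y in doc_diseases}
        # Each pair counts at most once per abstract at each level.
        for pair in doc_pairs:
            scores[pair] += W_DOCUMENT
        for pair in sent_pairs:
            scores[pair] += W_SENTENCE
    return dict(scores)


if __name__ == "__main__":
    # Hypothetical dictionaries and toy abstracts with placeholder names.
    genes = {"gene1": "G1", "gene2": "G2"}
    diseases = {"disorder x": "D1"}
    abstracts = [
        "Gene1 is mutated in disorder X. Gene2 was also measured.",
        "Gene2 is discussed. Disorder X is rare.",
    ]
    print(cooccurrence_scores(abstracts, genes, diseases))
```

In this toy run, `G1` is co-mentioned with `D1` inside one sentence and so outscores `G2`, which only appears alongside `D1` in different sentences of two abstracts.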


2016 ◽  
Vol 58 (3) ◽  
Author(s):  
Peter M. Krawitz

With every additional individual whose genome is sequenced, thousands of novel variants enter the scene. These variants of unknown clinical significance (VUCS) represent a great challenge for geneticists working with high-throughput sequencing data sets. Especially in the diagnosis of patients with an unknown monogenic disease, a joint effort of geneticists is required to find new disease–gene associations. For this purpose, online matchmaking platforms have been developed that allow clinician scientists to collaborate worldwide and share medically relevant data. However, for these tools to succeed, skills in deep phenotyping as well as new statistical approaches will be required.


BMC Genomics ◽  
2017 ◽  
Vol 18 (S5) ◽  
Author(s):  
Giulia Babbi ◽  
Pier Luigi Martelli ◽  
Giuseppe Profiti ◽  
Samuele Bovo ◽  
Castrense Savojardo ◽  
...  

2019 ◽  
Author(s):  
Jiho Park ◽  
Agustin Lopez Marquez ◽  
Arjun Puranik ◽  
Ajit Rajasekharan ◽  
Murali Aravamudan ◽  
...  

The recent explosion of biomedical knowledge presents both a major opportunity and a challenge for scientists tackling complex problems in healthcare. Here we present an approach for synthesizing biomedical knowledge based on a combination of word embeddings and select co-occurrences. We evaluated our ability to recapitulate and retrospectively predict disease-gene associations from the Online Mendelian Inheritance in Man (OMIM) resource. Our metrics achieved an area under the curve (AUC) value of 0.981 on the recapitulation task for 2,400 disease-gene associations. At the most stringent cutoff, our metrics predicted 13.89% of these associations before their first co-occurrence in the literature, with a median time of 4 years between prediction and first co-occurrence. Finally, our literature metrics can be combined with human genetics data to retrospectively predict disease-gene associations, as illustrated by IL-6 and giant cell arteritis. We believe this framework can provide robust biomedical hypotheses at a much faster pace than current standard practices.
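The abstract does not state how embeddings and co-occurrences are combined, so the sketch below shows only two generic ingredients under stated assumptions: ranking candidate genes by cosine similarity between precomputed embedding vectors for a disease and for each gene, and computing the AUC of a score list against known associations. The vectors and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_genes(disease_vec, gene_vecs):
    """Return (gene, score) pairs sorted by cosine similarity, highest first."""
    scored = [(g, cosine(disease_vec, v)) for g, v in gene_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties count as half a win; labels are truthy for known associations.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings with placeholder gene names.
    disease = [1.0, 0.0]
    genes = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
    ranked = rank_genes(disease, genes)
    print([g for g, _ in ranked])  # ['A', 'C', 'B']
    known = {"A"}  # hypothetical known association
    print(auc([s for _, s in ranked], [g in known for g, _ in ranked]))  # 1.0
```

The pairwise AUC is quadratic in the number of candidates, which is fine for a sketch; a rank-based formula would scale better for large gene sets.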

