Prediction of comorbid diseases using weighted geometric embedding of human interactome

Abstract

Background: Comorbidity is the phenomenon of two or more diseases occurring together more often than expected by chance, and it presents great challenges to accurate diagnosis and treatment. As an effort toward better understanding the genetic causes of comorbidity, in this work we have developed a computational method to predict comorbid diseases. Diseases that share associated genes tend to show higher comorbidity. Previous work shows that, after the associated genes are mapped onto the human interactome, the distance between the two disease modules (subgraphs) is correlated with comorbidity.

Methods: To fully incorporate the structural characteristics of the interactome as features for predicting comorbidity, our method embeds the human interactome into a high-dimensional geometric space, with weights assigned to the network edges, and uses the projections onto different dimensions to “fingerprint” disease modules. A supervised machine learning classifier is then trained to discriminate comorbid from non-comorbid disease pairs.

Results: In cross-validation on a benchmark dataset of more than 10,000 disease pairs, our model achieves an ROC score of 0.90 at a comorbidity threshold of relative risk RR = 0 and 0.76 at RR = 1, and significantly outperforms both the previous method and the interactome generated from annotated data. To further incorporate prior knowledge of pathway associations with diseases, we weight the protein-protein interaction network edges by how frequently they occur in those pathways, so that edges with higher frequency are more likely to be selected in the minimum spanning tree used for the geometric embedding. Such weighted embedding is shown to further improve comorbid disease prediction.

Conclusion: The work demonstrates that embedding the two-dimensional planar graph of the human interactome into a high-dimensional geometric space allows disease modules (subgraphs formed by the disease-associated genes) to be characterized and captured from multiple perspectives. This provides enriched features for a supervised classifier, which discriminates comorbid from non-comorbid disease pairs more accurately than classification based simply on module separation.
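
The Results paragraph describes weighting interactome edges by their frequency in disease-associated pathways so that frequent edges are more likely to enter the minimum spanning tree. The abstract does not give the exact weighting function, so the following Python sketch assumes a hypothetical transform w = 1 / (1 + f), where f is an edge's pathway count, and builds the tree with Kruskal's algorithm. The protein names and counts are made up, and the subsequent geometric embedding step is not shown.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def pathway_weight(freq: int) -> float:
    """Hypothetical weighting: more pathway occurrences -> smaller weight."""
    return 1.0 / (1.0 + freq)


def kruskal_mst(nodes: List[str], edge_freq: Dict[Edge, int]) -> List[Edge]:
    """Return the edges of a minimum spanning tree (Kruskal + union-find)."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[Edge] = []
    # Lightest (i.e. most pathway-supported) edges are considered first.
    for u, v in sorted(edge_freq, key=lambda e: pathway_weight(edge_freq[e])):
        ru, rv = find(u), find(v)
        if ru != rv:  # (u, v) joins two components, so no cycle is formed
            parent[ru] = rv
            tree.append((u, v))
    return tree


# Toy interactome: hypothetical proteins A-D with made-up pathway counts.
nodes = ["A", "B", "C", "D"]
edge_freq = {("A", "B"): 5, ("B", "C"): 0, ("A", "C"): 3, ("C", "D"): 1, ("B", "D"): 0}
mst = kruskal_mst(nodes, edge_freq)
print(mst)  # [('A', 'B'), ('A', 'C'), ('C', 'D')]
```

Because only the ordering of weights matters to Kruskal's algorithm, any strictly decreasing transform of f would select the same tree (up to ties); here the low-count edges B-C and B-D are left out.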

LeMeDISCO: A computational method for large-scale prediction & molecular interpretation of disease comorbidity

DOI: 10.1101/2021.06.28.21259559

2021

Author(s): Courtney Alexandra Astore, Hongyi Zhou, Jeffrey Skolnick

Keyword(s): Coronary Artery Disease, Molecular Basis, Large Scale, Scientific Investigation, Computational Method, Comorbid Disease, Essential Proteins, Molecular Interpretation, Comorbid Diseases, Artery Disease

Different diseases often tend to co-occur (i.e., they are comorbid), which raises the question: what is the molecular basis of their coincidence? Perhaps common proteins are the drivers of comorbid diseases. To understand the origin of disease comorbidity and to identify the essential proteins and pathways underlying comorbid diseases, we developed LeMeDISCO (Large-Scale Molecular Interpretation of Disease Comorbidity), an algorithm that predicts disease comorbidities from shared mode of action (MOA) proteins predicted by the AI-based MEDICASCY algorithm. LeMeDISCO was applied to predict the general occurrence of comorbid diseases for 3608 distinct diseases. To illustrate the power of LeMeDISCO, we elucidate the possible etiology of coronary artery disease and ovarian cancer by determining the comorbidity-enriched MOA proteins and pathways, and we suggest hypotheses for subsequent scientific investigation. The LeMeDISCO web server is available for academic users at: http://sites.gatech.edu/cssb/LeMeDISCO.
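
LeMeDISCO predicts comorbidity from the MOA proteins that two diseases share. The abstract does not specify the scoring function, so the Python sketch below only illustrates one standard way to quantify a shared-protein signal: a Jaccard index and a hypergeometric p-value for the overlap between two protein sets. These functions, the protein names, and the universe size are assumptions for illustration, not LeMeDISCO's actual scoring.

```python
from math import comb
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of the combined MOA protein set that the two diseases share."""
    return len(a & b) / len(a | b) if a | b else 0.0


def overlap_pvalue(a: Set[str], b: Set[str], universe_size: int) -> float:
    """Hypergeometric upper tail P(X >= |a & b|).

    Null model: |b| proteins are drawn without replacement from a universe of
    `universe_size` proteins, |a| of which belong to disease a.
    """
    k = len(a & b)
    n_a, n_b = len(a), len(b)
    total = comb(universe_size, n_b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


# Hypothetical MOA protein sets for two diseases in a 20-protein universe.
disease_x = {"P1", "P2", "P3", "P4"}
disease_y = {"P3", "P4", "P5"}
print(jaccard(disease_x, disease_y))             # 0.4
print(overlap_pvalue(disease_x, disease_y, 20))  # about 0.0877
```

A small p-value indicates that the two diseases share more proteins than random sets of the same sizes would; the Jaccard index gives a size-normalized effect measure to report alongside it.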

A Classification Algorithm with Reject Option Based on Adaptive Minimum Spanning Tree Covering Model in High-dimensional Space

JOURNAL OF ELECTRONICS INFORMATION TECHNOLOGY

DOI: 10.3724/sp.j.1146.2009.00021

2011, Vol 32 (12), pp. 2895-2900

Cited By ~ 1

Author(s): Zheng-ping Hu, Cheng-qian Xu, Qian-wen Jia

Keyword(s): Spanning Tree, Minimum Spanning Tree, Dimensional Space, Classification Algorithm, High Dimensional, High Dimensional Space, Reject Option, Covering Model

A Fast Two-Level Approximate Euclidean Minimum Spanning Tree Algorithm for High-Dimensional Data

Machine Learning and Data Mining in Pattern Recognition - Lecture Notes in Computer Science

DOI: 10.1007/978-3-319-96133-0_21

2018, pp. 273-287

Author(s): Xia Li Wang, Xiaochun Wang, Xiaqiong Li

Keyword(s): Spanning Tree, Minimum Spanning Tree, High Dimensional Data, High Dimensional, Tree Algorithm

Scalable hierarchical clustering by composition rank vector encoding and tree structure

DOI: 10.1101/2020.04.12.038026

2020

Author(s): Xiao Lai, Pu Tian

Keyword(s): Machine Learning, Hierarchical Clustering, Clustering Algorithm, High Dimensional Data, Machine Learning Algorithms, Tree Structure, Supervised Machine Learning, High Dimensional, Rank Vector, Nonlinear Correlations

Supervised machine learning, especially deep learning based on a wide variety of neural network architectures, has contributed tremendously to fields such as marketing, computer vision, and natural language processing. However, the development of unsupervised machine learning algorithms has been a bottleneck of artificial intelligence. Clustering is a fundamental unsupervised task in many different subjects. Unfortunately, no present algorithm is satisfactory for clustering high-dimensional data with strong nonlinear correlations. In this work, we propose a simple and highly efficient hierarchical clustering algorithm based on encoding by composition rank vectors and a tree structure, and we demonstrate its utility by clustering protein structural domains. No record comparison, an expensive step that is essential to and shared by all present clustering algorithms, is involved. Consequently, the algorithm achieves hierarchical clustering with linear time and space complexity and is thus applicable to arbitrarily large datasets. The key factor in this algorithm is the definition of composition, which depends on the physical nature of the target data and therefore needs to be constructed case by case. Nonetheless, the algorithm is general and applicable to any high-dimensional data with strong nonlinear correlations. We hope this algorithm will inspire a rich research field of encoding-based clustering well beyond composition rank vector trees.
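
The abstract notes that the composition must be defined case by case and does not give the one used for protein domains. As a minimal Python sketch under that caveat, the code below assumes a hypothetical composition (symbol counts of a sequence), encodes each record as the rank order of its symbols, and groups records by shared rank-vector prefixes, so each prefix length is one level of the hierarchy. Because no pair of records is ever compared, the cost is linear in the number of records for a fixed depth and record length.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def rank_vector(seq: str, depth: int) -> Tuple[str, ...]:
    """Hypothetical composition encoding: symbols ranked by count.

    Ties are broken alphabetically; only the top `depth` ranks are kept.
    """
    counts = Counter(seq)
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    return tuple(ranked[:depth])


def rank_vector_tree(records: Dict[str, str], depth: int) -> Dict[Tuple[str, ...], List[str]]:
    """Bucket records by rank-vector prefixes of length 1..depth.

    Each prefix is a node of the tree; shorter prefixes are coarser clusters.
    Records are encoded one at a time and never compared with each other.
    """
    tree: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for name, seq in records.items():
        rv = rank_vector(seq, depth)
        for level in range(1, len(rv) + 1):
            tree[rv[:level]].append(name)
    return dict(tree)


# Made-up sequences standing in for protein structural domains.
records = {"d1": "AAAGGC", "d2": "AAGGC", "d3": "GGGAC"}
for node, members in sorted(rank_vector_tree(records, depth=2).items()):
    print(node, members)
```

Here d1 and d2 fall into the same branch because A outranks G in both, while d3 starts a separate branch at the first level.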

On the scope and limitations of baker's yeast as a model organism for studying human tissue-specific pathways

DOI: 10.1101/011858

2014

Author(s): Shahin Mohammadi, Baharak Saberidokht, Shankar Subramaniam, Ananth Grama

Keyword(s): Drug Targets, Model Organism, Functional Space, Computational Method, Therapeutic Interventions, Human Tissues, Human Interactome, Tissue Specific, Complex Disorders, Human Genes

Budding yeast, S. cerevisiae, has been used extensively as a model organism for studying cellular processes in evolutionarily distant species, including humans. However, different human tissues, while inheriting a similar genetic code, exhibit distinct anatomical and physiological properties. The specific biochemical processes and associated biomolecules that differentiate various tissues are not completely understood, nor is the extent to which a unicellular organism, such as yeast, can be used to model these processes within each tissue. We propose a novel computational and statistical framework to systematically quantify the suitability of yeast as a model organism for different human tissues. We develop a computational method for dissecting the human interactome into tissue-specific cellular networks. Using these networks, we simultaneously partition the functional space of human genes, and their corresponding pathways, based on their conservation both across species and among different tissues. We study these sub-spaces in detail, and relate them to the overall similarity of each tissue with yeast. Many complex disorders are driven by a coupling of housekeeping (universally expressed in all tissues) and tissue-selective (expressed only in specific tissues) dysregulated pathways. We show that human-specific subsets of tissue-selective genes are significantly associated with the onset and development of a number of pathologies. Consequently, they provide excellent candidates as drug targets for therapeutic interventions. We also present a novel tool that can be used to assess the suitability of the yeast model for studying tissue-specific physiology and pathophysiology in humans.

PREDICTIVE MODELLING AND ANALYTICS FOR DIABETES USING A MACHINE LEARNING APPROACH

INFORMATION TECHNOLOGY IN INDUSTRY

DOI: 10.17762/itii.v9i1.121

2021, Vol 9 (1), pp. 215-223

Author(s): Prateek Mishra, Dr. Anurag Sharma, Dr. Abhishek Badholia

Keyword(s): Machine Learning, Support Vector Machine, Machine Learning Algorithms, Computational Method, Supervised Machine Learning, Undiagnosed Diabetes, Support Vector, Entire Body, Data Manipulation, Kernel Support Vector Machine

Diabetes is a major disorder whose adverse effects can be seen throughout the entire body. Undiagnosed diabetes can greatly increase the risk of complications such as diabetic nephropathy, cardiac stroke, and other disorders. People around the globe suffer from this disease, and early detection is crucial for a healthy life. As the number of diabetes cases increases rapidly, the disease may become a cause of worldwide concern. Machine learning (ML) provides computational methods that learn automatically from experience and can increase the chances of more accurate predictions. In current research, machine learning techniques have been applied to the Pima Indian diabetes dataset, using the R data manipulation tool to develop trends and detect risk-factor patterns. Using R, five different predictive models were developed and analyzed to classify patients as diabetic or non-diabetic. The supervised machine learning algorithms used for this purpose are multifactor dimensionality reduction (MDR), k-nearest neighbor (k-NN), artificial neural network (ANN), radial basis function (RBF) kernel support vector machine, and linear kernel support vector machine (SVM-linear).
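
To illustrate one of the five classifiers, k-nearest neighbor (k-NN), here is a minimal Python sketch rather than the authors' R code. The two features, the toy values, and k = 3 are assumptions; in practice the features would be standardized first, since raw glucose values dominate the Euclidean distance.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label: 1 = diabetic, 0 = not)


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> int:
    """Label x by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(train, key=lambda s: dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up (glucose, BMI) pairs; NOT taken from the Pima Indian diabetes data.
train: List[Sample] = [
    ((85, 22), 0), ((90, 24), 0), ((100, 26), 0),
    ((160, 34), 1), ((170, 36), 1), ((150, 31), 1),
]
print(knn_predict(train, (165, 33)))  # 1
print(knn_predict(train, (95, 25)))   # 0
```

An odd k avoids tied votes in a two-class problem such as diabetic versus non-diabetic.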

Real-Time Forecast of Influenza Outbreak Using Dynamic Network Marker Based on Minimum Spanning Tree

BioMed Research International

DOI: 10.1155/2020/7351398

2020, Vol 2020, pp. 1-11

Author(s): Kun Yang, Jialiu Xie, Rong Xie, Yucong Pan, Rui Liu, ...

Keyword(s): Public Health, Real Time, Spanning Tree, Minimum Spanning Tree, Influenza Pandemic, Social Systems, Computational Method, Global Public Health, Influenza Outbreak, Influenza Outbreaks

The influenza pandemic is a wide-ranging threat to people’s health and property all over the world. Developing effective strategies for predicting influenza outbreaks, so as to prevent, or at least prepare for, a new influenza pandemic is now a top global public health priority. However, because influenza outbreaks usually involve spatial and temporal characteristics of both biological and social systems, real-time monitoring of influenza outbreaks is a challenging task. In this study, by exploring the rich dynamical information of the city network during influenza outbreaks, we developed a computational method, the minimum-spanning-tree-based dynamical network marker (MST-DNM), to identify the tipping point or critical stage prior to an influenza outbreak. Using historical records of influenza outpatients between 2009 and 2018, the MST-DNM strategy was validated by accurate predictions of influenza outbreaks in three Japanese cities/regions: Tokyo, Osaka, and Hokkaido. In these applications, the early-warning signal was detected on average 4 weeks ahead of each influenza outbreak. The results show that our method has considerable potential in the practice of public health surveillance.
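
MST-DNM builds a minimum spanning tree over a network of cities. The abstract does not say how edges are weighted, so this Python sketch assumes a common convention for correlation networks: the Pearson correlation r between two cities' outpatient time series is converted to the distance d = sqrt(2(1 - r)), and Prim's algorithm then connects the cities. The city labels and counts are made up, and the DNM early-warning score computed on top of the tree is not reproduced.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_mst(series: Dict[str, List[float]]) -> List[Tuple[str, str]]:
    """Prim's algorithm on the complete city graph with d = sqrt(2 * (1 - r))."""
    names = list(series)
    dist = {
        (a, b): sqrt(max(0.0, 2.0 * (1.0 - pearson(series[a], series[b]))))
        for a in names for b in names if a != b
    }
    visited = [names[0]]  # list (not set) keeps tie-breaking deterministic
    edges: List[Tuple[str, str]] = []
    while len(visited) < len(names):
        # Cheapest edge leaving the current tree.
        u, v = min(
            ((a, b) for a in visited for b in names if b not in visited),
            key=lambda e: dist[e],
        )
        visited.append(v)
        edges.append((u, v))
    return edges


# Made-up weekly outpatient counts for three hypothetical cities.
series = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 11],
    "C": [5, 3, 4, 2, 1],
}
print(correlation_mst(series))  # [('A', 'B'), ('A', 'C')]
```

Strongly co-moving cities (r near 1) get distances near 0 and are linked first, so the tree's backbone follows the most synchronized regions.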

Combining trajectory optimization, supervised machine learning, and model structure for mitigating the curse of dimensionality in the control of bipedal robots

The International Journal of Robotics Research

DOI: 10.1177/0278364919859425

2019, Vol 38 (9), pp. 1063-1097

Cited By ~ 7

Author(s): Xingye Da, Jessy Grizzle

Keyword(s): Machine Learning, Trajectory Optimization, Open Loop, Supervised Machine Learning, High Dimensional, Dimensional Model, Bipedal Robots, State Variable, Walking Motion, Low Dimensional

To overcome the obstructions imposed by high-dimensional bipedal models, we embed a stable walking motion in an attractive low-dimensional surface of the system’s state space. The process begins with trajectory optimization to design an open-loop periodic walking motion of the high-dimensional model and then adds to this solution a carefully selected set of additional open-loop trajectories of the model that steer toward the nominal motion. A drawback of trajectories is that they provide little information on how to respond to a disturbance. To address this shortcoming, supervised machine learning is used to extract a low-dimensional state-variable realization of the open-loop trajectories. The periodic orbit is then an attractor of the low-dimensional state-variable model but is not attractive in the full-order system. We then use the special structure of mechanical models associated with bipedal robots to embed the low-dimensional model in the original model in such a manner that the desired walking motions are locally exponentially stable. The design procedure is first developed for ordinary differential equations and illustrated on a simple model. The methods are subsequently extended to a class of hybrid models and then realized experimentally on an Atrias-series 3D bipedal robot.

Minimum Spanning Tree Fusing Uniform Sub-Sampling Points and High-Dimensional Features for Medical Image Registration

2011 5th International Conference on Bioinformatics and Biomedical Engineering

DOI: 10.1109/icbbe.2011.5780358

2011

Author(s): Shao-min Zhang, Li-jia Zhi, Da-zhe Zhao, Hong Zhao

Keyword(s): Image Registration, Spanning Tree, Medical Image, Minimum Spanning Tree, High Dimensional, Medical Image Registration, Sampling Points

Efficiently mapping structure–property relationships of gas adsorption in porous materials: application to Xe adsorption

Faraday Discussions

DOI: 10.1039/c7fd00038c

2017, Vol 201, pp. 221-232

Cited By ~ 4

Author(s): A. R. Kaija, C. E. Wilmer

Keyword(s): Porous Materials, Void Fraction, Large Scale, Gas Adsorption, Structural Characteristics, Gas Storage, Computational Method, Structure Property, Structure Property Relationships, Computational Screening

Designing better porous materials for gas storage or separations applications frequently leverages known structure–property relationships. Reliable structure–property relationships, however, only reveal themselves when adsorption data on many porous materials are aggregated and compared. Gathering enough data experimentally is prohibitively time-consuming, and even approaches based on large-scale computer simulations face challenges. Brute-force computational screening approaches that do not efficiently sample the space of porous materials may be ineffective when the number of possible materials is too large. Here we describe a general and efficient computational method for mapping structure–property spaces of porous materials that can be useful for adsorption-related applications. We describe an algorithm that generates random porous “pseudomaterials”, for which we calculate structural characteristics (e.g., surface area, pore size, and void fraction) and also gas adsorption properties via molecular simulations. Here we chose to focus on void fraction and Xe adsorption at 1 bar, 5 bar, and 10 bar. The algorithm then identifies pseudomaterials with rare combinations of void fraction and Xe adsorption and mutates them to generate new pseudomaterials, thereby selectively adding data only to those parts of the structure–property map that are the least explored. This method can help guide the design of new porous materials for gas storage and separations applications in the future.
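
The selection step, finding pseudomaterials with rare combinations of void fraction and Xe adsorption, can be sketched by binning the structure–property map and picking the members of the least-populated bins. This Python sketch assumes a uniform 2D grid and made-up data; the grid size, the uptake scale, and the function names are hypothetical, and the mutation and molecular-simulation steps are not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bin_index(void_fraction: float, uptake: float, n_bins: int, max_uptake: float) -> Tuple[int, int]:
    """Map a (void fraction, Xe uptake) point onto an n_bins x n_bins grid."""
    i = min(int(void_fraction * n_bins), n_bins - 1)
    j = min(int(uptake / max_uptake * n_bins), n_bins - 1)
    return i, j


def least_explored(materials: Dict[str, Tuple[float, float]], n_bins: int, max_uptake: float) -> List[str]:
    """Names of pseudomaterials that sit in the least-populated occupied bin(s)."""
    bins = {name: bin_index(vf, up, n_bins, max_uptake) for name, (vf, up) in materials.items()}
    counts = Counter(bins.values())
    fewest = min(counts.values())
    return sorted(name for name, b in bins.items() if counts[b] == fewest)


# Made-up pseudomaterials: name -> (void fraction, Xe uptake in arbitrary units).
materials = {
    "m1": (0.12, 1.0), "m2": (0.15, 1.2), "m3": (0.18, 0.9),
    "m4": (0.55, 6.0), "m5": (0.58, 6.3), "m6": (0.85, 2.0),
}
print(least_explored(materials, n_bins=4, max_uptake=8.0))  # ['m6']
```

In a full loop, the selected materials would be mutated, re-simulated, and added back to the map, so sampling effort keeps moving toward sparse regions.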
