Network inference with ensembles of bi-clustering trees

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Konstantinos Pliakos ◽  
Celine Vens

Abstract
Background: Network inference is crucial for biomedicine and systems biology. Biological entities and their associations are often modeled as interaction networks. Examples include drug-protein interaction networks and gene regulatory networks. Studying and elucidating such networks can lead to the comprehension of complex biological processes. However, we usually have only partial knowledge of those networks, and experimentally identifying all existing associations between biological entities is time-consuming and expensive. Many computational approaches to network inference have been proposed over the years; nonetheless, efficiency and accuracy remain open problems. Here, we propose bi-clustering tree ensembles as a new machine learning method for network inference, extending the traditional tree-ensemble models to the global network setting. The proposed approach addresses the network inference problem as a multi-label classification task. More specifically, the nodes of a network (e.g., drugs or proteins in a drug-protein interaction network) are modelled as samples described by features (e.g., chemical structure similarities or protein sequence similarities). The labels in our setting represent the presence or absence of links connecting the nodes of the interaction network (e.g., drug-protein interactions in a drug-protein interaction network).
Results: We extended traditional tree-ensemble methods, such as extremely randomized trees (ERT) and random forests (RF), to ensembles of bi-clustering trees, integrating background information from both node sets of a heterogeneous network into the same learning framework. We performed an empirical evaluation, comparing the proposed approach to currently used tree-ensemble based approaches as well as other approaches from the literature. We demonstrated the effectiveness of our approach in different interaction prediction (network inference) settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein and gene regulatory networks. We also applied our proposed method to two versions of a chemical-protein association network extracted from the STITCH database, demonstrating the potential of our model in predicting non-reported interactions.
Conclusions: Bi-clustering trees outperform existing tree-based strategies as well as machine learning methods based on other algorithms. Since our approach is based on tree ensembles, it inherits the advantages of tree-ensemble learning, such as handling of missing values, scalability and interpretability.
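The global setting described in this abstract, where every pair of nodes from the two node sets becomes one learning instance described by the features of both nodes, can be illustrated with a minimal sketch. The helper name `build_global_dataset`, the toy feature vectors and the interaction set are hypothetical and not taken from the paper; the sketch shows only the data layout, not the bi-clustering tree algorithm itself.

```python
from itertools import product


def build_global_dataset(row_features, col_features, interactions):
    """Turn a two-node-set network into (pair, features, label) instances.

    row_features / col_features: dict mapping node id -> list of feature values.
    interactions: set of (row_id, col_id) pairs known to interact.
    All names are illustrative assumptions, not the authors' code.
    """
    dataset = []
    for r, c in product(sorted(row_features), sorted(col_features)):
        # Features of the pair: row-node features followed by column-node features.
        features = row_features[r] + col_features[c]
        # Label 1 means a link is present, 0 means absent (or not yet reported).
        label = 1 if (r, c) in interactions else 0
        dataset.append(((r, c), features, label))
    return dataset


# Toy example: two drugs, two proteins, made-up similarity features.
drugs = {"d1": [1.0, 0.2], "d2": [0.2, 1.0]}
proteins = {"p1": [1.0, 0.5, 0.1], "p2": [0.5, 1.0, 0.3]}
known = {("d1", "p1"), ("d2", "p2")}
data = build_global_dataset(drugs, proteins, known)
```

Any classifier, including a tree ensemble, could then be trained on the feature vectors and labels in `data`. A label of 0 may mean either a true absence or an interaction that has not yet been reported.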

2020 ◽  
Vol 21 (11) ◽  
pp. 1054-1059
Author(s):  
Bin Yang ◽  
Yuehui Chen

Reconstruction of gene regulatory networks (GRN) plays an important role in understanding the complexity, functionality and pathways of biological systems, which could support the design of new drugs for diseases. Because differential equation models are flexible and robust, these models have been utilized to identify biochemical reactions and gene regulatory networks. This paper investigates the differential equation models for reverse engineering gene regulatory networks. We introduce three kinds of differential equation models, including ordinary differential equation (ODE), time-delayed differential equation (TDDE) and stochastic differential equation (SDE). ODE models include linear ODE, nonlinear ODE and the S-system model. We also discuss the evolutionary algorithms used to search for the optimal structures and parameters of differential equation models. This investigation could provide a comprehensive understanding of differential equation models, and lead to the discovery of novel differential equation models.
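As an illustration of one of the ODE families named in this abstract, the S-system model is commonly written as dX_i/dt = alpha_i * prod_j X_j^(g_ij) - beta_i * prod_j X_j^(h_ij). The following is a minimal sketch of that form with a forward-Euler step. The function names, parameter values and step size are illustrative assumptions and are not taken from the paper.

```python
def s_system_derivative(x, alpha, beta, g, h):
    """Return dX_i/dt for each gene i under the S-system form.

    x: current expression levels (positive floats).
    alpha, beta: production and degradation rate constants per gene.
    g, h: kinetic-order matrices; g[i][j] is the exponent of gene j
          in the production term of gene i, h[i][j] in the degradation term.
    """
    derivs = []
    for i in range(len(x)):
        production = alpha[i]
        degradation = beta[i]
        for j, xj in enumerate(x):
            production *= xj ** g[i][j]
            degradation *= xj ** h[i][j]
        derivs.append(production - degradation)
    return derivs


def euler_step(x, dt, alpha, beta, g, h):
    """Advance the state by one forward-Euler step of size dt."""
    dx = s_system_derivative(x, alpha, beta, g, h)
    return [xi + dt * dxi for xi, dxi in zip(x, dx)]


# Toy two-gene network with made-up parameters.
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, -0.5], [0.5, 0.0]]   # gene 2 represses gene 1; gene 1 activates gene 2
h = [[1.0, 0.0], [0.0, 1.0]]    # each gene degrades in proportion to itself
state = [1.0, 1.0]
rates = s_system_derivative(state, alpha, beta, g, h)
next_state = euler_step(state, 0.1, alpha, beta, g, h)
```

In a reverse-engineering setting, a search procedure such as an evolutionary algorithm would adjust `alpha`, `beta`, `g` and `h` so that simulated trajectories match measured expression data.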


Author(s):  
Tsuyoshi Kato ◽  
Kinya Okada ◽  
Hisashi Kashima ◽  
Masashi Sugiyama

The authors’ algorithm was evaluated favorably on two kinds of biological networks: a metabolic network and a protein interaction network. A statistical test confirmed that the weight the algorithm assigned to each assay was meaningful.


2021 ◽  
Author(s):  
Nikoleta Vavouraki ◽  
James E. Tomkins ◽  
Eleanna Kara ◽  
Henry Houlden ◽  
John Hardy ◽  
...  

Abstract
The Hereditary Spastic Paraplegias are a group of neurodegenerative diseases characterized by spasticity and weakness in the lower body. Despite the identification of causative mutations in over 70 genes, the molecular aetiology remains unclear. Due to the combination of genetic diversity and variable clinical presentation, the Hereditary Spastic Paraplegias are a strong candidate for protein-protein interaction network analysis as a tool to understand disease mechanism(s) and to aid functional stratification of phenotypes. In this study, experimentally validated human protein-protein interactions were used to create a protein-protein interaction network based on the causative Hereditary Spastic Paraplegia genes. Network evaluation as a combination of both topological analysis and functional annotation led to the identification of core proteins in putative shared biological processes such as intracellular transport and vesicle trafficking. The application of machine learning techniques suggested a functional dichotomy linked with distinct sets of clinical presentations, indicating there is scope to further classify conditions currently described under the same umbrella term of Hereditary Spastic Paraplegias based on specific molecular mechanisms of disease.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jiyoung Lee ◽  
Shuo Geng ◽  
Song Li ◽  
Liwu Li

Subclinical doses of LPS (SD-LPS) are known to cause low-grade inflammatory activation of monocytes, which could lead to inflammatory diseases including atherosclerosis and metabolic syndrome. Sodium 4-phenylbutyrate is a potential therapeutic compound which can reduce the inflammation caused by SD-LPS. To understand the gene regulatory networks of these processes, we have generated scRNA-seq data from mouse monocytes treated with these compounds and identified 11 novel cell clusters. We have developed a machine learning method to integrate scRNA-seq, ATAC-seq, and binding motifs to characterize gene regulatory networks underlying these cell clusters. Using guided regularized random forest and feature selection, our method achieved high performance and outperformed a traditional enrichment-based method in selecting candidate regulatory genes. Our method is particularly efficient in selecting a few candidate genes to explain the observed expression pattern. In particular, among 531 candidate TFs, our method achieves an auROC of 0.961 with only 10 motifs. Finally, we found two novel subpopulations of monocyte cells in response to SD-LPS and we confirmed our analysis using independent flow cytometry experiments. Our results suggest that our new machine learning method can select candidate regulatory genes as potential targets for developing new therapeutics against low-grade inflammation.
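For readers unfamiliar with the auROC figure quoted in this abstract, the metric can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function name and the toy scores and labels are made up for illustration and do not reproduce the authors' evaluation.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted score per example (higher means more likely positive).
    labels: 1 for positive examples, 0 for negative examples.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
example = auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch. Rank-based implementations give the same value more efficiently on large datasets.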


Author(s):  
Jie Zhao ◽  
Hongjie Gao ◽  
Yun He

Background: Epithelial ovarian carcinoma (EOC) is a ubiquitous gynecological malignancy with complicated pathogenesis. Genetic risk factors and pathways involved in the prognosis of this cancer are not yet understood completely. Determining genetic markers with diagnostic and prognostic values would pave the way for efficient management of cancer. Objective: This study aimed to investigate the genes and the regulatory networks involved in the occurrence and prognosis of EOC through different bioinformatics analysis tools. In addition, recent advances in using bioinformatic analysis approaches based on the genes and regulatory networks, particularly differentially expressed genes (DEGs), in improving the diagnosis and prognosis of EOC are discussed. Methods: The gene expression profiles of GSE18520, GSE54388, and GSE27651 were downloaded from the Gene Expression Omnibus (GEO) database and further analyzed in R. Current literature on using bioinformatics based on DEGs and associated regulatory networks to improve the diagnosis and prognosis of EOC was reviewed. Results: Comparison of gene expression levels between malignant and normal tissue revealed 163 DEGs. Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed on the target genes using the clusterProfiler package, and Cytoscape was employed to assess the protein interaction network of these genes. The protein-protein interaction network was analyzed using the CytoHubba plug-in to identify 20 hub genes. In addition, we analyzed the prognosis of the hub genes using Kaplan-Meier survival analysis, which revealed evident differences in the prognosis of 13 genes. Malignant tissues exhibited differential expression of 12 genes relative to healthy tissues, as shown by Gene Expression Profiling Interactive Analysis (GEPIA).
Conclusion: The findings of this study revealed 12 genes that were significantly up-regulated and associated with significantly different prognosis, which could potentially be employed to target EOC in clinical practice.
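The Kaplan-Meier survival analysis mentioned in this abstract estimates the survival probability as a running product of (1 - d_i / n_i) over event times, where d_i is the number of events at time i and n_i the number of subjects still at risk. The study itself reports using R; the Python sketch below is only an illustration of the estimator, and the function name and follow-up data are made up.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up time per subject.
    events: 1 if the event (e.g. death) was observed, 0 if the subject was censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Observed events occurring exactly at time t.
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Toy cohort: five subjects, two of them censored (event flag 0).
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
```

Comparing such curves between patients with high and low expression of a hub gene, typically with a log-rank test, is the usual way a prognostic difference is assessed.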

