Biological knowledge-slanted random forest approach for the classification of calcified aortic valve stenosis

BioData Mining, 2021, Vol 14 (1). DOI: 10.1186/s13040-021-00269-4

Author(s): Erika Cantor, Rodrigo Salas, Harvey Rosas, Sandra Guauque-Olarte

Keyword(s): Random Forest; Aortic Valve; Aortic Valve Stenosis; Protein Interactions; Biological Information; Biological Knowledge; Protein Protein Interactions; Selection Probability; Prior Biological Knowledge; Improved Accuracy

Background: Calcific aortic valve stenosis (CAVS) is a fatal disease, and there is no pharmacological treatment to prevent its progression. This study aims to identify genes potentially implicated in CAVS in patients with congenital bicuspid aortic valve (BAV) or tricuspid aortic valve (TAV), compared with patients with normal valves, using a knowledge-slanted random forest (RF). Results: This study implemented a knowledge-slanted RF that uses information extracted from a protein-protein interaction network to rank genes. The ranking modifies the probability that each gene is drawn as a candidate split variable. A total of 15,191 genes were assessed in 19 valves with CAVS (BAV, n = 10; TAV, n = 9) and 8 normal valves. The model's ability to discriminate cases with CAVS was evaluated using accuracy, sensitivity, and specificity, and was compared with that of a conventional RF. When BAV and TAV cases were classified separately, the proposed approach achieved higher accuracy than the conventional RF (slanted RF: 59.3% versus 40.7%). When patients with BAV and TAV were grouped together against patients with normal valves, adding prior biological information made no relevant difference, with an accuracy of 92.6%. Conclusion: The knowledge-slanted RF approach reflected prior biological knowledge, leading to better precision in distinguishing between cases with BAV, TAV, and normal valves. These results suggest that integrating biological knowledge can be useful in difficult classification tasks.
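The core mechanism described above is to bias the draw of candidate split variables using a PPI-derived gene ranking. A minimal Python sketch of that draw follows. It is not the authors' implementation. It assumes a hypothetical ranking rule in which each gene's weight is 1 + its degree in the PPI network, and the genes and interactions in the usage lines are invented. In a full slanted forest, this weighted draw would replace the uniform draw at every node of every tree. Some existing implementations expose a similar option: the R package ranger, for example, accepts per-variable weights through its `split.select.weights` argument.

```python
import numpy as np

def ppi_selection_weights(genes, ppi_edges):
    """Return one selection probability per gene, derived from a PPI network.

    Hypothetical ranking rule: weight = 1 + degree in the PPI network, so hub
    genes are favoured but genes without interactions can still be drawn.
    """
    degree = {g: 0 for g in genes}
    for a, b in ppi_edges:
        # count each listed interaction once per endpoint; skip self-loops and
        # interactions involving genes absent from the expression data
        if a != b and a in degree and b in degree:
            degree[a] += 1
            degree[b] += 1
    raw = np.array([1.0 + degree[g] for g in genes])
    return raw / raw.sum()

def draw_candidate_splits(weights, mtry, rng):
    """Draw mtry distinct candidate split-variable indices.

    Uniform weights reproduce the conventional random forest draw;
    PPI-derived weights slant the draw toward highly connected genes.
    """
    return rng.choice(len(weights), size=mtry, replace=False, p=weights)

# Usage with hypothetical genes and interactions
rng = np.random.default_rng(0)
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
w = ppi_selection_weights(genes, edges)
counts = np.zeros(len(genes), dtype=int)
for _ in range(5000):  # e.g. 5000 tree nodes
    counts[draw_candidate_splits(w, mtry=2, rng=rng)] += 1
print(dict(zip(genes, counts.tolist())))
```

Over many nodes, the hub gene G1 appears among the candidates far more often than the unconnected genes G5 and G6. Setting all weights equal reduces the procedure to a conventional RF.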

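The abstract evaluates the models with accuracy, sensitivity and specificity. The sketch below shows how these can be computed one-vs-rest from true and predicted labels. The labels in the usage lines are hypothetical and are not data from the study. As an arithmetic aside, the reported percentages are consistent with 16, 11 and 25 of the 27 valves being classified correctly. This assumes that each accuracy was computed over all 27 valves, which the abstract does not state.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Overall accuracy plus sensitivity and specificity for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

# Hypothetical predictions for six valves (not data from the study)
y_true = ["CAVS", "CAVS", "CAVS", "CAVS", "normal", "normal"]
y_pred = ["CAVS", "CAVS", "CAVS", "normal", "normal", "CAVS"]
print(one_vs_rest_metrics(y_true, y_pred, positive="CAVS"))

# Arithmetic check: 27 valves in total (BAV 10 + TAV 9 + normal 8)
n_valves = 10 + 9 + 8
for pct in (59.3, 40.7, 92.6):
    correct = round(pct / 100 * n_valves)
    print(f"{pct}% -> {correct}/{n_valves} = {100 * correct / n_valves:.1f}%")
```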
Structural Learning of Genetic Regulatory Networks Based on Prior Biological Knowledge and Microarray Gene Expression Measurements

Handbook of Research on Computational Methodologies in Gene Regulatory Networks, 2010, pp. 289-309. DOI: 10.4018/978-1-60566-685-3.ch012

Author(s): Yang Dai, Eyad Almasri, Peter Larsen, Guanrao Chen

Keyword(s): Gene Expression; Protein Interactions; Regulatory Networks; Genetic Regulatory Networks; Biological Knowledge; Protein Protein Interactions; Microarray Gene Expression; Prior Biological Knowledge; Comprehensive Survey; Microarray Gene

The reconstruction of genetic regulatory networks from microarray gene expression measurements has been a challenging problem in bioinformatics. Various methods have been proposed for this problem, including the Bayesian network (BN) approach. In this chapter, we provide a comprehensive survey of current developments in learning regulatory networks with structure priors. These priors are derived from high-throughput experimental results, such as protein-protein interactions, transcription factor binding location data, evolutionary relationships, and literature databases.
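To make "structure prior" concrete, the sketch below uses one common form, assumed here for illustration: an energy-based prior. A belief matrix B with entries in [0, 1] summarises prior evidence, for example from protein-protein interactions, for each directed edge. A candidate graph is penalised in proportion to its disagreement with B. This form, the value 0.5 for "no information", and the 3-gene example are assumptions, not necessarily the exact priors surveyed in the chapter.

```python
import numpy as np

def log_structure_prior(adjacency, belief, beta):
    """Unnormalised log prior of a directed graph G given a belief matrix B.

    Assumed form: log P(G) = -beta * sum_{i != j} |G_ij - B_ij| + const,
    where B_ij in [0, 1] is prior evidence for an edge i -> j (0.5 = no information).
    """
    G = np.asarray(adjacency, dtype=float)
    B = np.asarray(belief, dtype=float)
    off_diagonal = ~np.eye(G.shape[0], dtype=bool)  # ignore self-loops
    energy = np.abs(G - B)[off_diagonal].sum()
    return -beta * energy

# Hypothetical 3-gene example: prior evidence supports the edge 0 -> 1
B = np.full((3, 3), 0.5)
B[0, 1] = 0.9

G_supported = np.zeros((3, 3))
G_supported[0, 1] = 1
G_unsupported = np.zeros((3, 3))
G_unsupported[0, 2] = 1

lp_supported = log_structure_prior(G_supported, B, beta=1.0)
lp_unsupported = log_structure_prior(G_unsupported, B, beta=1.0)
print(lp_supported, lp_unsupported)
```

In a score-based structure search, this term would be added to the log marginal likelihood of the expression data. The graph agreeing with the prior evidence (edge 0 -> 1) then gains an advantage of 0.8 log units over the one that does not.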

Prediction of Disease Comorbidity Using HeteSim Scores based on Multiple Heterogeneous Networks

Current Gene Therapy, 2019, Vol 19 (4), pp. 232-241. DOI: 10.2174/1566523219666190917155959. Cited By ~ 5

Author(s): Xuegong Chen, Wanwan Shi, Lei Deng

Keyword(s): Protein Interactions; Experimental Studies; Treatment Strategies; Computational Method; Biological Information; Support Vector; Protein Protein Interactions; Efficient Treatment; Disease Associations; Previous State

Background: Accumulating experimental studies indicate that disease comorbidity causes patients additional pain and makes standard treatments more likely to fail than in patients who have a single disease. Accurate prediction of potential comorbidities is therefore essential for designing more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic. Objective: In this work, we propose PCHS, an effective computational method for predicting disease comorbidity. Materials and Methods: We used the HeteSim measure to calculate a relatedness score for each disease pair in a global heterogeneous network. This network integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions, and the associations among them. We built the prediction model using a support vector machine (SVM) based on the HeteSim scores. Results and Conclusion: PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in the literature, indicating the effectiveness of our method.
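HeteSim scores the relatedness of two objects along a meta-path, which is a sequence of typed relations in a heterogeneous network. The sketch below implements the normalised HeteSim for even-length meta-paths. Each half of the path is turned into reachable-probability vectors using row-normalised adjacency matrices, walking forward from the source type and backward from the target type to the midpoint. The score is the cosine of the two vectors. The matrices and meta-paths are hypothetical, not the six networks or paths used by PCHS. Odd-length paths, which HeteSim handles by splitting the middle relation, are left out, and the SVM stage that consumes the scores is not shown.

```python
import numpy as np

def row_normalize(M):
    """Turn an adjacency matrix into transition probabilities (rows sum to 1)."""
    M = np.asarray(M, dtype=float)
    sums = M.sum(axis=1, keepdims=True)
    # rows without links stay all-zero instead of dividing by zero
    return np.divide(M, sums, out=np.zeros_like(M), where=sums > 0)

def hetesim(relations):
    """Normalised HeteSim for an even-length meta-path.

    relations: adjacency matrices R1..R2k along the path A1 -> ... -> A(2k+1).
    Returns a |A1| x |A(2k+1)| matrix of scores in [0, 1].
    """
    rels = [np.asarray(R, dtype=float) for R in relations]
    if len(rels) % 2 != 0:
        raise ValueError("this sketch only handles even-length meta-paths")
    half = len(rels) // 2
    left = np.eye(rels[0].shape[0])
    for R in rels[:half]:  # walk from the source type to the midpoint
        left = left @ row_normalize(R)
    right = np.eye(rels[-1].shape[1])
    for R in reversed(rels[half:]):  # walk back from the target type
        right = right @ row_normalize(R.T)
    num = left @ right.T
    norms = np.outer(np.linalg.norm(left, axis=1), np.linalg.norm(right, axis=1))
    return np.divide(num, norms, out=np.zeros_like(num), where=norms > 0)

# Hypothetical disease-protein associations and protein-protein interactions
DP = np.array([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 1]])
PP = np.array([[0, 1, 0, 0],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 0]])
s_dpd = hetesim([DP, DP.T])              # Disease-Protein-Disease
s_dpppd = hetesim([DP, PP, PP, DP.T])    # Disease-Protein-Protein-Protein-Disease
print(np.round(s_dpd, 3))
print(np.round(s_dpppd, 3))
```

Diseases 0 and 1 share one of their two proteins, so their Disease-Protein-Disease score is 0.5. Diseases 0 and 2 share none, so their score is 0. The longer path can still relate diseases 0 and 2 through interacting proteins.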

A MapReduce-Based Parallel Random Forest Approach for Predicting Large-Scale Protein-Protein Interactions

Intelligent Computing Methodologies - Lecture Notes in Computer Science, 2020, pp. 400-407. DOI: 10.1007/978-3-030-60796-8_34

Author(s): Bo-Ya Ji, Zhu-Hong You, Long Yang, Ji-Ren Zhou, Peng-Wei Hu

Keyword(s): Random Forest; Protein Interactions; Large Scale; Protein Protein Interactions

Practical Model Selection for Prospective Virtual Screening

2018. DOI: 10.1101/337956. Cited By ~ 1

Author(s): Shengchao Liu, Moayad Alnammi, Spencer S. Ericksen, Andrew F. Voter, Gene E. Ananiev, ...

Keyword(s): Random Forest; Virtual Screening; Protein Interactions; High Throughput Screening; Screening Methods; Protein Protein Interactions; Screening Algorithm; Screening Performance; Wide Range; Public Datasets

Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the dataset and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein-protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing the best algorithm for prospective compound prioritization. For these targets, our workflow identifies a random forest as the best algorithm, ahead of more sophisticated neural network-based models. The top 250 predictions from our selected random forest recover 37 of the 54 active compounds in a library of 22,434 new molecules assayed on PriA-SSB. Some virtual screening methods, such as multi-task neural networks, perform well on public datasets and synthetic benchmarks. We show that this performance may not always translate into good prospective screening performance on a specific assay of interest.
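The prospective result above (37 of 54 actives recovered in the top 250 of 22,434 molecules) can be restated as standard screening metrics. The sketch computes recall, the hit rate among the top k, and the enrichment factor. The enrichment factor is the top-k hit rate divided by the hit rate expected from picking compounds at random. The abstract does not report these derived values. They are computed here only from its counts, assuming the 54 actives are the total across the full library.

```python
def screening_summary(actives_in_top_k, k, total_actives, library_size):
    """Recall, hit rate and enrichment factor for the top-k ranked compounds."""
    recall = actives_in_top_k / total_actives    # fraction of all actives recovered
    hit_rate = actives_in_top_k / k              # fraction of the top k that are active
    base_rate = total_actives / library_size     # hit rate expected from random picking
    enrichment_factor = hit_rate / base_rate
    return recall, hit_rate, enrichment_factor

recall, hit_rate, ef = screening_summary(37, 250, 54, 22434)
print(f"recall={recall:.3f}  hit rate={hit_rate:.3f}  EF={ef:.1f}")
# prints: recall=0.685  hit rate=0.148  EF=61.5
```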

Incorporating Biological Information as a Prior in an Empirical Bayes Approach to Analyzing Microarray Data

Statistical Applications in Genetics and Molecular Biology, 2005, Vol 4 (1). DOI: 10.2202/1544-6115.1124. Cited By ~ 6

Author(s): Wei Pan

Keyword(s): Mixture Model; Microarray Data; Statistical Power; Empirical Bayes; Biological Information; Biological Knowledge; False Discovery Rates; Proteomic Data; Prior Biological Knowledge; A Genome

Currently, existing biological knowledge is used in analyzing high-throughput genomic and proteomic data mainly for validation. Here we take a different approach, incorporating biological knowledge into the statistical analysis to improve statistical power and efficiency. Specifically, we consider how to fuse biological information into a mixture model for analyzing microarray data. A standard mixture model assumes that all genes come from the same (marginal) distribution, including an equal prior probability of an event such as differential expression or binding by a transcription factor (TF). In contrast, our proposed mixture model allows genes in different groups to have different distributions, with the grouping of the genes reflecting biological information. Using a list of about 800 putative cell cycle-regulated genes as prior biological knowledge, we analyze genome-wide location data to detect binding sites of the TF Fkh1. We find that our proposal improves over the standard approach, resulting in reduced false discovery rates (FDR), and hence it is a useful alternative to the current practice.
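The key idea of the stratified mixture model is that genes in different biologically defined groups may have different prior probabilities of an event. A minimal Python sketch of that idea follows. It is a simplification of the model described above, not Pan's implementation: it assumes N(0, 1) null scores and N(mu_alt, 1) event scores shared by all groups, so only the mixing proportion differs by group, and it fits those proportions by EM. The grouping, event rates and scores in the usage lines are simulated and hypothetical.

```python
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1)."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2 * np.pi)

def fit_group_priors(z, groups, mu_alt, n_iter=200):
    """EM estimate of a group-specific prior probability pi_g of an event.

    Illustrative assumptions: null ~ N(0, 1), event ~ N(mu_alt, 1), both shared
    across groups. Returns the per-group priors and each gene's posterior
    probability of an event.
    """
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)
    f0 = normal_pdf(z, 0.0)     # null component density
    f1 = normal_pdf(z, mu_alt)  # event component density
    pi = {g: 0.5 for g in np.unique(groups)}
    for _ in range(n_iter):
        prior = np.array([pi[g] for g in groups])
        post = prior * f1 / (prior * f1 + (1 - prior) * f0)  # E-step
        pi = {g: post[groups == g].mean() for g in pi}        # M-step, per group
    prior = np.array([pi[g] for g in groups])
    post = prior * f1 / (prior * f1 + (1 - prior) * f0)
    return pi, post

# Simulated data: a small "prior knowledge" group enriched for events
rng = np.random.default_rng(1)
groups = np.array(["prior"] * 200 + ["other"] * 2000)
event_rate = np.where(groups == "prior", 0.40, 0.05)
is_event = rng.random(groups.size) < event_rate
z = rng.normal(np.where(is_event, 3.0, 0.0), 1.0)

pi, post = fit_group_priors(z, groups, mu_alt=3.0)
local_fdr = 1 - post  # per-gene probability of being a null (no event)
print({str(g): round(float(p), 3) for g, p in pi.items()})
```

Because each group has its own prior, the same score yields a higher posterior probability of an event for a gene in the enriched group. This is the mechanism by which grouping can lower false discovery rates.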

Structural Classification of Bacterial Response Regulators: Diversity of Output Domains and Domain Combinations

Journal of Bacteriology, 2006, Vol 188 (12), pp. 4169-4182. DOI: 10.1128/jb.01887-05. Cited By ~ 315

Author(s): Michael Y. Galperin

Keyword(s): Protein Interactions; RNA Binding; Transcriptional Regulators; Bacterial Cells; Protein Protein Interactions; Response Regulators; DNA Binding Domains; Binding Domains; Archaeal Species

The CheY-like phosphoacceptor (or receiver [REC]) domain is a common module in a variety of response regulators of bacterial signal transduction systems. In this work, 4,610 response regulators, encoded in the complete genomes of 200 bacterial and archaeal species, were identified and classified by their domain architectures. Previously uncharacterized output domains were analyzed and, in some cases, assigned to known domain families. Transcriptional regulators of the OmpR, NarL, and NtrC families were found to comprise almost 60% of all response regulators; transcriptional regulators with other DNA-binding domains (LytTR, AraC, Spo0A, Fis, YcbB, RpoE, and MerR) account for an additional 6%. The remaining one-third is represented by the stand-alone REC domain (∼14%) and its combinations with a variety of enzymatic (GGDEF, EAL, HD-GYP, CheB, CheC, PP2C, and HisK), RNA-binding (ANTAR and CsrA), and protein- or ligand-binding (PAS, GAF, TPR, CAP_ED, and HPt) domains, or with newly described domains of unknown function. The diversity of domain architectures and the abundance of alternative domain combinations suggest that fusion between the REC domain and various output domains is a widespread evolutionary mechanism. This mechanism allows bacterial cells to regulate transcription, enzyme activity, and/or protein-protein interactions in response to environmental challenges. The complete list of response regulators encoded in each of the 200 analyzed genomes is available online at http://www.ncbi.nlm.nih.gov/Complete_Genomes/RRcensus.html .

A Novel Network-Based Algorithm for Predicting Protein-Protein Interactions Using Gene Ontology

Frontiers in Microbiology, 2021, Vol 12. DOI: 10.3389/fmicb.2021.735329

Author(s): Lun Hu, Xiaojuan Wang, Yu-An Huang, Pengwei Hu, Zhu-Hong You

Keyword(s): Gene Ontology; Protein Interactions; Structural Information; Computational Prediction; Biological Information; Main Role; Protein Protein Interactions; Chronic Infections; Prediction Algorithms; PPI Networks

Proteins are among the most significant components of living organisms, and their main role in cells is to carry out various physiological functions by interacting with each other. Thus, the prediction of protein-protein interactions (PPIs) is crucial for understanding the molecular basis of biological processes, such as chronic infections. Because laboratory-based experiments are normally time-consuming and labor-intensive, computational prediction algorithms have become popular. However, few of them consider both the structural information of PPI networks and the biological information of proteins simultaneously to improve accuracy. To address this, we assume that the prior information of functional modules is known in advance. We then use an established Bayesian model to simulate the generative process of a PPI network associated with the biological information of proteins, i.e., Gene Ontology. To indicate how likely two proteins are to interact, we propose a novel scoring function that combines the membership distributions of proteins with network paths. Compared with state-of-the-art prediction algorithms, our algorithm shows promising performance on several independent metrics. The experimental results also reveal that considering modularity in PPI networks provides an alternative, and much more flexible, way to accurately predict PPIs.

Inferring cell cycle phases from a partially temporal network of protein interactions

2021. DOI: 10.1101/2021.03.26.437187

Author(s): Maxime Lucas, Arthur Morris, Alex Townsend-Teague, Laurent Tichit, Bianca H Habermann, ...

Keyword(s): Cell Cycle; Protein Interactions; Time Series Data; A Priori; Biological Systems; Series Data; Biological Knowledge; Temporal Network; Protein Protein Interactions; Temporal Organisation

The temporal organisation of biological systems into phases and subphases is often crucial to their functioning. Identifying this multiscale organisation can yield insight into the underlying biological mechanisms at play. To date, however, this identification requires a priori biological knowledge of the system under study. Here, we recover the temporal organisation of the cell cycle of budding yeast into phases and subphases, in an automated way. To do so, we model the cell cycle as a partially temporal network of protein-protein interactions (PPIs) by combining a traditional static PPI network with protein concentration or RNA expression time series data. Then, we cluster the snapshots of this temporal network to infer phases, which show good agreement with our biological knowledge of the cell cycle. We systematically test the robustness of the approach and investigate the effect of having only partial temporal information. The generality of the method makes it suitable for application to other, less well-known biological systems for which the temporal organisation of processes plays an important role.
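The two steps described above are (1) combining a static PPI network with time-series data into a partially temporal network and (2) clustering its snapshots into phases. A simplified Python sketch of both steps follows. Several choices are illustrative assumptions, not necessarily the authors' method. A protein counts as active at a time point when its normalised level exceeds a threshold, and an edge is active when both endpoints are active. Snapshots are compared with the Jaccard distance and grouped by average-linkage clustering. The proteins, edges and time series are hypothetical.

```python
import numpy as np

def temporal_snapshots(edges, series, threshold):
    """Edge-activity matrix: one row per time point, one column per static PPI edge."""
    active = np.asarray(series) > threshold  # (n_proteins, n_times)
    n_times = active.shape[1]
    return np.array([[active[i, t] and active[j, t] for i, j in edges]
                     for t in range(n_times)], dtype=bool)

def jaccard_distances(snapshots):
    """Pairwise Jaccard distance between snapshot edge sets."""
    n = len(snapshots)
    D = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            union = np.logical_or(snapshots[a], snapshots[b]).sum()
            inter = np.logical_and(snapshots[a], snapshots[b]).sum()
            D[a, b] = 0.0 if union == 0 else 1.0 - inter / union
    return D

def average_linkage(D, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(D), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Hypothetical 4-protein chain and 6 time points (two expression waves)
edges = [(0, 1), (1, 2), (2, 3)]
series = np.array([[0.9, 0.8, 0.9, 0.1, 0.2, 0.1],
                   [0.8, 0.9, 0.7, 0.2, 0.1, 0.2],
                   [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],
                   [0.2, 0.1, 0.2, 0.8, 0.9, 0.7]])
S = temporal_snapshots(edges, series, threshold=0.5)
phases = average_linkage(jaccard_distances(S), n_clusters=2)
print(phases)
```

In this toy series, the first three snapshots share the active edge (0, 1) and the last three share (2, 3), so the clustering recovers two phases.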

Prediction of protein-protein interactions using one-class classification methods and integrating diverse biological data

Journal of Integrative Bioinformatics, 2007, Vol 4 (3), pp. 208-223. DOI: 10.1515/jib-2007-77. Cited By ~ 5

Author(s): José A. Reyes, David Gilbert

Keyword(s): Performance Evaluation; Protein Interactions; Biological Data; Biological Information; Machine Learning Techniques; Protein Protein Interactions; Comparative Performance; Learning Techniques; Imbalanced Class; One Class Classification

This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether or not two proteins interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems that can affect results. First, the classes are imbalanced, because the number of positive examples (pairs of proteins that really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can rest on unreliable assumptions, which could introduce bias into the classification results. Here we propose the use of one-class classification (OCC) methods for predicting PPI. OCC methods use examples of just one class to generate a predictive model, which is consequently independent of the kind of negative examples selected; these approaches are also known to cope with imbalanced class problems. We designed and carried out a performance evaluation study of several OCC methods for this task and found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques under different scenarios, for example varying the number of negative examples used for training. We found that the Parzen OCC method generally performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.
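The Parzen density estimation approach to one-class classification can be sketched as follows. A Gaussian kernel density is fitted to positive examples only, and a new example is accepted as positive when its estimated density exceeds a threshold. The threshold is set so that a chosen fraction of the training positives falls below it. The bandwidth, the 5% reject fraction, and the 2-D Gaussian features standing in for protein-pair descriptors are all assumptions for illustration. They are not the settings or data of the study.

```python
import numpy as np

class ParzenOneClass:
    """One-class classifier based on a Gaussian Parzen-window density estimate."""

    def __init__(self, bandwidth=1.0, reject_fraction=0.05):
        self.h = bandwidth
        self.reject_fraction = reject_fraction

    def _density(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        diff = X[:, None, :] - self.train_[None, :, :]  # (n_query, n_train, d)
        sq = (diff ** 2).sum(axis=2)
        d = self.train_.shape[1]
        norm = (2 * np.pi * self.h ** 2) ** (d / 2)
        return np.exp(-sq / (2 * self.h ** 2)).mean(axis=1) / norm

    def fit(self, X_pos):
        """Store positive examples and set the acceptance threshold."""
        self.train_ = np.atleast_2d(np.asarray(X_pos, dtype=float))
        # training densities include each point's own kernel; acceptable for a sketch
        scores = self._density(self.train_)
        self.threshold_ = np.quantile(scores, self.reject_fraction)
        return self

    def predict(self, X):
        """True where an example looks like the positive (interacting) class."""
        return self._density(X) >= self.threshold_

# Hypothetical features for known interacting protein pairs
rng = np.random.default_rng(2)
X_pos = rng.normal(0.0, 1.0, size=(200, 2))
occ = ParzenOneClass(bandwidth=0.5, reject_fraction=0.05).fit(X_pos)
print(occ.predict([[0.0, 0.0], [8.0, 8.0]]))
```

Only positive examples enter `fit`, so the model does not depend on how negative pairs are chosen. This is the property the abstract highlights for OCC methods.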
