Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities

2020 ◽  
Author(s):  
Pasan Chinthana Fernando ◽  
Paula M Mabee ◽  
Erliang Zeng

Abstract Background Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet-lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein-protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. This is because PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes for anatomical entities. We developed an integrative framework to predict candidate genes for anatomical entities by combining existing experimental knowledge about gene-anatomy relationships with PPI networks using anatomy ontology annotations. We expected this integration to improve the quality of the PPI networks and be better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomy entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These ‘anatomy-based gene networks’ are semantic networks, as they are constructed based on the Uberon anatomy ontology annotations that are obtained from the experimental data in the literature. 
We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database, and we compared the performance of their network-based candidate gene predictions. Results According to candidate gene prediction performance evaluations tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks showed better receiver operating characteristic (ROC) and precision-recall curve performances than PPI networks for both zebrafish and mouse. Conclusion Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improves the network quality, which makes them better optimized for predicting candidate genes for anatomical entities.

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Pasan C. Fernando ◽  
Paula M. Mabee ◽  
Erliang Zeng

Abstract Background Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. 
Results According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, showed better performances by having higher area under the curve values for receiver operating characteristic and precision-recall curves than PPI networks for both zebrafish and mouse. Conclusion Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized them for predicting candidate genes for anatomical entities.
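
The abstract names the Resnik and Lin measures but does not spell them out. The following is a minimal sketch of their textbook definitions, applied to a toy ontology to build an anatomy-based gene network. The term names, gene names, the rule of scoring a gene pair by its best-matching term pair, and the edge threshold are all assumptions made for illustration; they are not taken from the paper or from Uberon.

```python
import math

# Toy ontology: term -> set of parent terms (hypothetical, not real Uberon IDs).
PARENTS = {
    "anatomical_entity": set(),
    "fin": {"anatomical_entity"},
    "pectoral_fin": {"fin"},
    "pelvic_fin": {"fin"},
    "eye": {"anatomical_entity"},
}

# Toy gene-to-term annotations (hypothetical gene names).
ANNOTATIONS = {
    "geneA": {"pectoral_fin"},
    "geneB": {"pelvic_fin"},
    "geneC": {"eye"},
    "geneD": {"pectoral_fin", "eye"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t): N genes in total, n_t annotated to t or a descendant."""
    n_genes = len(ANNOTATIONS)
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    return {t: math.log(n_genes / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC[t] for t in ancestors(t1) & ancestors(t2))


def lin(t1, t2):
    """Lin similarity: Resnik similarity normalised by the terms' own IC."""
    denom = IC[t1] + IC[t2]
    return 2 * resnik(t1, t2) / denom if denom > 0 else 0.0


def gene_similarity(g1, g2, term_sim=lin):
    """Assumed gene-level score: the best-matching pair of annotated terms."""
    return max(term_sim(a, b) for a in ANNOTATIONS[g1] for b in ANNOTATIONS[g2])


def anatomy_network(threshold=0.0):
    """Weighted edges between gene pairs whose similarity exceeds the threshold."""
    genes = sorted(ANNOTATIONS)
    edges = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            score = gene_similarity(g1, g2)
            if score > threshold:
                edges[(g1, g2)] = score
    return edges


if __name__ == "__main__":
    for pair, weight in anatomy_network().items():
        print(pair, round(weight, 3))
```

In this toy data, genes that share only the root term get a similarity of zero and therefore no edge, while genes annotated to the same specific term get a Lin similarity of 1.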


2021 ◽  
Author(s):  
Yuhan Xie ◽  
Wei Jiang ◽  
Weilai Dong ◽  
Hongyu Li ◽  
Sheng Chih Jin ◽  
...  

De novo variants (DNVs) with deleterious effects have proved informative in identifying risk genes for early-onset diseases such as congenital heart disease (CHD). A number of statistical methods have been proposed for family-based studies or case/control studies to identify risk genes by screening genes with more DNVs than expected by chance in Whole Exome Sequencing (WES) studies. However, the statistical power is still limited for cohorts with thousands of subjects. Under the hypothesis that connected genes in protein-protein interaction (PPI) networks are more likely to share similar disease association status, we develop a Markov Random Field model that can leverage information from publicly available PPI databases to increase power in identifying risk genes. We identified 46 candidate genes with at least 1 DNV in the CHD study cohort, including 18 known human CHD genes and 35 highly expressed genes in mouse developing heart. Our results may shed new insight on the shared protein functionality among risk genes for CHD.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Suthanthiram Backiyarani ◽  
Rajendran Sasikala ◽  
Simeon Sharmiladevi ◽  
Subbaraya Uma

Abstract Banana, one of the most important staple fruits among global consumers, is highly sterile owing to natural parthenocarpy. Identification of genetic factors responsible for parthenocarpy would help conventional breeders improve the seeded accessions. We constructed a protein–protein interaction (PPI) network by mining differentially expressed genes and genes used in transgenic studies of parthenocarpy. Based on topological and pathway enrichment analysis of proteins in the PPI network, 12 candidate genes were shortlisted. By further validating these candidate genes in seeded and seedless accessions of Musa spp., we put forward MaAGL8, MaMADS16, MaGH3.8, MaMADS29, MaRGA1, MaEXPA1, MaGID1C, MaHK2 and MaBAM1 as possible target genes in the study of natural parthenocarpy. In contrast, the expression profiles of MaACLB-2 and MaZEP are anticipated to highlight the difference between artificially induced and natural parthenocarpy. By exploring the PPIs of validated genes from the network, we postulated a putative pathway that brings insights into the significance of the cytokinin-mediated CLAVATA (CLV)–WUSCHEL (WUS) signaling pathway, in addition to gibberellin-mediated auxin signaling, in parthenocarpy. Our analysis is the first attempt to identify candidate genes and to hypothesize a putative mechanism that bridges the gaps in understanding natural parthenocarpy through a PPI network.


2014 ◽  
Vol 12 (01) ◽  
pp. 1450004 ◽  
Author(s):  
SLAVKA JAROMERSKA ◽  
PETR PRAUS ◽  
YOUNG-RAE CHO

Reconstruction of signaling pathways is crucial for understanding cellular mechanisms. A pathway is represented as a path of a signaling cascade involving a series of proteins to perform a particular function. Since a protein pair involved in signaling and response have a strong interaction, putative pathways can be detected from protein–protein interaction (PPI) networks. However, predicting directed pathways from the undirected genome-wide PPI networks has been challenging. We present a novel computational algorithm to efficiently predict signaling pathways from PPI networks given a starting protein and an ending protein. Our approach integrates topological analysis of PPI networks and semantic analysis of PPIs using Gene Ontology data. An advanced semantic similarity measure is used for weighting each interacting protein pair. Our distance-wise algorithm iteratively selects an adjacent protein from a PPI network to build a pathway based on a distance condition. On each iteration, the strength of a hypothetical path passing through a candidate edge is estimated by a local heuristic. We evaluate the performance by comparing the resultant paths to known signaling pathways on yeast. The results show that our approach has higher accuracy and efficiency than previous methods.


mSphere ◽  
2019 ◽  
Vol 4 (5) ◽  
Author(s):  
Sriparna Mukherjee ◽  
Irshad Akbar ◽  
Reshma Bhagat ◽  
Bibhabasu Hazra ◽  
Arindam Bhattacharyya ◽  
...  

ABSTRACT RNA viruses are known to modulate host microRNA (miRNA) machinery for their own benefit. Japanese encephalitis virus (JEV), a neurotropic RNA virus, has been reported to manipulate several miRNAs in neurons or microglia. However, no report indicates a complete sketch of the miRNA profile of neural stem/progenitor cells (NSPCs), hence the focus of our current study. We used an miRNA array of 84 miRNAs in uninfected and JEV-infected human neuronal progenitor cells and primary neural precursor cells isolated from aborted fetuses. Severalfold downregulation of hsa-miR-9-5p, hsa-miR-22-3p, hsa-miR-124-3p, and hsa-miR-132-3p was found postinfection in both of the cell types compared to the uninfected cells. Subsequently, we screened for the target genes of these miRNAs and looked for the biological pathways that were significantly regulated by the genes. The target genes involved in two or more pathways were sorted out. Protein-protein interaction (PPI) networks of the miRNA target genes were formed based on their interaction patterns. A binary adjacency matrix for each gene network was prepared. Different modules or communities were identified in those networks by community detection algorithms. Mathematically, we identified the hub genes by analyzing their degree centrality and participation coefficient in the network. The hub genes were classified as either provincial (P < 0.4) or connector (P > 0.4) hubs. We validated the expression of hub genes in both cell line and primary cells through qRT-PCR after JEV infection and respective miR mimic transfection. Taken together, our findings highlight the importance of specific target gene networks of miRNAs affected by JEV infection in NSPCs. IMPORTANCE JEV damages the neural stem/progenitor cell population of the mammalian brain. However, JEV-induced alteration in the miRNA expression pattern of the cell population remains an open question, hence warranting our present study. 
In this study, we specifically address the downregulation of four miRNAs, and we prepared a protein-protein interaction network of miRNA target genes. We identified two types of hub genes in the PPI network, namely, connector hubs and provincial hubs. These two types of miRNA target hub genes critically influence the participation strength in the networks and thereby significantly impact up- and downregulation in several key biological pathways. Computational analysis of the PPI networks identifies key protein interactions and hubs in those modules, which opens up the possibility of precise identification and classification of host factors for viral infection in NSPCs.
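
The hub classification described above can be sketched as follows, assuming the commonly used definition of the participation coefficient, P_i = 1 - sum over modules s of (k_is / k_i)^2, where k_i is the degree of node i and k_is is the number of its neighbours in module s. The toy edge list, the module labels, and the degree cutoff used to call a node a hub are hypothetical. The abstract gives only the P < 0.4 and P > 0.4 rules, so treating P = 0.4 as provincial is also an assumption.

```python
from collections import Counter


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbours mapping."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def participation_coefficient(adjacency, module_of):
    """P_i = 1 - sum_s (k_is / k_i)^2 for every node i."""
    result = {}
    for node, neighbours in adjacency.items():
        degree = len(neighbours)
        if degree == 0:
            result[node] = 0.0
            continue
        per_module = Counter(module_of[n] for n in neighbours)
        result[node] = 1.0 - sum((k / degree) ** 2 for k in per_module.values())
    return result


def classify_hubs(adjacency, module_of, min_degree):
    """Label nodes with degree >= min_degree as connector or provincial hubs."""
    p = participation_coefficient(adjacency, module_of)
    hubs = {}
    for node, neighbours in adjacency.items():
        if len(neighbours) >= min_degree:
            # P > 0.4 -> connector; everything else (including P == 0.4,
            # an assumption) -> provincial.
            hubs[node] = "connector" if p[node] > 0.4 else "provincial"
    return hubs


# Hypothetical toy network with two modules.
EDGES = [
    ("h1", "a"), ("h1", "b"), ("h1", "c"),
    ("h2", "d"), ("h2", "e"), ("h2", "a"), ("h2", "b"),
]
MODULE_OF = {"h1": 1, "a": 1, "b": 1, "c": 1, "h2": 2, "d": 2, "e": 2}

if __name__ == "__main__":
    print(classify_hubs(build_adjacency(EDGES), MODULE_OF, min_degree=3))
```

Here `h1` links only within its own module (P = 0), while `h2` splits its links evenly across two modules (P = 0.5), so the two hubs fall on opposite sides of the 0.4 cutoff.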


2021 ◽  
Author(s):  
Zhihong Zhang ◽  
Sai Hu ◽  
Wei Yan ◽  
Bihai Zhao ◽  
Lei Wang

Abstract Background Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that a large number of false negatives and false positives exist in PPI data. Therefore, it is necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results In this paper, we proposed a non-negative matrix factorization and multiple biological information based model (NDM) for identifying essential proteins. The first stage in this process was to construct a weighted PPI network by combining protein domain information, protein complex information, and the topological characteristics of the original PPI network. Then, the non-negative matrix factorization technique was used to reconstruct an optimized PPI network with whole enough weight of edges. In the final stage, the ranking score of each protein was computed by the PageRank algorithm, in which the initial scores were calculated from homology and subcellular localization information. To verify the effectiveness of the NDM method, we compared it with other state-of-the-art essential protein prediction methods. The comparison of the results obtained from the different methods indicated that our NDM model performs better in predicting essential proteins. Conclusion Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which led to optimized performance in essential protein identification. This will also provide a new perspective for other predictions based on protein-protein interaction networks.
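
The final ranking stage can be illustrated with a generic PageRank power iteration on a weighted, undirected network. This is a sketch under assumptions, not the NDM implementation: the abstract does not say how the initial scores enter the iteration, so using them as both the starting vector and the teleport distribution is a choice made here, and the function name, damping factor, and toy network are hypothetical.

```python
def weighted_pagerank(weights, initial, damping=0.85, iterations=100):
    """Rank nodes of a weighted network by PageRank power iteration.

    weights: node -> {neighbour: edge weight}, symmetric for an undirected graph.
    initial: node -> non-negative prior score (assumed to act as the start
             vector and as the teleport distribution).
    """
    nodes = sorted(initial)
    total = sum(initial.values())
    prior = {n: initial[n] / total for n in nodes}
    strength = {n: sum(weights.get(n, {}).values()) for n in nodes}

    score = dict(prior)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * prior[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                # Isolated node: spread its score according to the prior.
                for n in nodes:
                    new[n] += damping * score[u] * prior[n]
            else:
                # Pass score to neighbours in proportion to edge weight.
                for v, w in weights[u].items():
                    new[v] += damping * score[u] * w / strength[u]
        score = new
    return score


# Hypothetical toy network: a star with centre "c" and three leaves.
STAR = {
    "c": {"x": 1.0, "y": 1.0, "z": 1.0},
    "x": {"c": 1.0},
    "y": {"c": 1.0},
    "z": {"c": 1.0},
}
UNIFORM = {"c": 1.0, "x": 1.0, "y": 1.0, "z": 1.0}

if __name__ == "__main__":
    ranks = weighted_pagerank(STAR, UNIFORM)
    for node in sorted(ranks, key=ranks.get, reverse=True):
        print(node, round(ranks[node], 4))
```

Because each step only redistributes existing score, the scores keep summing to 1, which makes rankings from different prior vectors directly comparable.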


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1969
Author(s):  
Dongmin Jung ◽  
Xijin Ge

Interactions between proteins occur in many, if not most, biological processes. This fact has motivated the development of a variety of experimental methods for the identification of protein-protein interaction (PPI) networks. Leveraging PPI data available in the STRING database, we use network-based statistical learning methods to infer the putative functions of proteins from the known functions of neighboring proteins on a PPI network. This package identifies such proteins often involved in the same or similar biological functions. The package is freely available at the Bioconductor web site (http://bioconductor.org/packages/PPInfer/).


F1000Research ◽  
2018 ◽  
Vol 6 ◽  
pp. 1969 ◽  
Author(s):  
Dongmin Jung ◽  
Xijin Ge

Interactions between proteins occur in many, if not most, biological processes. This fact has motivated the development of a variety of experimental methods for the identification of protein-protein interaction (PPI) networks. Leveraging PPI data available in the STRING database, we use network-based statistical learning methods to infer the putative functions of proteins from the known functions of neighboring proteins on a PPI network. This package identifies such proteins often involved in the same or similar biological functions. The package is freely available at the Bioconductor web site (http://bioconductor.org/packages/PPInfer/).


2020 ◽  
Author(s):  
Halima Alachram ◽  
Hryhorii Chereda ◽  
Tim Beißbarth ◽  
Edgar Wingender ◽  
Philip Stegmaier

Abstract Biomedical and life science literature is an essential way to publish experimental results. With the rapid growth of the number of new publications, the amount of scientific knowledge represented in free text is increasing remarkably. There has been much interest in developing techniques that can extract this knowledge and make it accessible to aid scientists in discovering new relationships between biological entities and answering biological questions. Making use of the word2vec approach, we generated word vector representations based on a corpus consisting of over 16 million PubMed abstracts. We developed a text mining pipeline to produce word2vec embeddings with different properties and performed validation experiments to assess their utility for biomedical analysis. An important pre-processing step consisted in the substitution of synonymous terms by their preferred terms in biomedical databases. Furthermore, we extracted gene-gene networks from two embedding versions and used them as prior knowledge to train Graph-Convolutional Neural Networks (Graph-CNNs) on breast cancer gene expression data to predict the occurrence of metastatic events. Performances of resulting models were compared to Graph-CNNs trained with protein-protein interaction (PPI) networks or with networks derived using other word embedding algorithms. We also assessed the effect of corpus size on the variability of word representations. Finally, we created a web service with a graphical and a RESTful interface to extract and explore relations between biomedical terms using annotated embeddings. Comparisons to biological databases showed that relations between entities such as known PPIs, signaling pathways and cellular functions, or narrower disease ontology groups correlated with higher cosine similarity. Graph-CNNs trained with word2vec-embedding-derived networks performed best for the metastatic event prediction task compared to other networks. 
Word representations as produced by text mining algorithms like word2vec therefore capture biologically meaningful relations between entities.
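
Cosine similarity between word vectors, the quantity this abstract compares against biological databases, can be sketched as below. The abstract does not describe how the gene-gene networks were extracted from the embeddings, so the thresholding rule, the threshold value, and the three-dimensional toy vectors are assumptions for illustration only. Real word2vec embeddings have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_network(vectors, threshold):
    """Assumed rule: link two genes when their cosine similarity >= threshold."""
    genes = sorted(vectors)
    edges = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            score = cosine_similarity(vectors[g1], vectors[g2])
            if score >= threshold:
                edges[(g1, g2)] = score
    return edges


# Hypothetical toy embeddings; gene names and values are made up.
VECTORS = {
    "geneA": [1.0, 2.0, 0.0],
    "geneB": [2.0, 4.0, 0.0],
    "geneC": [0.0, 0.0, 3.0],
}

if __name__ == "__main__":
    print(embedding_network(VECTORS, threshold=0.5))
```

With these toy vectors, `geneA` and `geneB` point in the same direction and are linked, while `geneC` is orthogonal to both and stays unconnected.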

