protein interaction data
Recently Published Documents


TOTAL DOCUMENTS

244
(FIVE YEARS 64)

H-INDEX

27
(FIVE YEARS 2)

Cells ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 198
Author(s):  
Federico Ferraro ◽  
Christina Fevga ◽  
Vincenzo Bonifati ◽  
Wim Mandemakers ◽  
Ahmed Mahfouz ◽  
...  

Several studies have analyzed gene expression profiles in the substantia nigra to better understand the pathological mechanisms causing Parkinson’s disease (PD). However, the concordance between the gene signatures identified in these individual studies was generally low. This might have been caused by a change in cell type composition, as loss of dopaminergic neurons in the substantia nigra pars compacta is a hallmark of PD. Through an extensive meta-analysis of nine previously published microarray studies, we demonstrated that a large proportion of the detected differentially expressed genes was indeed caused by cyto-architectural alterations due to the heterogeneity in the neurodegenerative stage and/or technical artefacts. After correcting for cell composition, we identified a common signature showing deregulation of ammonium transport, which had not been reported previously, as well as of known biological processes such as bioenergetic pathways, response to proteotoxic stress, and immune response. By integrating with protein interaction data, we shortlisted a set of key genes: some, such as LRRK2, PINK1, PRKN, and FBXO7, known to be related to PD; others with compelling evidence for their role in neurodegeneration, such as GSK3β, WWOX, and VPC; and novel potential players in PD pathogenesis. Together, these data show the importance of accounting for cyto-architecture in these analyses and highlight the contribution of multiple cell types and novel processes to PD pathology, providing potential new targets for drug development.


Biomolecules ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 37
Author(s):  
Suma L. Sivan ◽  
Vinod Chandra S. Sukumara Pillai

Network biology has become a key tool in unravelling the mechanisms of complex diseases. Detecting dys-regulated subnetworks from molecular networks is a task that needs efficient computational methods. In this work, we constructed an integrated network using gene interaction data as well as protein–protein interaction data of differentially expressed genes derived from microarray gene expression data. We considered the level of differential expression as well as the topological weight of proteins in the interaction network to quantify dys-regulation. Then, a nature-inspired Smell Detection Agent (SDA) optimisation algorithm is designed with multiple agents traversing various paths in the network. Finally, the algorithm provides a maximum weighted module as the optimum dys-regulated subnetwork. The analysis is performed for samples of triple-negative breast cancer as well as colorectal cancer. Biological significance analysis of module genes is also done to validate the results. The breast cancer subnetwork is found to contain i) valid biomarkers including PIK3CA, PTEN, BRCA1, AR and EGFR; ii) validated drug targets TOP2A, CDK4, HDAC1, IL6, BRCA1, HSP90AA1 and AR; iii) synergistic drug targets EGFR and BIRC5. Moreover, based on the weight values assigned to nodes in the subnetwork, PLK1, CTNNB1, IGF1, AURKA, PCNA, HSPA4 and GAPDH are proposed as drug targets for further studies. For the colorectal cancer module, the analysis revealed the occurrence of approved drug targets TYMS, TOP1, BRAF and EGFR. Considering the higher weight values, HSP90AA1, CCNB1, AKT1 and CXCL8 are proposed as drug targets for experimentation. The derived subnetworks contain cancer-related pathways as well. The SDA-derived breast cancer subnetwork is compared with those of tools such as MCODE and Minimum Spanning Tree, and a higher enrichment (75%) of significant elements is observed. Thus, the proposed nature-inspired algorithm is a novel approach to derive the optimum dys-regulated subnetwork from a huge molecular network.
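
The abstract above scores each node by combining its level of differential expression with a topological weight, then searches for a maximum weighted module. The sketch below illustrates that general idea only. It is not the SDA algorithm: the blending formula, the `alpha` parameter, the use of node degree as the topological weight, and the greedy expansion are all assumptions made for illustration, and the node names and fold-change values are toy data.

```python
from collections import defaultdict


def node_weights(edges, abs_log_fc, alpha=0.5):
    """Blend differential expression with degree, each scaled to [0, 1].

    Assumption: degree stands in for the 'topological weight' and
    |log fold change| for the level of differential expression.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_deg = max(degree.values())
    max_fc = max(abs_log_fc.values())
    return {
        n: alpha * abs_log_fc.get(n, 0.0) / max_fc
        + (1 - alpha) * degree[n] / max_deg
        for n in degree
    }


def greedy_module(edges, weights, size):
    """Grow a connected module from the heaviest node (greedy stand-in, not SDA)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    module = {max(weights, key=weights.get)}
    while len(module) < size:
        # Candidate nodes are neighbours of the module not yet included.
        frontier = {n for m in module for n in adj[m]} - module
        if not frontier:
            break
        module.add(max(frontier, key=weights.get))
    return module


# Toy network with hypothetical node names and fold changes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
abs_log_fc = {"A": 2.0, "B": 1.0, "C": 0.5, "D": 0.2, "E": 3.0}
weights = node_weights(edges, abs_log_fc)
module = greedy_module(edges, weights, size=3)
```

On this toy input the greedy search never reaches node E, which is heavily weighted but sits behind the low-weight node D. Local traps of this kind are one reason a multi-agent search over many paths may be preferred to a single greedy expansion.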


2021 ◽  
Author(s):  
Georgia Tsagkogeorga ◽  
Helena Santos Rosa ◽  
Andrej Alendar ◽  
Dan Leggate ◽  
Oliver Rausch ◽  
...  

RNA methylation plays an important role in the functional regulation of RNAs, and has thus attracted increasing interest in biology and drug discovery. Here, we collected and collated transcriptomic, proteomic, structural and physical interaction data from the Harmonizome database, and applied supervised machine learning to predict novel genes associated with RNA methylation pathways in humans. We selected five types of classifiers, which we trained and evaluated using cross-validation on multiple training sets. The best models reached 88% accuracy based on cross-validation, and an average 91% accuracy on the test set. Using protein-protein interaction data, we propose six molecular sub-networks linking model predictions to previously known RNA methylation genes, with roles in mRNA methylation, tRNA processing and rRNA processing, but also in protein and chromatin modifications. Our study exemplifies how access to large omics datasets combined with machine learning methods can be used to predict gene function.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Junhua Ye ◽  
Shunfang Wang ◽  
Xin Yang ◽  
Xianjun Tang

Abstract Background At present, bioinformatics research on the relationship between aging-related diseases and genes mainly proceeds by establishing a machine learning multi-label model to classify each gene. Most existing methods for predicting pathogenic genes either rely on specific types of gene features, or directly encode multiple features with different dimensions using the same encoder and concatenate them to predict the final results, which limits the applicability of the algorithm. Possible shortcomings of these approaches include incomplete coverage of gene features by a single type of biomics data, overfitting of small dimensional datasets by a single encoder, or underfitting of larger dimensional datasets. Methods We use the known gene-disease association data and gene descriptors, such as gene ontology terms (GO), protein interaction data (PPI), PathDIP, Kyoto Encyclopedia of Genes and Genomes (KEGG), etc., as input for deep learning to predict the association between genes and diseases. Our innovation is to use the Mashup algorithm to reduce the dimensionality of PPI, GO and other large biological networks, add new pathway data from the KEGG database, and then combine a variety of biological information sources through a modular Deep Neural Network (DNN) to predict the genes related to aging diseases. Result and conclusion The results show that our algorithm is more effective than the standard neural network algorithm (the area under the ROC curve rises from 0.8795 to 0.9153), the gradient enhanced tree classifier and the logistic regression classifier. In this paper, we first use a DNN to learn the similar genes associated with the known diseases from the complex multi-dimensional feature space, and then provide evidence that the assumed genes are associated with a certain disease.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Lihong Peng ◽  
Ruya Yuan ◽  
Ling Shen ◽  
Pengfei Gao ◽  
Liqian Zhou

Abstract Background Long noncoding RNAs (lncRNAs) have dense linkages with various biological processes. Identifying interacting lncRNA-protein pairs contributes to understanding the functions and mechanisms of lncRNAs. Wet experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced characteristics of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which produced prediction bias. Results In this study, we develop an Ensemble framework (LPI-EnEDT) with Extra tree and Decision Tree classifiers to implement imbalanced LPI data classification. First, five LPI datasets are arranged. Second, lncRNAs and proteins are separately characterized based on Pyfeat and BioTriangle and concatenated as a vector to represent each lncRNA-protein pair. Finally, an ensemble framework with Extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. The comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively. The average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest that there are underlying associations between HOTTIP and Q9Y6M1, and between NRON and Q15717. Conclusions By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with Extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as interaction information inference for a new lncRNA (or protein).
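
The AUC values reported above are a common choice for imbalanced data because they depend on how positives are ranked relative to negatives rather than on a fixed decision threshold. As a minimal sketch of that metric (not of LPI-EnEDT itself), the AUC can be computed as the fraction of positive-negative pairs in which the positive receives the higher score, with ties counted as half. The function name `roc_auc` and the labels and scores below are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example: 2 interacting pairs (1) among 3 non-interacting pairs (0).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 5 of the 6 positive-negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch; sorting-based implementations give the same value more efficiently on large datasets.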


Cancers ◽  
2021 ◽  
Vol 13 (16) ◽  
pp. 4207
Author(s):  
Francesco Monticolo ◽  
Maria Luisa Chiusano

It is now widely accepted that a healthy diet helps reduce the risk of cancer and its deleterious effects. Nutrigenomics studies are therefore being carried out to test the effects of nutrients at the molecular level and to contribute to the search for anti-cancer treatments. These efforts are expanding the valuable body of information needed to select natural compounds useful for the design of novel drugs or functional foods. Here we present a computational study to select new candidate compounds that could play a role in cancer prevention and care. Starting from a dataset of genes that are co-expressed in programmed cell death experiments, we investigated nutrigenomics treatments inducing apoptosis, and searched for compounds that produce the same expression pattern. Subsequently, we selected cancer types where the genes showed an opposite expression pattern, and we confirmed that the apoptotic/nutrigenomics expression trend was associated with significantly improved survival in cancer-affected patients. Furthermore, we considered the functional interactors of the genes as defined by public protein-protein interaction data, and inferred their involvement in cancers and/or in programmed cell death. We identified 7 genes and, from available nutrigenomics experiments, 6 compounds that affect their expression. These 6 compounds were used to identify, by ligand-based virtual screening, additional molecules with similar structure. We checked for ADME criteria and selected 23 natural compounds representing suitable candidates for further testing of their efficacy in apoptosis induction. Because these compounds occur in natural resources, the presented results could support the development of novel drugs and/or the design of functional foods.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yang Li ◽  
Zheng Wang ◽  
Li-Ping Li ◽  
Zhu-Hong You ◽  
Wen-Zhun Huang ◽  
...  

Abstract Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Therefore, recognition of protein–protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor, and carries a risk of a high false positive rate. To find a better solution, the biology community is looking for computational methods to quickly and efficiently discover massive protein interaction data. In this paper, we propose a computational method for predicting PPIs based on a fresh idea of combining orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed-length feature vector by applying OLPP to PSSMs. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.


2021 ◽  
Vol 7 (3) ◽  
pp. 48
Author(s):  
Arundhati Das ◽  
Tanvi Sinha ◽  
Sharmishtha Shyamal ◽  
Amaresh Chandra Panda

Circular RNAs (circRNAs) are emerging as novel regulators of gene expression in various biological processes. CircRNAs regulate downstream gene expression by interacting with cellular regulators such as microRNAs and RNA binding proteins (RBPs). The accumulation of high-throughput RNA–protein interaction data has revealed the interaction of RBPs with coding and noncoding RNAs, including recently discovered circRNAs. RBPs are a large family of proteins known to play a critical role in gene expression by modulating RNA splicing, nuclear export, mRNA stability, localization, and translation. However, the interaction of RBPs with circRNAs and its implications for circRNA biogenesis and function have only been emerging in the last few years. Recent studies suggest that circRNA interaction with target proteins modulates the interaction of the protein with downstream target mRNAs or proteins. This review outlines the emerging mechanisms of circRNA–protein interactions and their functional role in cell physiology.


2021 ◽  
Author(s):  
Mohsen Gholizade ◽  
Seyed Mehdi Esmaeili-Fard

Abstract Background Litter size and ovulation rate are important reproduction traits in sheep and have important impacts on the profitability of farm animals. To investigate the genetic architecture of litter size, we report the first meta-analysis of genome-wide association studies (GWAS) using 522 ewes and 564,377 SNPs from six sheep breeds. Results We identified 29 significant associations for litter size, 27 of which have not been reported in GWASs for each population. However, we could confirm the role of BMPR1B in prolificacy. Our gene set analysis discovered biological pathways related to cell signaling, communication, and adhesion. Functional clustering and enrichment using protein databases identified the epidermal growth factor-like domain as affecting litter size. By analyzing protein-protein interaction data, we could identify hub genes such as CASK, PLCB4, RPTOR, GRIA2, and PLCB1 that were enriched in most of the significant pathways. These genes have a role in cell proliferation, cell adhesion, cell growth and survival, and autophagy. Conclusions Notably, the identified SNPs were scattered across several different chromosomes, implying different genetic mechanisms underlying variation of prolificacy in each breed. Given the different layers that make up the follicles and the need for communication and transfer of hormones and nutrients through these layers to the oocyte, the significance of pathways related to cell signaling and communication seems logical. Our results provide genetic insights into the litter size variation in different sheep breeds.
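
The abstract above identifies hub genes from protein-protein interaction data but does not state the criterion used. One common convention, assumed here purely for illustration, is to rank genes by degree, that is, by their number of distinct interaction partners. The gene labels `G1` to `G5` and the interactions below are placeholders, not data from the study.

```python
from collections import Counter


def hub_genes(interactions, top_n=2):
    """Rank genes by number of distinct partners (degree), highest first.

    Duplicate pairs (in either orientation) and self-interactions are ignored.
    Ties are broken alphabetically so the output is deterministic.
    """
    degree = Counter()
    seen = set()
    for a, b in interactions:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy interaction list with a duplicate (G2-G1) and a self-loop (G5-G5).
interactions = [
    ("G1", "G2"), ("G1", "G3"), ("G2", "G1"),
    ("G1", "G4"), ("G3", "G4"), ("G5", "G5"),
]
hubs = hub_genes(interactions)
```

Degree is only one of several centrality measures; betweenness or a pathway-membership count could be substituted without changing the overall shape of the analysis.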


2021 ◽  
Author(s):  
Nazar Zaki ◽  
Harsh Singh

Protein complexes are groups of two or more polypeptide chains that join together to build noncovalent networks of protein interactions. A number of computational methods have been created to identify protein complexes and their members from these interaction networks. While most of the existing methods identify protein complexes from protein-protein interaction networks (PPIs) at a fairly decent level, the applicability of advanced graph network methods has not yet been adequately investigated. In this paper, we propose various graph convolutional network (GCN) methods to improve the detection of protein functional complexes. We first formulated the protein complex detection problem as a node classification problem. Second, the Neural Overlapping Community Detection (NOCD) model was applied to cluster the nodes (proteins) using a complex affiliation matrix. A representation learning approach, which combines the multi-class GCN feature extractor (to obtain the features of the nodes) and the mean shift clustering algorithm (to perform clustering), is also presented. We have also improved the efficiency of the multi-class GCN network to reduce space and time complexities by converting the dense-dense matrix operations into dense-sparse or sparse-sparse matrix operations. This proposed solution significantly improves the scalability of the existing GCN network. Finally, we apply clustering aggregation to find the best protein complexes. A grid search was performed on various detected complexes obtained by applying three well-known protein detection methods, namely ClusterONE, CMC, and PEWCC, with the help of the Meta-Clustering Algorithm (MCLA) and Hybrid Bipartite Graph Formulation (HBGF) algorithm. The proposed GCN-based methods were tested on various publicly available datasets and provided significantly better performance than the previous state-of-the-art methods.
The code and data used in this study are available from https://github.com/Analystharsh/GCN_complex_detection.
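
The efficiency gain described above comes from replacing dense-dense matrix products with sparse ones. The following sketch shows the idea for a single GCN propagation step, under stated assumptions: it uses the widely used symmetric normalization D^-1/2 (A + I) D^-1/2, stores only the nonzero adjacency entries in plain dictionaries, and omits the learned weight matrix and nonlinearity. It is not taken from the authors' repository, and all function names are hypothetical.

```python
import math


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 as {row: {col: value}}, storing nonzeros only."""
    rows = {i: {i: 1.0} for i in range(n)}  # self-loops (the "+ I" term)
    for u, v in edges:
        rows[u][v] = 1.0
        rows[v][u] = 1.0  # undirected interaction
    deg = [sum(rows[i].values()) for i in range(n)]
    return {
        i: {j: a / math.sqrt(deg[i] * deg[j]) for j, a in rows[i].items()}
        for i in range(n)
    }


def sparse_dense_matmul(sparse_rows, dense):
    """Multiply a sparse matrix by a dense one, touching only stored entries."""
    width = len(dense[0])
    out = [[0.0] * width for _ in range(len(sparse_rows))]
    for i, row in sparse_rows.items():
        for j, a in row.items():
            for k in range(width):
                out[i][k] += a * dense[j][k]
    return out


# Toy graph: proteins 0 and 1 interact, protein 2 is isolated.
adj = normalized_adjacency(3, [(0, 1)])
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
propagated = sparse_dense_matmul(adj, features)
```

The cost of the product scales with the number of stored edges rather than with the square of the number of proteins, which is the source of the scalability improvement for sparse interaction networks. In practice a sparse matrix library would replace the dictionaries used here.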

