protein pair
Recently Published Documents



2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Lihong Peng ◽  
Ruya Yuan ◽  
Ling Shen ◽  
Pengfei Gao ◽  
Liqian Zhou

Abstract Background Long noncoding RNAs (lncRNAs) are closely linked to various biological processes. Identifying interacting lncRNA-protein pairs helps to understand the functions and mechanisms of lncRNAs. Wet-lab experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced nature of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which introduces prediction bias. Results In this study, we develop an ensemble framework (LPI-EnEDT) with extra tree and decision tree classifiers to classify imbalanced LPI data. First, five LPI datasets are assembled. Second, lncRNAs and proteins are characterized separately with Pyfeat and BioTriangle, and the resulting features are concatenated into a vector that represents each lncRNA-protein pair. Finally, an ensemble framework with extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. The comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively. The average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest that there are underlying associations between HOTTIP and Q9Y6M1 and between NRON and Q15717. Conclusions By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as interaction inference for a new lncRNA (or protein).
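The abstract names three cross-validation settings (on lncRNAs, on proteins, and on LPIs) without spelling out how the folds are built. The sketch below shows one common reading, in which a fold holds out whole lncRNAs, whole proteins, or individual pairs. The function names, the default of five folds, and the round-robin fold assignment are illustrative assumptions, not details taken from the paper.

```python
import random
from typing import Iterator, List, Tuple

Pair = Tuple[str, str]                 # (lncRNA id, protein id)
Split = Tuple[List[Pair], List[Pair]]  # (training pairs, test pairs)


def folds_by_entity(pairs: List[Pair], index: int,
                    k: int = 5, seed: int = 0) -> Iterator[Split]:
    """Hold out whole entities: index=0 splits on lncRNAs, index=1 on proteins.

    Every pair that involves a held-out entity goes to the test set, so the
    model never sees that lncRNA (or protein) during training.
    """
    entities = sorted({pair[index] for pair in pairs})
    random.Random(seed).shuffle(entities)
    for fold in range(k):
        held_out = set(entities[fold::k])  # round-robin fold assignment
        test = [p for p in pairs if p[index] in held_out]
        train = [p for p in pairs if p[index] not in held_out]
        yield train, test


def folds_by_pair(pairs: List[Pair], k: int = 5, seed: int = 0) -> Iterator[Split]:
    """Hold out individual pairs; both partners may still appear in training."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    for fold in range(k):
        test = shuffled[fold::k]
        train = [p for i, p in enumerate(shuffled) if i % k != fold]
        yield train, test
```

Under this reading, the entity-level splits approximate prediction for a new lncRNA or a new protein, whereas the pair-level split leaves both partners of a test pair visible in training and is therefore the easier setting.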


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Liqian Zhou ◽  
Zhao Wang ◽  
Xiongfei Tian ◽  
Lihong Peng

Abstract Background Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovery of lncRNA–protein interactions (LPIs) helps to understand the biological functions and mechanisms of lncRNAs. Although wet-lab experiments have found a few interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Therefore, computational methods are increasingly exploited to uncover possible associations. However, existing computational methods have several limitations. First, the majority of them were evaluated on a single dataset, which may result in prediction bias. Second, few of them have been applied to identify relevant data for new lncRNAs (or proteins). Finally, they fail to utilize diverse biological information about lncRNAs and proteins. Results Using a feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT), this work focuses on classifying unobserved LPIs. First, three human LPI datasets and two plant LPI datasets are assembled. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Third, the features are dimensionally reduced and concatenated into a vector that represents an lncRNA–protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying linkages between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross validations, on lncRNAs, proteins, and lncRNA–protein pairs, respectively. It obtains the best average AUC and AUPR values in the majority of situations, significantly outperforming the other five LPI identification methods. Specifically, the AUCs computed by LPI-deepGBDT are 0.8321, 0.6815, and 0.9073, and the AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the powerful classification ability of LPI-deepGBDT. Case study analyses show that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637. Conclusions By integrating ensemble learning and hierarchical distributed representations and building a multi-layered deep architecture, this work improves LPI prediction performance and effectively probes interaction data for new lncRNAs/proteins.
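For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and AUPR can be computed from predicted scores and binary labels. AUPR is estimated here as average precision, which is only one of several common estimators. The abstract does not say which procedure the authors used, and the function names are illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of samples; fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision@rank over the ranks of positives.

    Tied scores keep their input order, so results with ties are approximate.
    """
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits
```

On imbalanced interaction data, where known interacting pairs are far fewer than unlabeled ones, AUPR is usually the more informative of the two because it is sensitive to false positives among the top-ranked predictions.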


2021 ◽  
Vol 9 (6) ◽  
pp. 1323
Author(s):  
Etai Boichis ◽  
Nadejda Sigal ◽  
Ilya Borovok ◽  
Anat A. Herskovits

Infection of mammalian cells by Listeria monocytogenes (Lm) was shown to be facilitated by its phage elements. In a search for additional phage remnants that play a role in Lm’s lifecycle, we identified a conserved locus containing two XRE regulators and a pair of genes encoding a secreted metzincin protease and a lipoprotein structurally similar to a TIMP-family metzincin inhibitor. We found that the XRE regulators act as a classic CI/Cro regulatory switch that regulates the expression of the metzincin and TIMP-like genes under intracellular growth conditions. We established that when these genes are expressed, their products alter Lm morphology and increase its sensitivity to phage-mediated lysis, thereby enhancing virion release. Expression of these proteins also sensitized the bacteria to cell-wall-targeting compounds, implying that they modulate the cell wall structure. Our data indicate that these effects are mediated by the cleavage of the TIMP-like protein by the metzincin and its subsequent release to the extracellular milieu. While the importance of this locus to Lm pathogenicity remains unclear, the observation that this phage-associated protein pair acts upon the bacterial cell wall may hold promise in the field of antibiotic potentiation to combat antibiotic-resistant bacterial pathogens.


2021 ◽  
Vol 22 (S6) ◽  
Author(s):  
Weixia Xu ◽  
Yangyun Gao ◽  
Yang Wang ◽  
Jihong Guan

Abstract Background Protein–protein interactions (PPIs) are essential to most biological processes. The prediction of PPIs is beneficial to the understanding of protein functions and is thus helpful for pathological analysis, disease diagnosis, drug design, and so on. As the amount of protein data is growing fast in the post-genomic era, high-throughput experimental methods are too expensive and time-consuming for the prediction of PPIs. Thus, computational methods have attracted researchers' attention in recent years. A large number of computational methods have been proposed based on different protein sequence encoders. Results Notably, the confidence score of a protein sequence pair can be regarded as a measure of PPIs: the higher the confidence score for a protein pair, the more likely the pair is to interact. Thus, in this paper, a deep learning framework called the ordinal regression and recurrent convolutional neural network (OR-RCNN) method is introduced to predict PPIs from the perspective of the confidence score. It mainly contains two parts: an encoder for the protein sequence pair and a predictor of PPIs by confidence score. In the first part, two recurrent convolutional neural networks (RCNNs) with shared parameters are applied to construct two protein sequence embedding vectors, which can automatically extract robust local features and sequential information from the protein pairs. The two embedding vectors are then encoded into one novel embedding vector by element-wise multiplication. In the second part, to take the ordinal information behind the confidence score into consideration, ordinal regression is used to construct multiple sub-classifiers. The results of the sub-classifiers are aggregated to obtain the final confidence score. The existence of a PPI is then determined by the confidence score: we set a threshold θ and say that an interaction exists between the protein pair if its confidence score is greater than θ. Conclusions We applied our method to predict PPIs on S. cerevisiae and Homo sapiens data sets. In experimental evaluation, our method outperforms state-of-the-art PPI prediction models.
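The prediction stage described above (element-wise combination of the two embeddings, aggregation of ordinal sub-classifier outputs, and thresholding at θ) can be sketched as follows. The abstract does not state how the sub-classifier results are aggregated, so the vote-counting rule here is an assumption borrowed from standard ordinal regression practice. The sub-classifier probabilities are taken as given inputs, and all function names are hypothetical.

```python
from typing import List, Sequence


def combine_embeddings(emb_a: Sequence[float], emb_b: Sequence[float]) -> List[float]:
    """Merge two protein embeddings by element-wise multiplication.

    The product is symmetric, so (A, B) and (B, A) give the same pair vector.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    return [x * y for x, y in zip(emb_a, emb_b)]


def aggregate_confidence(sub_probs: Sequence[float]) -> float:
    """Turn K ordinal sub-classifier outputs into a confidence in [0, 1].

    Assumption: sub-classifier k estimates P(confidence exceeds the k-th cut
    point), and the final score is the fraction of sub-classifiers voting yes.
    """
    if not sub_probs:
        raise ValueError("need at least one sub-classifier output")
    votes = sum(1 for p in sub_probs if p > 0.5)
    return votes / len(sub_probs)


def predicts_interaction(sub_probs: Sequence[float], theta: float) -> bool:
    """Declare an interaction when the aggregated confidence exceeds theta."""
    return aggregate_confidence(sub_probs) > theta
```

For example, four sub-classifier outputs of 0.9, 0.8, 0.6, and 0.2 give three positive votes and an aggregated confidence of 0.75, which counts as an interaction for any θ below 0.75.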


2021 ◽  
Author(s):  
Liqian Zhou ◽  
Zhao Wang ◽  
Xiongfei Tian ◽  
Lihong Peng

Abstract Background: Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovery of lncRNA-protein interactions (LPIs) helps to understand the biological functions and mechanisms of lncRNAs. Although wet-lab experiments have found a few interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Therefore, computational methods are increasingly exploited to uncover possible associations. However, existing computational methods have several limitations. First, the majority of them were evaluated on a single dataset, which may result in prediction bias. Second, few of them have been applied to identify relevant data for new lncRNAs (or proteins). Finally, they fail to utilize diverse biological information about lncRNAs and proteins. Results: Using a feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT), this work focuses on classifying unobserved LPIs. First, three human LPI datasets and two plant LPI datasets are assembled. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Third, the features are dimensionally reduced and concatenated into a vector that represents an lncRNA-protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying linkages between lncRNAs and proteins. LPI-deepGBDT is compared with four classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, and LPI-SKF) under three cross validations, on lncRNAs, proteins, and lncRNA-protein pairs, respectively. It obtains the best average AUC and AUPR values on the five datasets under the three cross validations, significantly outperforming the other four LPI identification methods. Specifically, the AUCs computed by LPI-deepGBDT are 0.8321, 0.6815, and 0.9073, and the AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the powerful classification ability of LPI-deepGBDT. Case study analyses show that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637. Conclusions: By integrating ensemble learning and hierarchical distributed representations and building a multi-layered deep architecture, this work improves LPI prediction performance and effectively probes interaction data for new lncRNAs/proteins.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jessie James Limlingan Malit ◽  
Chuanhai Wu ◽  
Ling-Li Liu ◽  
Pei-Yuan Qian

Thioamidated ribosomally synthesized and post-translationally modified peptides (RiPPs) are recently characterized natural products with a wide range of potent bioactivities, such as antibiotic, antiproliferative, and cytotoxic activities. These peptides are distinguished by the presence of thioamide bonds in the peptide backbone, whose formation is catalyzed by the YcaO-TfuA protein pair; the genes for the two proteins are adjacent to each other. Genome mining has facilitated an in silico approach to identify biosynthesis gene clusters (BGCs) responsible for thioamidated RiPP production. In this work, publicly available genomic data were used to detect and illustrate the diversity of putative BGCs encoding thioamidated RiPPs. AntiSMASH and RiPPER analysis identified 613 unique TfuA-related gene cluster families (GCFs) and 797 precursor peptide families, even in phyla where the presence of these clusters has not been previously described. Several additional biosynthesis genes are colocalized with the detected BGCs, suggesting an array of possible chemical modifications. This study shows that thioamidated RiPPs occupy a largely unexplored chemical landscape.


2020 ◽  
Author(s):  
Dhananjay Kimothi ◽  
Pravesh Biyani ◽  
James M. Hogan ◽  
Melissa J. Davis

Abstract Background: Protein-protein interactions (PPIs) are a crucial mechanism underpinning the function of the cell. Predicting the likely relationship between a pair of proteins is thus an important problem in bioinformatics, and a wide range of machine-learning-based methods have been proposed for this task. Their success is heavily dependent on the construction of the feature vectors, with most using a set of physicochemical properties derived from the sequence. Few work directly with the sequence itself. Recent work on embedding sequences in a low-dimensional vector space has shown the utility of this approach for tasks such as protein classification and sequence search. In this paper, we extend these ideas to the PPI prediction task, making inferences from the pair instead of the individual sequences. Methods: We propose a generic PPI prediction framework that consists of a representation learning module for feature construction and a binary classifier. To construct the feature vector for a protein pair, we concatenate the distributed representations (embeddings) learned for the sequences of the constituent proteins. Each protein pair is represented as a 200-dimensional feature vector. To learn the embedding of a sequence, we use two established methods, Seq2Vec and BioVec, and we also introduce a novel feature construction method called SuperVecNW. The embeddings generated through SuperVecNW capture network information to some extent, along with the contextual information present in the sequences. Finally, we feed these feature vectors into a random forest classifier to predict protein pair interactions. Results: To show the efficacy of our proposed approach, we evaluate its performance on human and yeast PPI datasets, benchmarking against established methods. Furthermore, we test our approach on three well-known networks: the one-core network (CD9), the multiple-core network (Ras-Raf-Mek-Erk-Elk-Srf pathway), and the cross-connection network (Wnt-related network), and demonstrate an improvement in predicting PPIs compared to the other methods. Conclusions: Naive low-dimensional sequence embeddings provide better results on the protein-protein interaction prediction task than most of the alternative representations based on physicochemical properties. These methods require only modest computational effort due to their lower dimensionality. Advanced representation learning methods that enrich the sequence embeddings with meta-information are expected to improve the results further.
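The pair representation in the Methods can be illustrated with a short sketch. It assumes that the 200-dimensional pair vector comes from two 100-dimensional sequence embeddings, which the abstract implies but does not state. The constant and function names are hypothetical, and the embeddings themselves (from Seq2Vec, BioVec, or SuperVecNW) are treated as precomputed inputs.

```python
from typing import List, Sequence

# Assumed per-protein embedding size: two of these give the 200-dim pair vector.
EMBED_DIM = 100


def pair_features(emb_a: Sequence[float], emb_b: Sequence[float]) -> List[float]:
    """Concatenate two protein embeddings into one pair feature vector."""
    if len(emb_a) != EMBED_DIM or len(emb_b) != EMBED_DIM:
        raise ValueError(f"each embedding must have {EMBED_DIM} dimensions")
    return list(emb_a) + list(emb_b)
```

Concatenation is order-dependent: the vector for (A, B) differs from the vector for (B, A). A classifier trained on such features may therefore need both orderings in its training data, or a canonical ordering of each pair, to treat an interaction symmetrically. Whether the authors did either is not stated in the abstract.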


Genes ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 852
Author(s):  
Hongli Chen ◽  
Mengwen Zhang ◽  
Mark Hochstrasser

Many species of arthropods carry maternally inherited bacterial endosymbionts that can influence host sexual reproduction to benefit the bacterium. The most well-known of such reproductive parasites is Wolbachia pipientis. Wolbachia are obligate intracellular α-proteobacteria found in nearly half of all arthropod species. This success has been attributed in part to their ability to manipulate host reproduction to favor infected females. Cytoplasmic incompatibility (CI), a phenomenon wherein Wolbachia infection renders males sterile when they mate with uninfected females, but not infected females (the rescue mating), appears to be the most common. CI provides a reproductive advantage to infected females in the presence of a threshold level of infected males. The molecular mechanisms of CI and other reproductive manipulations, such as male killing, parthenogenesis, and feminization, have remained mysterious for many decades. It had been proposed by Werren more than two decades ago that CI is caused by a Wolbachia-mediated sperm modification and that rescue is achieved by a Wolbachia-encoded rescue factor in the infected egg. In the past few years, new research has highlighted a set of syntenic Wolbachia gene pairs encoding CI-inducing factors (Cifs) as the key players for the induction of CI and its rescue. Within each Cif pair, the protein encoded by the upstream gene is denoted A and the downstream gene B. To date, two types of Cifs have been characterized based on the enzymatic activity identified in the B protein of each protein pair; one type encodes a deubiquitylase (thus named CI-inducing deubiquitylase or cid), and a second type encodes a nuclease (named CI-inducing nuclease or cin). The CidA and CinA proteins bind tightly and specifically to their respective CidB and CinB partners. 
In transgenic Drosophila melanogaster, the expression of either the Cid or Cin protein pair in the male germline induces CI and the expression of the cognate A protein in females is sufficient for rescue. With the identity of the Wolbachia CI induction and rescue factors now known, research in the field has turned to directed studies on the molecular mechanisms of CI, which we review here.


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Guangyu Zhou ◽  
Muhao Chen ◽  
Chelsea J T Ju ◽  
Zheng Wang ◽  
Jyun-Yu Jiang ◽  
...  

Abstract The functional impact of protein mutations is reflected in altered conformation and thermodynamics of protein–protein interactions (PPIs). Quantifying the changes in two interacting proteins upon mutation is commonly carried out by computational approaches. Hence, extensive research effort has been devoted to extracting energetic or structural features of proteins, followed by statistical learning methods to estimate the effects of mutations on PPI properties. Nonetheless, such features require extensive human labor and expert knowledge to obtain, and have limited ability to reflect point mutations. We present an end-to-end deep learning framework, MuPIPR (Mutation Effects in Protein–protein Interaction PRediction Using Contextualized Representations), to estimate the effects of mutations on PPIs. MuPIPR incorporates a contextualized representation mechanism of amino acids to propagate the effects of a point mutation to surrounding amino acid representations, thereby amplifying the subtle change in a long protein sequence. On top of that, MuPIPR leverages a Siamese residual recurrent convolutional neural encoder to encode a wild-type protein pair and its mutation pair. Multi-layer perceptron regressors are applied to the protein pair representations to predict the quantifiable changes of PPI properties upon mutations. Experimental evaluations show that, with only sequence information, MuPIPR outperforms various state-of-the-art systems on estimating the changes of binding affinity for SKEMPI v1, and offers comparable performance on SKEMPI v2. MuPIPR also demonstrates state-of-the-art performance on estimating the changes of buried surface areas. The software implementation is available at https://github.com/guangyu-zhou/MuPIPR.

