LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Liqian Zhou ◽  
Qi Duan ◽  
Xiongfei Tian ◽  
He Xu ◽  
Jianxin Tang ◽  
...  

Background: Long noncoding RNAs (lncRNAs) are closely linked to many important cellular activities. lncRNAs exert their functions by binding to corresponding RNA-binding proteins. Because experimental techniques for detecting lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods for LPI prediction have been reported. However, computation-based LPI identification methods have the following limitations: (1) Most methods were evaluated on a single dataset, so their generalization ability may not have been measured. (2) Most methods were validated only under cross validation on lncRNA-protein pairs and did not investigate performance under other cross validations, especially cross validation on independent lncRNAs and independent proteins. (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation. Results: Using a hybrid framework (LPI-HyADBS) that integrates AdaBoost-based feature selection with classification models including a deep neural network (DNN), extreme gradient boosting (XGBoost), and a support vector machine with a penalty coefficient for misclassification (C-SVM), this work focuses on finding new LPIs. First, five datasets are assembled; each contains lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are extracted with Pyfeat. Third, the lncRNA and protein features are selected using AdaBoost and concatenated to represent each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework is developed to integrate the classification results of the three classifiers.
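The abstract does not say how AdaBoost ranks features or how the three classifiers' outputs are combined. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: decision stumps serve as AdaBoost weak learners, a feature's importance is the summed AdaBoost weight (alpha) of the stumps that split on it, the top-k features are kept, and the hybrid step is assumed to be a weighted average of the three classifiers' interaction probabilities. All function names are hypothetical, and the DNN, XGBoost, and C-SVM models themselves are not shown.

```python
import math


def best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best decision stump."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values[:1]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] > t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def adaboost_importance(X, y, rounds=20):
    """Hypothetical importance: normalized sum of AdaBoost alphas per feature.

    Labels y must be +1 (interacting pair) or -1 (non-interacting pair).
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    importance = [0.0] * d
    for _ in range(rounds):
        err, j, t, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        if err >= 0.5:  # weak learner no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        importance[j] += alpha
        preds = [polarity if row[j] > t else -polarity for row in X]
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total_w = sum(w)
        w = [wi / total_w for wi in w]
    total = sum(importance) or 1.0
    return [v / total for v in importance]


def select_top_k(importance, k):
    """Indices of the k most important features, returned in their original order."""
    ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return sorted(ranked[:k])


def pair_vector(lnc_feats, prot_feats, lnc_idx, prot_idx):
    """Concatenate the selected lncRNA and protein features into one LPI sample."""
    return [lnc_feats[j] for j in lnc_idx] + [prot_feats[j] for j in prot_idx]


def hybrid_score(probs, weights=None):
    """Assumed integration rule: weighted average of per-classifier probabilities."""
    weights = weights or [1.0] * len(probs)
    return sum(p * wt for p, wt in zip(probs, weights)) / sum(weights)
```

For example, on a toy matrix where only the first column separates interacting (+1) from non-interacting (-1) pairs, `select_top_k(adaboost_importance(X, y), 1)` returns `[0]`. In this sketch, selection would be run separately on the lncRNA and protein feature blocks of labelled pair samples before `pair_vector` joins them.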
LPI-HyADBS is compared with six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validation on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show that LPI-HyADBS achieves the best LPI prediction performance under all four cross validations. In particular, LPI-HyADBS shows better classification ability than the other six approaches on the constructed independent dataset. Case analyses suggest a possible association between ZNF667-AS1 and Q15717. Conclusions: By integrating AdaBoost-based feature selection with three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new links between lncRNAs and proteins.
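The four cross-validation settings named in this abstract differ only in which entity is held out. The abstract does not describe how the folds were built, so the following is one plausible sketch with hypothetical function names: lncRNAs, proteins, and pairs are each randomly assigned to k folds, and in the "independent" setting a test pair must have both its lncRNA and its protein held out, while pairs that mix a held-out and a training entity are discarded.

```python
import random


def assign_folds(items, k, seed):
    """Randomly assign each distinct item to one of k folds."""
    items = sorted(set(items))
    random.Random(seed).shuffle(items)
    return {item: i % k for i, item in enumerate(items)}


def cv_splits(pairs, setting, k=5, seed=0):
    """Yield (train, test) lists of (lncRNA, protein) pairs for each of k folds.

    setting: "pairs"       - random lncRNA-protein pairs are held out
             "lncRNA"      - all pairs of the held-out lncRNAs are tested
             "protein"     - all pairs of the held-out proteins are tested
             "independent" - test lncRNAs and test proteins are both unseen in training
    """
    lnc_fold = assign_folds([l for l, _ in pairs], k, seed)
    prot_fold = assign_folds([p for _, p in pairs], k, seed + 1)
    pair_fold = assign_folds(pairs, k, seed + 2)
    for f in range(k):
        train, test = [], []
        for pair in pairs:
            lnc_out = lnc_fold[pair[0]] == f
            prot_out = prot_fold[pair[1]] == f
            if setting == "pairs":
                is_test, is_train = pair_fold[pair] == f, pair_fold[pair] != f
            elif setting == "lncRNA":
                is_test, is_train = lnc_out, not lnc_out
            elif setting == "protein":
                is_test, is_train = prot_out, not prot_out
            elif setting == "independent":
                # Pairs mixing a held-out and a training entity go to neither set.
                is_test, is_train = lnc_out and prot_out, not lnc_out and not prot_out
            else:
                raise ValueError(f"unknown setting: {setting}")
            if is_test:
                test.append(pair)
            elif is_train:
                train.append(pair)
        yield train, test
```

Only the "independent" setting discards pairs, so its folds are smaller: with 10 lncRNAs, 10 proteins, and k=5, each test fold holds 2 × 2 = 4 pairs.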

2019 ◽  
Vol 14 (7) ◽  
pp. 621-627 ◽  
Author(s):  
Youhuang Bai ◽  
Xiaozhuan Dai ◽  
Tiantian Ye ◽  
Peijing Zhang ◽  
Xu Yan ◽  
...  

Background: Long noncoding RNAs (lncRNAs) are endogenous noncoding RNAs, arbitrarily defined as longer than 200 nucleotides, that play critical roles in diverse biological processes. LncRNAs are found in genomes ranging from animals to plants. Objective: PlncRNADB is a searchable database of lncRNA sequences and annotation in plants. Methods: We built a pipeline for lncRNA prediction in plants, providing a convenient utility for users to quickly distinguish potential noncoding RNAs from protein-coding transcripts. Results: More than five thousand lncRNAs from four plant species (Arabidopsis thaliana, Arabidopsis lyrata, Populus trichocarpa and Zea mays) have been collected in PlncRNADB. Moreover, our database provides the relationships between lncRNAs and various RNA-binding proteins (RBPs), which can be displayed through a user-friendly web interface. Conclusion: PlncRNADB can serve as a reference database for investigating lncRNAs and their interactions with RNA-binding proteins in plants. PlncRNADB is freely available at http://bis.zju.edu.cn/PlncRNADB/.
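The abstract does not describe how the PlncRNADB pipeline separates noncoding from protein-coding transcripts. As an illustration only, the sketch below combines the 200-nucleotide length cutoff stated above with a longest-open-reading-frame (ORF) check, a widely used heuristic; the 300-nt (100-codon) ORF cutoff, the scan of both strands, and the function names are assumptions, not the database's documented method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Length (nt, including the stop codon) of the longest ATG...stop ORF on either strand."""
    seq = seq.upper().replace("U", "T")
    rev_comp = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    best = 0
    for strand in (seq, rev_comp):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def is_lncrna_candidate(seq, min_len=200, max_orf_nt=300):
    """Hypothetical filter: longer than min_len nt and no ORF of max_orf_nt nt or more."""
    return len(seq) > min_len and longest_orf(seq) < max_orf_nt
```

A transcript such as `"ATG" + "GCT" * 120 + "TAA"` has a 366-nt ORF and is rejected, while a 250-nt transcript without any start codon passes. Real pipelines typically add coding-potential scores and homology searches on top of such a filter.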


2021 ◽  
Vol 4 (1) ◽  
pp. 22
Author(s):  
Mrinmoyee Majumder ◽  
Viswanathan Palanisamy

Control of gene expression is critical in shaping the genotype and phenotype of prokaryotic and eukaryotic organisms. Gene expression regulatory pathways rely solely on protein–protein and protein–nucleic acid interactions, which determine the fate of the nucleic acids. RNA–protein interactions play a significant role in co- and post-transcriptional regulation of gene expression. RNA-binding proteins (RBPs) are a diverse group of macromolecules that bind to RNA and play an essential role in RNA biology by regulating pre-mRNA processing, maturation, nuclear transport, stability, and translation. Hence, studies investigating RNA–protein interactions are essential to advance our knowledge of gene expression patterns associated with health and disease. Here we discuss the long-established and current technologies that are widely used to study RNA–protein interactions in vivo, and we present the advantages and disadvantages of each method discussed in the review.


2021 ◽  
Vol 7 (1) ◽  
pp. 11 ◽  
Author(s):  
André P. Gerber

RNA–protein interactions frame post-transcriptional regulatory networks and modulate transcription and epigenetics. While technological advances in RNA sequencing have significantly expanded the repertoire of RNAs, recently developed biochemical approaches combined with sensitive mass spectrometry have revealed hundreds of previously unrecognized and potentially novel RNA-binding proteins. Nevertheless, a major challenge remains: understanding how the thousands of RNA molecules and their interacting proteins assemble and control the fate of each individual RNA in a cell. Here, I review recent methodological advances that approach this problem through systematic identification of proteins that interact with particular RNAs in living cells. A specific focus is given to in vivo approaches that involve crosslinking of RNA–protein interactions through ultraviolet irradiation or treatment of cells with chemicals, followed by capture of the RNA under study with antisense oligonucleotides and identification of bound proteins with mass spectrometry. Several recent studies defining interactomes of long non-coding RNAs, viral RNAs, and mRNAs are highlighted, and brief reference is made to recent in-cell protein labeling techniques. These experimental improvements could open the door to broader applications, including studies of how RNA–protein complexes are remodeled in response to different environmental cues and in disease.


RNA ◽  
2021 ◽  
pp. rna.078896.121
Author(s):  
Yan Han ◽  
Xuzhen Guo ◽  
Tiancai Zhang ◽  
Jiangyun Wang ◽  
Keqiong Ye

Characterization of RNA-protein interactions is fundamental for understanding the metabolism and function of RNA. UV crosslinking has been widely used to map the targets of RNA-binding proteins, but it is limited by low efficiency, a requirement for zero-distance contact, and biases toward single-stranded RNA structure and certain RNA and protein residues. Here, we report the development of an RNA-protein crosslinker (AMT-NHS) composed of a psoralen derivative and an N-hydroxysuccinimide ester group, which react with RNA bases and primary amines of proteins, respectively. We show that AMT-NHS can penetrate living yeast cells and crosslink Cbf5 to H/ACA snoRNAs with high specificity. The crosslinker induced crosslinking patterns different from those induced by UV and targeted both single- and double-stranded regions of RNA. The crosslinker provides a new tool to capture diverse RNA-protein interactions in cells.


2020 ◽  
Author(s):  
Santana Royan ◽  
Bernard Gutmann ◽  
Catherine Colas des Francs-Small ◽  
Suvi Honkanen ◽  
Jason Schmidberger ◽  
...  

Targeted cytidine-to-uridine RNA editing is a widespread phenomenon throughout the land plant lineage. Members of the pentatricopeptide repeat (PPR) protein family act as the specificity factors in this process. These proteins consist of helix-turn-helix domains, each of which recognises a single RNA nucleotide following a well-elucidated code. A cytidine deaminase-like domain (present at the C-terminus of some PPR editing factors or provided in trans via protein-protein interactions) is the catalytic domain in this process. The huge expansion of the PPR superfamily in land plants provides the sequence variation required to design novel consensus-based RNA-binding proteins. We used this approach to construct a synthetic RNA editing factor designed to target one of the two sites in the Arabidopsis chloroplast transcriptome naturally recognised by the RNA editing factor CHLOROPLAST BIOGENESIS 19 (CLB19). We show that this designed editing factor specifically recognises the target sequence in in vitro binding assays and can partially complement a clb19 mutant. The designed factor is specific for the target rpoA site and does not recognise or edit the other CLB19 site, in the clpP1 transcript. We also show that the designed editing factor functions equally specifically in the bacterium E. coli, and shows some activity even in the absence of the editing cofactors that natural editing factors often require in plants. This study serves as a successful pilot for the design and application of programmable RNA editing factors based on plant PPR proteins.
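The "well-elucidated code" mentioned above maps two amino acids in each PPR repeat to the nucleotide it binds, and PPR proteins bind RNA in a parallel orientation (N-terminal repeat to the 5′ base). The sketch below decodes a repeat series into a predicted target using a simplified, commonly cited subset of that code; the abstract does not list the code, residue numbering differs between studies (e.g. positions 5 and 35, or 6 and 1′), and real specificities are degenerate, so treat the table and the hypothetical `predict_target` helper as illustrative only.

```python
# Simplified subset of the PPR code: (residue 5, last residue) of a repeat -> RNA base.
PPR_CODE = {
    ("T", "N"): "A",
    ("S", "N"): "A",
    ("T", "D"): "G",
    ("S", "D"): "G",
    ("N", "D"): "U",
    ("N", "N"): "C",  # often also tolerates U
    ("N", "S"): "C",
}


def predict_target(repeats):
    """Predict the RNA sequence (5'->3') bound by consecutive PPR repeats.

    repeats: list of (residue_5, last_residue) pairs, N-terminal repeat first.
    Residue combinations outside the table are reported as "?".
    """
    return "".join(PPR_CODE.get(pair, "?") for pair in repeats)
```

Designing a consensus-based factor such as the one described above runs this logic in reverse: choose, for each target nucleotide, a repeat carrying a matching residue pair.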


Open Biology ◽  
2019 ◽  
Vol 9 (6) ◽  
pp. 190096 ◽  
Author(s):  
Anna Balcerak ◽  
Alicja Trebinska-Stryjewska ◽  
Ryszard Konopinski ◽  
Maciej Wakula ◽  
Ewa Anna Grzybowska

RNA–protein interactions are crucial for most biological processes in all organisms. However, the complexity of RNA-based regulation appears to increase with the complexity of the organism, creating additional regulatory circuits whose scope is only now being revealed. It is becoming apparent that previously unappreciated features, such as disordered structural regions in proteins or non-coding regions in DNA, which lead to higher plasticity and pliability in RNA–protein complexes, are in fact essential for complex, precise and fine-tuned regulation. This review addresses the role of RNA–protein interactions in generating eukaryotic complexity, focusing on newly characterized disordered RNA-binding motifs, the moonlighting of metabolic enzymes, the interactions of RNA-binding proteins with different RNA species, and their participation in higher-order regulatory networks.


2012 ◽  
Vol 3 (5) ◽  
pp. 403-414 ◽  
Author(s):  
Jochen Imig ◽  
Alexander Kanitz ◽  
André P. Gerber

The development of genome-wide analysis tools has prompted global investigation of the gene expression program, revealing highly coordinated control mechanisms that ensure proper spatiotemporal activity of a cell’s macromolecular components. With respect to the regulation of RNA transcripts, the concept of RNA regulons, which, by analogy with DNA regulons in bacteria, refers to the coordinated control of functionally related RNA molecules, has emerged as a unifying theory that describes the logic of regulatory RNA-protein interactions in eukaryotes. Hundreds of RNA-binding proteins and small non-coding RNAs, such as microRNAs, bind to distinct elements in target RNAs, thereby exerting specific and concerted control over posttranscriptional events. In this review, we discuss recent reports that systematically explore the RNA-protein interaction network and outline some of the principles and recurring features of RNA regulons: the coordination of functionally related mRNAs through RNA-binding proteins or non-coding RNAs, the modular structure of their components, and the dynamic rewiring of RNA-protein interactions upon exposure to internal or external stimuli. We also summarize evidence for robust combinatorial control of mRNAs, which could determine the ultimate fate of each mRNA molecule in a cell. Finally, the compilation and integration of global protein-RNA interaction data has yielded first insights into network structures and led to the hypothesis that RNA regulons may, in part, constitute noise ‘buffers’ that handle stochasticity in cellular transcription.


2016 ◽  
Vol 14 (03) ◽  
pp. 1650011 ◽  
Author(s):  
Wajid Arshad Abbasi ◽  
Fayyaz Ul Amir Afsar Minhas

The study of interactions between host and pathogen proteins is important for understanding the mechanisms underlying infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein–protein interactions (PPIs) can benefit from computational predictions, and machine learning is one computational approach that can assist biologists by predicting promising PPIs. A number of machine-learning-based methods for predicting host–pathogen interactions (HPIs) have been proposed in the literature, and the techniques used to assess the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating how well HPI predictors generalize to proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance estimates and those of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO better models the real-world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics, such as the areas under the precision-recall or receiver operating characteristic curves, are not intuitive to biologists, and we propose simpler, more directly interpretable metrics for this purpose.
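The contrast between K-fold and LOPO cross-validation comes down to whether a test pathogen protein already has known partners in the training data. The sketch below, with hypothetical function names and toy identifiers, builds LOPO splits (all pairs of one pathogen protein form the test set) and measures the fraction of test pairs whose pathogen protein also appears in training; the paper's own metrics are not reproduced here.

```python
from collections import defaultdict


def lopo_splits(pairs):
    """Leave one pathogen protein out: yield (held_out, train, test) per pathogen protein.

    pairs: list of (host_protein, pathogen_protein) tuples.
    """
    by_pathogen = defaultdict(list)
    for pair in pairs:
        by_pathogen[pair[1]].append(pair)
    for pathogen in sorted(by_pathogen):
        test = by_pathogen[pathogen]
        train = [pair for pair in pairs if pair[1] != pathogen]
        yield pathogen, train, test


def seen_partner_fraction(train, test):
    """Fraction of test pairs whose pathogen protein already occurs in training."""
    seen = {p for _, p in train}
    return sum(p in seen for _, p in test) / len(test)
```

Under LOPO this fraction is 0 by construction, whereas a random split of pairs, as in K-fold cross-validation, usually leaves most test pathogen proteins with training partners, which is the scenario mismatch the abstract describes.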


2013 ◽  
Vol 18 (9) ◽  
pp. 967-983 ◽  
Author(s):  
Maurizio Romano ◽  
Emanuele Buratti

Dysfunctions at the level of RNA processing have recently been shown to play a fundamental role in the pathogenesis of many neurodegenerative diseases. Several proteins responsible for these dysfunctions (TDP-43, FUS/TLS, and hnRNP A/Bs) belong to the nuclear class of heterogeneous ribonucleoproteins (hnRNPs), which predominantly function as general regulators of both coding and noncoding RNA metabolism. The discovery of the importance of these factors in mediating neuronal death has represented a major paradigm shift in our understanding of neurodegenerative processes. As a result, these discoveries have also opened the way toward novel biomolecular screening approaches in the search for therapeutic options. One of the major hurdles in this search is correctly identifying the most promising targets to prioritize. These may include aberrant aggregation processes, protein-protein interactions, RNA-protein interactions, or specific cellular pathways altered by disease. In this review, we discuss these four major options together with their various advantages and drawbacks.


2016 ◽  
Author(s):  
Xiaotong Yao ◽  
Shuvadeep Maity ◽  
Shashank Gandhi ◽  
Marcin Imielenski ◽  
Christine Vogel

Post-translational modifications by the Small Ubiquitin-like Modifier (SUMO) are essential for diverse cellular functions. Large-scale experiments and sequence-based predictions have identified thousands of SUMOylated proteins. However, the overlap between the datasets is small, suggesting many false positives with low functional relevance. We therefore integrated ~800 sequence features and protein characteristics, such as cellular function and protein-protein interactions, in a machine learning approach to score likely functional SUMOylation events (iSUMO). iSUMO is trained on a total of 24 large-scale datasets and predicts 2,291 and 706 SUMO targets in human and yeast, respectively. These numbers are five times higher than those predicted by existing sequence-based tools at the same 5% false positive rate. Protein-protein and protein-nucleic acid interactions are highly predictive of protein SUMOylation, supporting a role of the modification in protein complex formation. We note the marked prevalence of SUMOylation among RNA-binding proteins. We validate iSUMO predictions with experimental or other evidence. iSUMO therefore represents a comprehensive tool to identify high-confidence, functional SUMOylation events in human and yeast.
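Comparing predictors "at the same 5% false positive rate" means fixing, for each tool, the score cutoff at which only 5% of known negatives are called positive, then counting predicted targets above that cutoff. The abstract does not describe iSUMO's calibration procedure; the sketch below is a generic version of that step with hypothetical function names and toy scores.

```python
def threshold_at_fpr(neg_scores, fpr):
    """Score cutoff such that at most `fpr` of known negatives score above it.

    Predictions are made as `score > threshold`.
    """
    ranked = sorted(neg_scores, reverse=True)
    allowed = int(fpr * len(ranked) + 1e-9)  # number of negatives allowed above the cutoff
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


def predicted_positives(scores, threshold):
    """Candidates called positive at the given cutoff."""
    return [s for s in scores if s > threshold]
```

With 100 negatives scored 0 to 99 and `fpr=0.05`, the cutoff is 94 and exactly five negatives (95 to 99) exceed it; two tools calibrated this way can then be compared by how many candidates each calls positive.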

