A Max-Margin Model for Predicting Residue–Base Contacts in Protein–RNA Interactions

Life ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1135
Author(s):  
Shunya Kashiwagi ◽  
Kengo Sato ◽  
Yasubumi Sakakibara

Protein–RNA interactions (PRIs) are essential for many biological processes, so understanding the sequences and structures involved in PRIs is important for unraveling such processes. Because the experimental determination of complex protein–RNA structures requires expensive and time-consuming techniques, various computational methods have been developed to predict PRIs. However, most of these methods focus on predicting only RNA-binding regions in proteins or only protein-binding motifs in RNA. Methods for predicting entire residue–base contacts in PRIs have not yet achieved sufficient accuracy. Furthermore, some of these methods require 3D structures or homologous sequences, which are not available for all protein and RNA sequences. Here, we propose a method for predicting residue–base contacts between proteins and RNAs using only sequence information and structural information predicted from sequences. The method can be applied to any protein–RNA pair, even when rich information such as a 3D structure is not available. In this method, residue–base contact prediction is formalized as an integer programming problem. We predict a residue–base contact map that maximizes a scoring function based on sequence-based features such as k-mers of sequences and the predicted secondary structure. The scoring function is trained using a max-margin framework from known PRIs with 3D structures. To verify our method, we conducted several computational experiments. The results suggest that our method, which is based on only sequence information, is comparable with RNA-binding residue prediction methods based on known binding data.
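The scoring idea in this abstract can be illustrated with a minimal Python sketch. It assumes one hypothetical feature per residue–base pair (the pair of k-mers centered on the residue and on the base) and a hypothetical weight table; the abstract does not specify the actual features, weights, or constraints. The decoding step here is the unconstrained special case, in which the score-maximizing contact map simply keeps every pair with a positive score. It is not the integer program used by the authors.

```python
def kmer_at(seq, center, k):
    """Return the k-mer centered at index `center` (odd k), padded with '-' at the ends."""
    half = k // 2
    padded = "-" * half + seq + "-" * half
    return padded[center:center + k]


def pair_features(protein, rna, i, j, k=3):
    """Hypothetical feature set for residue i and base j: one joint k-mer feature."""
    return [("pair", kmer_at(protein, i, k), kmer_at(rna, j, k))]


def pair_score(protein, rna, i, j, weights, k=3):
    """Linear score of one residue-base pair: sum of the weights of its features."""
    return sum(weights.get(f, 0.0) for f in pair_features(protein, rna, i, j, k))


def score_contact_map(protein, rna, contacts, weights, k=3):
    """Score of a whole contact map: sum of the scores of its contacts."""
    return sum(pair_score(protein, rna, i, j, weights, k) for i, j in contacts)


def predict_contacts(protein, rna, weights, k=3):
    """Unconstrained decoding: keep every pair whose score is strictly positive."""
    return [
        (i, j)
        for i in range(len(protein))
        for j in range(len(rna))
        if pair_score(protein, rna, i, j, weights, k) > 0.0
    ]


# Toy weights (assumed values, for illustration only).
toy_weights = {
    ("pair", "MKR", "GAU"): 1.5,
    ("pair", "-MK", "-GA"): -0.5,
}
predicted = predict_contacts("MKR", "GAU", toy_weights)
```

In a max-margin setting, the weight table would be learned so that known contact maps score higher than incorrect ones by a margin; the toy weights above are fixed by hand only to keep the sketch short.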

2015 ◽  
Author(s):  
Kengo Sato ◽  
Shunya Kashiwagi ◽  
Yasubumi Sakakibara

Motivation: Protein-RNA interactions (PRIs) are essential for many biological processes, so understanding aspects of the sequence and structure in PRIs is important for understanding those processes. Due to the expensive and time-consuming processes required for experimental determination of complex protein-RNA structures, various computational methods have been developed to predict PRIs. However, most of these methods focus on predicting only RNA-binding regions in proteins or only protein-binding motifs in RNA. Methods for predicting entire residue-base contacts in PRIs have not yet achieved sufficient accuracy. Furthermore, some of these methods require 3D structures or homologous sequences, which are not available for all protein and RNA sequences. Results: We propose a prediction method for residue-base contacts between proteins and RNAs using only sequence information and structural information predicted from sequences alone. The method can be applied to any protein-RNA pair, even when rich information such as 3D structure is not available. Residue-base contact prediction is formalized as an integer programming problem. We predict a residue-base contact map that maximizes a scoring function based on sequence-based features such as k-mers of sequences and predicted secondary structure. The scoring function is trained by a max-margin framework from known PRIs with 3D structures. To verify our method, we conducted several computational experiments. The results suggest that our method, which is based on only sequence information, is comparable with RNA-binding residue prediction methods based on known binding data.


2010 ◽  
Vol 08 (01) ◽  
pp. 39-57 ◽  
Author(s):  
REZWAN AHMED ◽  
HUZEFA RANGWALA ◽  
GEORGE KARYPIS

Alpha-helical transmembrane proteins mediate many key biological processes and represent 20%–30% of all genes in many organisms. Due to the difficulties in experimentally determining their high-resolution 3D structure, computational methods to predict the location and orientation of transmembrane helix segments using sequence information are essential. We present TOPTMH, a new transmembrane helix topology prediction method that combines support vector machines, hidden Markov models, and a widely used rule-based scheme. The contribution of this work is a prediction approach that first uses a binary SVM classifier to predict the helix residues and then employs a pair of HMMs that incorporate the SVM predictions and hydropathy-based features to identify the entire transmembrane helix segments by capturing the structural characteristics of these proteins. TOPTMH outperforms state-of-the-art prediction methods and achieves the best performance on an independent static benchmark.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xiongfei Tian ◽  
Ling Shen ◽  
Zhenwu Wang ◽  
Liqian Zhou ◽  
Lihong Peng

Long noncoding RNAs (lncRNAs) regulate many biological processes by interacting with corresponding RNA-binding proteins. The identification of lncRNA–protein interactions (LPIs) is important for characterizing the biological functions and mechanisms of lncRNAs. Existing computational methods have been effectively applied to LPI prediction. However, the majority of them were evaluated on only one LPI dataset, thereby resulting in prediction bias. More importantly, some models cannot discover possible LPIs for new lncRNAs (or proteins). In addition, the prediction performance remains limited. To address these problems, in this study we develop a Deep Forest-based LPI prediction method (LPIDF). First, five LPI datasets are obtained and the corresponding sequence information of lncRNAs and proteins is collected. Second, features of lncRNAs and proteins are constructed based on four-nucleotide composition and BioSeq2vec with encoder-decoder structure, respectively. Finally, a deep forest model with cascade forest structure is developed to find new LPIs. We compare LPIDF with four classical association prediction models using three fivefold cross-validation settings (on lncRNAs, proteins, and LPIs). LPIDF obtains better average AUCs of 0.9012, 0.6937, and 0.9457, and the best average AUPRs of 0.9022, 0.6860, and 0.9382, respectively, for the three CVs, significantly outperforming other methods. The results suggest that the lncRNA FTX may interact with the protein P35637, a prediction that needs further validation.
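As a rough illustration of the "four-nucleotide composition" feature mentioned in this abstract, the sketch below computes the normalized frequency of every 4-mer over the RNA alphabet, giving a 256-dimensional vector. Reading the feature as plain 4-mer frequency is an assumption, as are the function name and the handling of T and ambiguous bases; the abstract does not give these details.

```python
from itertools import product


def kmer_composition(seq, k=4, alphabet="ACGU"):
    """Return normalized k-mer frequencies in lexicographic order over `alphabet`.

    Assumptions (not from the abstract): DNA-style T is mapped to U, and
    windows containing any other symbol (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return [counts[m] / total if total else 0.0 for m in kmers]


composition = kmer_composition("ACGUACGU")
```

A fixed-length vector of this kind can be fed to any tabular classifier, which is why composition features are a common choice when sequences have different lengths.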


2022 ◽  
Vol 1 ◽  
Author(s):  
Zhi-Hao Guo ◽  
Li Yuan ◽  
Ya-Lan Tan ◽  
Ben-Gong Zhang ◽  
Ya-Zhou Shi

The 3D architectures of RNAs are essential for understanding their cellular functions. While an accurate scoring function based on the statistics of known RNA structures is a key component of successful RNA structure prediction or evaluation, there are few tools or web servers that can be used directly to perform comprehensive statistical analyses of RNA 3D structures. In this work, we developed RNAStat, an integrated tool for computing statistics of RNA 3D structures. For given RNA structures, RNAStat automatically calculates RNA structural properties such as size and shape, and shows their distributions. Based on the RNA structure annotation from DSSR, RNAStat provides statistical information on RNA secondary structure motifs including canonical/non-canonical base pairs, stems, and various loops. In particular, the geometry of base-pairing/stacking can be calculated in RNAStat by constructing a local coordinate system for each base. In addition, RNAStat supplies the distribution of distances between any atoms to help users build distance-based RNA statistical potentials. To test the usability of the tool, we established a non-redundant RNA 3D structure dataset and, based on this dataset, performed a comprehensive statistical analysis of RNA structures, which could help guide RNA structure modeling. The Python code of RNAStat, the dataset used in this work, and the corresponding statistical data files are freely available at GitHub (https://github.com/RNA-folding-lab/RNAStat).


2017 ◽  
Author(s):  
Philippe Youkharibache

During the last decades, 3D Molecular Graphics in Life Sciences has been used almost exclusively by experts through complex software and applications ranging from Structural Biology to Computer Aided Drug Design. The emergence of JavaScript and WebGL as a viable platform has enabled 3D visualization of biomolecular structures through Web browsers, without any need for specialized software. Although still in its infancy, Web Molecular Graphics opens new perspectives. This white paper proposes a set of Twelve Elements to consider to enable 3D visualization and structural analyses of biological systems in Web molecular viewers. The Elements go beyond 3D graphics and propose an integrated approach to visualize and analyze molecular entities and their interactions in multiple dimensions, at multiple levels of detail, for diverse users. The bridging of 1D sequence browsers and 3D structure viewers, possible under a Web browser, enables information flow where molecular biologists can use structural information directly at the sequence level. Given the tsunami of sequence information linked to diseases from next generation sequencing, all in need of interpretation, making structural information readily available to research scientists is a tremendous opportunity for medical discovery. The Twelve Elements are conceptual and are intended to entice developers to architect software components and APIs, and to gather together as a community around common goals and open source software. A few features of emerging viewers, all available as open source, are highlighted. Speed and quality of 3D graphics for large molecular systems, the interoperability of Web components, and the instantaneous sharing of annotated visualizations through the Web are some of the most amazing and promising capabilities of 3D Web viewing, opening bright perspectives for Life Sciences research.


2019 ◽  
Author(s):  
Sheng Chen ◽  
Zhe Sun ◽  
Zifeng Liu ◽  
Xun Liu ◽  
Yutian Chong ◽  
...  

Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. A protein sequence profile can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent 3D structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning framework. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that the sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using a 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction.
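The 2D representation described in this abstract can be sketched in a few lines of Python: given one 3D coordinate per residue, the distance map is the symmetric matrix of pairwise Euclidean distances. The choice of one representative atom per residue, the function names, and the 10 Å neighbor cutoff are assumptions made for illustration; the abstract does not state which atoms or cutoff SPROF uses.

```python
import math


def distance_map(coords):
    """Return the n x n matrix of pairwise Euclidean distances between residues.

    `coords` holds one (x, y, z) tuple per residue, e.g. a representative atom
    (an assumption; the abstract does not say which atom is used).
    """
    n = len(coords)
    return [[math.dist(coords[a], coords[b]) for b in range(n)] for a in range(n)]


def neighbor_counts(dmap, cutoff=10.0):
    """Count, for each residue, the other residues within `cutoff` (hypothetical value)."""
    return [
        sum(1 for b, d in enumerate(row) if b != a and d <= cutoff)
        for a, row in enumerate(dmap)
    ]


# Toy coordinates, for illustration only.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 20.0)]
toy_map = distance_map(toy_coords)
toy_neighbors = neighbor_counts(toy_map)
```

The map can be treated like a single-channel image, which is what makes image-style network architectures applicable; the neighbor count is one simple way to quantify how densely packed a residue is in 3D space.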


2021 ◽  
Vol 13 (12) ◽  
pp. 2255
Author(s):  
Matteo Pardini ◽  
Victor Cazcarra-Bes ◽  
Konstantinos Papathanassiou

Synthetic Aperture Radar (SAR) measurements are unique for mapping forest 3D structure and its changes in time. Tomographic SAR (TomoSAR) configurations exploit this potential by reconstructing the 3D radar reflectivity. The frequency of the SAR measurements is one of the main parameters determining the information content of the reconstructed reflectivity in terms of penetration and sensitivity to the individual vegetation elements. This paper attempts to review and characterize the structural information content of L-band TomoSAR reflectivity reconstructions and their potential for forest structure mapping. First, the challenges in accurately reconstructing the TomoSAR reflectivity of volume scatterers (which are expected to dominate at L-band) and in extracting physical structure information from the reconstructed reflectivity are addressed. Then, the L-band penetration capability is directly evaluated by means of the estimation performance of the sub-canopy ground topography. The information content of the reconstructed reflectivity is then evaluated in terms of complementary structure indices. Finally, the dependency of the TomoSAR reconstruction and of its structural information on both the TomoSAR acquisition geometry and the temporal change of the reflectivity that may occur between the TomoSAR measurements in repeat-pass or bistatic configurations is evaluated. The analysis is supported by experimental results obtained by processing airborne acquisitions performed over temperate forest sites close to the city of Traunstein in the south of Germany.


Author(s):  
Hriday K. Basak ◽  
Soumen Saha ◽  
Joydeep Ghosh ◽  
Uttam Paswan ◽  
Sujoy Karmakar ◽  
...  

Background: The COVID-19 pandemic, caused by the highly contagious and pathogenic SARS-CoV-2, is a global menace, and it is worsening day by day. Doctors, scientists, and researchers across the world are urgently searching for a cure for the novel coronavirus and working at breakneck speed to develop vaccines or drugs. To date, however, no specific drug or vaccine is available on the market to cope with the virus. Objective: The present study aims to elucidate the 3D structures of SARS-CoV-2 proteins and to identify the best natural compounds as potential inhibitors against COVID-19. Methods: The 3D structures of the proteins were constructed using the Modeller 9.16 modeling tool. Modelled proteins were validated with PROCHECK by Ramachandran plot analysis. A small library of natural compounds (fifty compounds) was docked to the ACE2 binding site of the modelled surface glycoprotein of SARS-CoV-2 using AutoDock Vina to repurpose these inhibitors for SARS-CoV-2. Conceptual density functional theory calculations for the best eight compounds were performed with Gaussian 09. Geometry optimizations for these molecules were done at the M06-2X/def2-TZVP level of theory. ADME parameters, pharmacokinetic properties, and drug-likeness of the compounds were analyzed on the SwissADME website. Results: We analysed the sequences of surface glycoprotein, nucleocapsid phosphoprotein, and envelope protein obtained from different parts of the globe. We modelled all the different sequences of surface glycoprotein and envelope protein in order to derive the 3D structure of a molecular target, which is essential for the development of therapeutics. Different electronic properties of the inhibitors were calculated using DFT with the M06-2X functional and the def2-TZVP basis set. Docking results at the hACE2 binding site of all modelled surface glycoproteins of SARS-CoV-2 showed that all eight inhibitors studied here (actinomycin D, avellanin C, ichangin, kanglemycin A, obacunone, ursolic acid, ansamitocin P-3, and isomitomycin A) performed many-fold better than hydroxychloroquine, which has been found to be effective in treating patients suffering from COVID-19. All the inhibitors meet most of the criteria of the drug-likeness assessment. Conclusion: We expect that the eight compounds (actinomycin D, avellanin C, ichangin, kanglemycin A, obacunone, ursolic acid, ansamitocin P-3, and isomitomycin A) can be used as potential inhibitors against SARS-CoV-2.


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Luping Zheng ◽  
Jinai Yao ◽  
Fangluan Gao ◽  
Lin Chen ◽  
Chao Zhang ◽  
...  

Nucleolar proteins play important roles in plant cytology, growth, and development. Fibrillarin2 is a nucleolar protein of Nicotiana benthamiana (N. benthamiana). Its cDNA was amplified by RT-PCR and inserted into the expression vector pEarley101 labeled with yellow fluorescent protein (YFP). The fusion protein was localized in the nucleolus and Cajal body of leaf epidermal cells of N. benthamiana. The N. benthamiana fibrillarin2 (NbFib2) protein has three functional domains (i.e., a glycine- and arginine-rich domain, an RNA-binding domain, and an α-helical domain) and a nuclear localization signal (NLS) at the C-terminus. The protein 3D structure analysis predicted that NbFib2 is an α/β protein. In addition, the virus-induced gene silencing (VIGS) approach was used to determine the function of NbFib2. Our results showed that symptoms including growth retardation, organ deformation, chlorosis, and necrosis appeared in NbFib2-silenced N. benthamiana.

