Protein-ensemble–RNA docking by efficient consideration of protein flexibility through homology models

2019 ◽  
Vol 35 (23) ◽  
pp. 4994-5002 ◽  
Author(s):  
Jiahua He ◽  
Huanyu Tao ◽  
Sheng-You Huang

Abstract Motivation Given the importance of protein–ribonucleic acid (RNA) interactions in many biological processes, a variety of docking algorithms have been developed to predict the complex structure from individual protein and RNA partners in the past decade. However, due to the impact of molecular flexibility, the performance of current methods has hit a bottleneck in realistic unbound docking. Pushing the limit, we have proposed a protein-ensemble–RNA docking strategy to explicitly consider the protein flexibility in protein–RNA docking through an ensemble of multiple protein structures, which is referred to as MPRDock. Instead of taking conformations from MD simulations or experimental structures, we obtained the multiple structures of a protein by building models from its homologous templates in the Protein Data Bank (PDB). Results Our approach can not only avoid the reliability issue of structures from MD simulations but also circumvent the limited number of experimental structures for a target protein in the PDB. Tested on 68 unbound–bound and 18 unbound–unbound protein–RNA complexes, our MPRDock/DITScorePR considerably improved the docking performance and achieved a significantly higher success rate than single-protein rigid docking whether pseudo-unbound templates are included or not. Similar improvements were also observed when combining our ensemble docking strategy with other scoring functions. The present homology model-based ensemble docking approach will have a general application in molecular docking for other interactions. Availability and implementation http://huanglab.phys.hust.edu.cn/mprdock/ Supplementary information Supplementary data are available at Bioinformatics online.
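
The ensemble idea in this abstract can be illustrated with a minimal Python sketch. It assumes that each protein model has already been docked against the RNA and that every pose carries a score where lower is better. The function name `rank_ensemble_poses`, the model labels and the score values are hypothetical and are not taken from MPRDock.

```python
from typing import Dict, List, Tuple

Pose = Tuple[str, float]  # (pose identifier, docking score; lower is ASSUMED better)


def rank_ensemble_poses(
    runs: Dict[str, List[Pose]], top_n: int = 10
) -> List[Tuple[str, str, float]]:
    """Pool poses from independent docking runs (one run per protein model)
    and return the top_n best-scoring (model, pose, score) triples."""
    pooled = [
        (model, pose_id, score)
        for model, poses in runs.items()
        for pose_id, score in poses
    ]
    # Ascending sort: the most negative (best) score comes first.
    pooled.sort(key=lambda item: item[2])
    return pooled[:top_n]


# Hypothetical scores for three homology models of the same protein.
runs = {
    "model_A": [("a1", -210.5), ("a2", -180.0)],
    "model_B": [("b1", -250.3), ("b2", -199.9)],
    "model_C": [("c1", -175.2)],
}
best = rank_ensemble_poses(runs, top_n=3)
```

Pooling before ranking lets a pose from any model reach the top of the list, which is one simple way an ensemble can outperform docking against a single rigid structure.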

2019 ◽  
Vol 35 (20) ◽  
pp. 3989-3995 ◽  
Author(s):  
Hongjian Li ◽  
Jiangjun Peng ◽  
Pavel Sidorov ◽  
Yee Leung ◽  
Kwong-Sak Leung ◽  
...  

Abstract Motivation Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes. Results We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from dissimilar training complexes to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. Availability and implementation https://github.com/HongjianLi/MLSF Supplementary information Supplementary data are available at Bioinformatics online.
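
A minimal Python sketch of how a training set might be restricted by its similarity to the test set follows. It assumes a precomputed similarity value in [0, 1] between every training and test complex. The function name, the cutoff values and the complex identifiers are hypothetical, and the study's actual protocol may differ.

```python
from typing import Dict, List


def select_training_subset(
    similarity: Dict[str, Dict[str, float]], cutoff: float
) -> List[str]:
    """Return training complexes whose highest similarity to ANY test
    complex does not exceed the cutoff (a lower cutoff gives a more
    dissimilar training set)."""
    return sorted(
        train_id
        for train_id, row in similarity.items()
        if max(row.values()) <= cutoff
    )


# Hypothetical similarities: training complex -> test complex -> value.
similarity = {
    "train_1": {"test_1": 0.95, "test_2": 0.20},
    "train_2": {"test_1": 0.30, "test_2": 0.25},
    "train_3": {"test_1": 0.10, "test_2": 0.60},
}
dissimilar_only = select_training_subset(similarity, cutoff=0.5)
everything = select_training_subset(similarity, cutoff=1.0)
```

Raising the cutoff step by step produces nested training sets, so the effect of adding progressively more similar complexes can be measured with the same test set.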


2021 ◽  
Author(s):  
Sarah Hall-Swan ◽  
Dinler A. Antunes ◽  
Didier Devaurs ◽  
Mauricio M. Rigo ◽  
Lydia E. Kavraki ◽  
...  

Abstract Motivation Recent efforts to computationally identify inhibitors for SARS-CoV-2 proteins have largely ignored the issue of receptor flexibility. We have implemented a computational tool for ensemble docking with the SARS-CoV-2 proteins, including the main protease (Mpro), papain-like protease (PLpro) and RNA-dependent RNA polymerase (RdRp). Results Ensembles of other SARS-CoV-2 proteins are being prepared and made available through a user-friendly docking interface. Plausible binding modes between conformations of a selected ensemble and an uploaded ligand are generated by DINC, our parallelized meta-docking tool. Binding modes are scored with three scoring functions, and account for the flexibility of both the ligand and receptor. Additional details on our methods are provided in the supplementary material. Availability dinc-covid.kavrakilab.org Supplementary information Details on methods for ensemble generation and docking are provided as supplementary data.


2019 ◽  
Vol 36 (7) ◽  
pp. 2284-2285 ◽  
Author(s):  
Miguel Romero-Durana ◽  
Brian Jiménez-García ◽  
Juan Fernández-Recio

Abstract Motivation Protein–protein interactions are key to understanding biological processes at the molecular level. As a complement to experimental characterization of protein interactions, computational docking methods have become useful tools for the structural and energetics modeling of protein–protein complexes. A key aspect of such algorithms is the use of scoring functions to evaluate the generated docking poses and try to identify the best models. When the scoring functions are based on energetic considerations, they can help not only to provide a reliable structural model for the complex, but also to describe energetic aspects of the interaction. This is the case for the scoring function used in pyDock, a combination of electrostatics, desolvation and van der Waals energy terms. Its correlation with experimental binding affinity values of protein–protein complexes was explored in the past, but the per-residue decomposition of the docking energy was never systematically analyzed. Results Here, we present pyDockEneRes (pyDock Energy per-Residue), a web server that provides pyDock docking energy partitioned at the residue level, giving a much more detailed description of the docking energy landscape. Additionally, pyDockEneRes computes the contribution to the docking energy of the side-chain atoms. This fast approach can be applied to characterize a complex structure in order to identify energetically relevant residues (hot-spots) and estimate binding affinity changes upon mutation to alanine. Availability and implementation The server does not require registration by the user and is freely accessible for academics at https://life.bsc.es/pid/pydockeneres. Supplementary information Supplementary data are available at Bioinformatics online.
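
As an illustration of partitioning an interaction energy at the residue level, the Python sketch below splits each pairwise receptor-ligand energy equally between the two residues involved. The equal split, the function name and all energy values are assumptions made for illustration. The abstract does not state how pyDockEneRes distributes the energy.

```python
from collections import defaultdict
from typing import Dict, Tuple


def per_residue_energy(
    pair_energies: Dict[Tuple[str, str], float]
) -> Dict[str, float]:
    """Assign half of every pairwise energy to each residue of the pair,
    so that the per-residue values sum to the total interaction energy."""
    totals: Dict[str, float] = defaultdict(float)
    for (res_a, res_b), energy in pair_energies.items():
        totals[res_a] += 0.5 * energy
        totals[res_b] += 0.5 * energy
    return dict(totals)


# Hypothetical pairwise energies (arbitrary units, negative = favourable).
pairs = {
    ("A:ARG45", "B:ASP12"): -4.0,
    ("A:ARG45", "B:GLU30"): -2.0,
    ("A:LEU10", "B:ASP12"): 1.0,
}
energies = per_residue_energy(pairs)
# The residue with the most favourable contribution is a hot-spot candidate.
hot_spot = min(energies, key=energies.get)
```

Because the split conserves the total, the per-residue values can be compared directly with the overall docking energy of the complex.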


2021 ◽  
Vol 8 ◽  
Author(s):  
Lorenza Pacini ◽  
Rodrigo Dorantes-Gilardi ◽  
Laurent Vuillon ◽  
Claire Lesieur

Proteins fulfill complex and diverse biological functions through the controlled atomic motions of their structures (functional dynamics). The protein composition is given by its amino-acid sequence, which was assumed to encode the function. However, the discovery of functional sequence variants proved that the functional encoding does not come down to the sequence, otherwise a change in the sequence would mean a change of function. Likewise, the discovery that function is fulfilled by a set of structures and not by a unique structure showed that the functional encoding does not come down to the structure either. That leaves us with the possibility that a set of atomic motions, achievable by different sequences and different structures, encodes a specific function. Thanks to the exponential growth in annual depositions in the Protein Data Bank of protein tridimensional structures at atomic resolutions, network models using the Cartesian coordinates of atoms of a protein structure as input have been used for over 20 years to investigate protein features. Combining networks with experimental measures or with Molecular Dynamics (MD) simulations and using typical or ad-hoc network measures is well suited to decipher the link between protein dynamics and function. One perspective is to consider static structures alone as an alternative way to address the question, and to find network measures relevant to dynamics that can subsequently be used for mining and classifying dynamic sequence changes as functionally robust, adaptable or faulty. In this way, the set of dynamics that fulfill a function over a diversity of sequences and structures will be determined.


Author(s):  
Aeri Lee ◽  
Dongsup Kim

Abstract Motivation Identification of putative drug targets is a critical step for explaining the mechanism of drug action against multiple targets, finding new therapeutic indications for existing drugs and unveiling adverse drug reactions. One important approach is to use molecular docking. However, its widespread utilization has been hindered by the lack of easy-to-use public servers. Therefore, it is vital to develop a streamlined computational tool for target prediction by molecular docking on a large scale. Results We present a fully automated web tool named Consensus Reverse Docking System (CRDS), which predicts potential interaction sites for a given drug. To improve hit rates, we developed a strategy of consensus scoring. CRDS carries out reverse docking against 5254 candidate protein structures using three different scoring functions (GoldScore, Vina and LeDock from GOLD version 5.7.1, AutoDock Vina version 1.1.2 and LeDock version 1.0, respectively), and those scores are combined into a single score named Consensus Docking Score (CDS). The web server provides the list of top 50 predicted interaction sites, docking conformations, 10 most significant pathways and the distribution of consensus scores. Availability and implementation The web server is available at http://pbil.kaist.ac.kr/CRDS. Supplementary information Supplementary data are available at Bioinformatics online.
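
One generic way to combine several docking scores into a single consensus value is to average their z-scores, as in the Python sketch below. This is not the CDS formula, which the abstract does not give. The sketch assumes that GoldScore is a fitness where higher is better and that the Vina and LeDock scores are energies where lower is better. The function names and all numbers are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict


def zscores(values: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Standardise one scoring function across targets and orient it so
    that a larger z-score always means a better predicted interaction."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return {target: 0.0 for target in values}
    sign = 1.0 if higher_is_better else -1.0
    return {t: sign * (v - mu) / sigma for t, v in values.items()}


def consensus(
    tables: Dict[str, Dict[str, float]], higher_is_better: Dict[str, bool]
) -> Dict[str, float]:
    """Average the oriented z-scores of every scoring function per target."""
    per_fn = [zscores(tables[name], higher_is_better[name]) for name in tables]
    return {t: mean(z[t] for z in per_fn) for t in per_fn[0]}


# Hypothetical scores of one drug against three candidate targets.
tables = {
    "goldscore": {"T1": 60.0, "T2": 40.0, "T3": 50.0},
    "vina": {"T1": -9.0, "T2": -6.0, "T3": -7.5},
    "ledock": {"T1": -8.0, "T2": -4.0, "T3": -6.0},
}
orientation = {"goldscore": True, "vina": False, "ledock": False}  # assumption
cds = consensus(tables, orientation)
ranking = sorted(cds, key=cds.get, reverse=True)
```

Standardising first keeps any one scoring function from dominating the average simply because its numerical range is larger.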


2021 ◽  
Author(s):  
Yu Lei ◽  
Sheng Guo ◽  
Yi Liu ◽  
Zhili Zuo

Abstract Motivation Challenges remain in structure-based drug discovery, including protein flexibility in the binding site. To account for this flexibility, docking into an ensemble of rigid conformations (ensemble docking) has been proposed, with the expectation that it could provide higher enrichment than docking into a single rigid receptor. Here we have developed an ensemble docking strategy using Bayesian model algorithms, and the method is validated on three proteins: BTK, JAK and PARP. The Bayesian model was used to integrate independent docking runs against an ensemble of rigid crystal structures and structures from MD simulations. Results The structures from MD simulations outperform the crystal structures in separating inhibitors from decoys for BTK and PARP. Further, the results demonstrate that the ensemble docking strategy performs better than docking into a single rigid conformation.


2020 ◽  
Author(s):  
Louison Fresnais ◽  
Pedro J. Ballester

Abstract Larger training datasets have been shown to improve the accuracy of Machine Learning (ML)-based Scoring functions (SFs) for Structure-Based Virtual Screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with at least nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A three-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets. Supplementary information An online-only supplementary results file is enclosed. Biographical Note L. Fresnais carried out a master research project directly supervised by P.J. Ballester and he will soon be starting a PhD. P.J. Ballester has been working on virtual screening for over 15 years now. He is group leader and research scientist at the cancer research centre of INSERM, the French National Institute of Health & Medical Research.


Author(s):  
Hiroko X. Kondo ◽  
Yu Takano

Heme is located in the active site of proteins and has diverse and important biological functions, such as electron transfer and oxygen transport and/or storage. The distortion of heme porphyrin is considered an important factor for the diverse functions of heme because it correlates with the physical properties of heme, such as oxygen affinity and redox potential. Therefore, clarification of the relationship between heme distortion and the protein environment is crucial in protein science. Here, we analyzed the fluctuation in heme distortion in the protein environment for hemoglobin and myoglobin using molecular dynamics (MD) simulations and quantum mechanical (QM) calculations. We also investigated the protein structures of hemoglobin and myoglobin stored in Protein Data Bank and found that heme is distorted along the doming mode, which correlates with its oxygen affinity, more prominently in the protein environment than in the isolated state, and the magnitude of distortion is different between hemoglobin and myoglobin. This tendency was also observed in the results of MD simulations and QM calculations. These results suggest that heme distortion is affected by its protein environment and fluctuates around its fitted conformation, leading to physical properties that are appropriate for protein functions.


2021 ◽  
Vol 8 ◽  
Author(s):  
Paulo C. T. Souza ◽  
Vittorio Limongelli ◽  
Sangwook Wu ◽  
Siewert J. Marrink ◽  
Luca Monticelli

Molecular docking is central to rational drug design. Current docking techniques suffer, however, from limitations in protein flexibility and solvation models and by the use of simplified scoring functions. All-atom molecular dynamics simulations, on the other hand, feature a realistic representation of protein flexibility and solvent, but require knowledge of the binding site. Recently we showed that coarse-grained molecular dynamics simulations, based on the most recent version of the Martini force field, can be used to predict protein/ligand binding sites and pathways, without requiring any a priori information, and offer a level of accuracy approaching all-atom simulations. Given the excellent computational efficiency of Martini, this opens the way to high-throughput drug screening based on dynamic docking pipelines. In this opinion article, we sketch the roadmap to achieve this goal.


2020 ◽  
Vol 36 (10) ◽  
pp. 3064-3071
Author(s):  
Rostislav K Skitchenko ◽  
Dmitrii Usoltsev ◽  
Mayya Uspenskaya ◽  
Andrey V Kajava ◽  
Albert Guskov

Abstract Motivation Halides are negatively charged ions of halogens, forming fluorides (F−), chlorides (Cl−), bromides (Br−) and iodides (I−). These anions are quite reactive and interact both specifically and non-specifically with proteins. Despite their ubiquitous presence and important roles in protein function, little is known about the preferences of halides binding to proteins. To address this problem, we performed an analysis of halide–protein interactions, based on the entries in the Protein Data Bank. Results We have compiled a pipeline for the quick analysis of halide-binding sites in proteins using the available software. Our analysis revealed that all halides are strongly attracted by the guanidinium moiety of arginine side chains; however, there are also certain preferences among halides for other partners. Furthermore, there is a certain preference for coordination numbers in the binding sites, with a correlation between coordination numbers and amino acid composition. This pipeline can be used as a tool for the analysis of specific halide–protein interactions and can assist phasing experiments relying on halides as anomalous scatterers. Availability and implementation All data described in this article can be reproduced via the compiled pipeline published at https://github.com/rostkick/Halide_sites/blob/master/README.md. Supplementary information Supplementary data are available at Bioinformatics online.

