Accurate Prediction of GPCR Ligand Binding Affinity with Free Energy Perturbation

2020 ◽  
Vol 60 (11) ◽  
pp. 5563-5579 ◽  
Author(s):  
Francesca Deflorian ◽  
Laura Perez-Benito ◽  
Eelke B Lenselink ◽  
Miles Congreve ◽  
Herman W. T. van Vlijmen ◽  
...  
2019 ◽  
Author(s):  
Qingyi Yang ◽  
Woodrow W. Burchett ◽  
Gregory S. Steeno ◽  
David L. Mobley ◽  
Xinjun Hou

Predicting the binding free energy of ligand-protein complexes has been a grand challenge in computational chemistry since the early days of molecular modeling. Multiple computational methodologies exist to predict ligand binding affinities. Pathway-based Free Energy Perturbation (FEP) and Thermodynamic Integration (TI), as well as Linear Interaction Energy (LIE) and Molecular Mechanics-Poisson Boltzmann/Generalized Born Surface Area (MM-PBSA/GBSA), have been applied to a variety of biologically relevant problems and have achieved different levels of predictive accuracy. Recent advances in computer hardware and in simulation algorithms for molecular dynamics and Monte Carlo sampling, together with improved general force field parameters, have made FEP a principal approach for calculating free energy differences, especially differences in host-guest binding affinity upon chemical modification.

The FEP-calculated binding free energy difference, denoted ΔΔG_FEP, characterizes only the difference in free energy between pairs of ligands or complexes, not the absolute binding free energy of each individual host-guest system, denoted ΔG. We therefore examine two rarely asked questions in FEP application:

1) Which values are more appropriate as predictions for assessing ligands prospectively: the calculated pairwise free energy differences, ΔΔG_FEP, or the estimated absolute binding energies, ΔĜ, transformed from ΔΔG_FEP?
2) When only a limited number of ligand pairs can be calculated in FEP, can the perturbation pairs be optimally selected with respect to the reference ligand(s) to maximize the prediction precision?

These two questions probe the validity of an often-neglected assumption in pairwise comparisons: that the pairwise value is sufficient to make a quantitative and reliable characterization of an individual ligand's properties or activities. This implicit assumption would be true if there were no error in each pairwise calculation. Recent pair designs, such as multiple pathways or cycle closure analysis, provide estimates of the calculation error but do not address the statistical impact of the two questions above. The impact of error is fully minimized by conducting an exhaustive study that obtains all C(N,2) = N(N-1)/2 pairs for a set of N molecules; more if there is directionality (ΔG_i,j ≠ ΔG_j,i). Obviously, that study design is impractical and unnecessary. Thus, we wish to collect the right amount of data that is 1) feasibly attainable, 2) topologically sufficient, and 3) mathematically synthesizable, so that we can mitigate inherent calculation errors and have higher confidence in our conclusions.

The significance of these questions can be illustrated by a motivating example shown in Figure 1 and Table 1, which considers two different perturbation graph designs for 20 ligands with the same number of FEP perturbation pairs, 19, and the same reference, Ligand 1. The two designs reached different conclusions in rank ordering ligand potencies because of errors inherent in the FEP-derived estimates. Based on design A, ligands 5, 7, 14, and 15 would be selected as the best four (20%) picks, since their ΔĜ estimates are the most favorable. Design B would yield ligands 5, 12, 18, and 19 as the best for the same reason. Without knowing the true values, ΔG_True, of the other 19 ligands, we lack a prospective metric to assess which design could be more precise, even though, retrospectively, we know that both designs had reasonably good agreement with the true values, as measured through correlation and error metrics. However, the top picks from neither design were consistent with the true top four ligands, which are ligands 7, 10, 12, and 18. Yet, if all of the C(20,2) = 190 pairs could have been calculated, as listed in the last column of Table 1, the best four ligands would have been correctly identified. Additionally, the other metrics included in Table 1 were significantly improved. However, as mentioned above, calculating all possible pairs, or even a significant fraction of them, is unlikely in practice, especially when the number of molecules is large. Given this restriction, is it possible to objectively determine whether design A or B will give more precise predictions?

In this report, we first investigated the performance of the calculated ΔΔG_FEP values compared to the pairwise differences in least-squares-derived ΔĜ estimates, both analytically and through simulations. Based on our findings, we recommend applying weighted least squares to transform ΔΔG_FEP values into ΔĜ estimates. Second, we investigated the factors that contribute to the precision of the ΔĜ estimates, such as the total number of computed pairs, the selection of computed pairs, and the uncertainty in the computed ΔΔG_FEP values. The mean squared error (MSE) and Spearman's rank correlation are used as performance metrics.

To illustrate, we demonstrated how structural similarity can be included in the design and its potential impact on prediction precision. As in the majority of reported FEP studies on binding affinity prediction, the ΔΔG_FEP pairs were selected based on chemical structure similarity. Pairs with small chemical differences are assumed to be more likely to have smaller errors in the ΔΔG_FEP calculation. Using the constructed mathematical framework together with literature examples, we demonstrate that some pair-selection schemes (designs) are better than others. To minimize the prediction uncertainty, we recommend choosing a design optimality criterion suited to the practical application.
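
The weighted least squares step recommended in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `wls_dg` and `mean_dg_variance`, the sign convention (each pairwise value is the target ligand's free energy minus the source ligand's), the use of inverse-variance weights, and the choice of average estimate variance as a design score are all assumptions made for the example.

```python
import numpy as np

def _incidence(n_ligands, pairs):
    """Signed incidence matrix of the perturbation graph (one row per pair)."""
    pairs = np.asarray(pairs)
    A = np.zeros((len(pairs), n_ligands))
    rows = np.arange(len(pairs))
    A[rows, pairs[:, 0]] = -1.0  # source ligand i
    A[rows, pairs[:, 1]] = 1.0   # target ligand j
    return A

def wls_dg(n_ligands, pairs, ddg, sigma, ref=0, dg_ref=0.0):
    """Estimate per-ligand free energies from pairwise differences.

    Assumed convention: pairs[k] = (i, j) means ddg[k] ~ dG[j] - dG[i],
    and sigma[k] is the assumed standard error of ddg[k].
    The reference ligand `ref` is pinned to the known value `dg_ref`.
    """
    A = _incidence(n_ligands, pairs)
    ddg = np.asarray(ddg, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float)  # row scaling = sqrt(1/variance)

    # Move the known reference column to the right-hand side.
    keep = [k for k in range(n_ligands) if k != ref]
    b = ddg - A[:, ref] * dg_ref
    x, *_ = np.linalg.lstsq(A[:, keep] * w[:, None], b * w, rcond=None)

    dg = np.full(n_ligands, dg_ref, dtype=float)
    dg[keep] = x
    return dg

def mean_dg_variance(n_ligands, pairs, sigma, ref=0):
    """Average variance of the estimates for a given design (lower is better).

    Uses the covariance (A^T W A)^-1 of the weighted least squares solution;
    averaging its diagonal is one possible design score, chosen for illustration.
    """
    A = _incidence(n_ligands, pairs)
    keep = [k for k in range(n_ligands) if k != ref]
    Ak = A[:, keep]
    var = np.asarray(sigma, dtype=float) ** 2
    info = Ak.T @ (Ak / var[:, None])
    return np.trace(np.linalg.inv(info)) / (n_ligands - 1)

# Three ligands in a closed cycle whose pairwise values disagree by 0.3:
# the fit spreads the cycle closure error evenly across the three edges.
dg = wls_dg(3, [(0, 1), (1, 2), (0, 2)], [1.0, 0.5, 1.8], [1.0, 1.0, 1.0])

# Two designs with the same number of pairs and the same reference ligand.
star = mean_dg_variance(4, [(0, 1), (0, 2), (0, 3)], [1.0, 1.0, 1.0])
chain = mean_dg_variance(4, [(0, 1), (1, 2), (2, 3)], [1.0, 1.0, 1.0])
```

Under these assumptions the score depends only on which pairs are computed and on their assumed uncertainties, not on the computed values themselves, so it can be evaluated before any FEP calculation is run. In the toy comparison, the chain design accumulates error along the path from the reference, while the star design does not.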


2020 ◽  
Author(s):  
Son Tung Ngo ◽  
Nguyen Minh Tam ◽  
Pham Minh Quan ◽  
Trung Hai Nguyen

The COVID-19 pandemic has killed millions of people worldwide since its outbreak in December 2019. The pandemic is caused by the SARS-CoV-2 virus, whose main protease (Mpro) is a promising drug target because it plays a key role in viral proliferation and replication. Designing an effective therapy is an urgent task that requires accurate estimates of ligand-binding free energy to the SARS-CoV-2 Mpro. However, the accuracy of a free energy method probably depends on the protein target: an approach that is highly accurate for some targets may fail to produce a reasonable correlation with experiment when a novel enzyme is considered as a drug target. In this context, the ligand-binding affinity to SARS-CoV-2 Mpro was therefore calculated via various approaches. The Autodock Vina (Vina) and Autodock4 (AD4) packages were used for a preliminary investigation of the ligand-binding affinity and pose to the SARS-CoV-2 Mpro. The binding free energy was then refined using the fast pulling of ligand (FPL), linear interaction energy (LIE), molecular mechanics-Poisson Boltzmann surface area (MM-PBSA), and free energy perturbation (FEP) methods. The benchmark results indicated that, for docking calculations, Vina is more accurate than AD4 and that, for free energy methods, FEP is the most accurate, followed by LIE, FPL, and MM-PBSA (FEP > LIE > FPL > MM-PBSA). Moreover, atomistic simulations revealed the binding mechanism, in which the vdW interaction is the dominant factor. The residues Thr25, Thr26, His41, Ser46, Asn142, Gly143, Cys145, Glu166, and Gln189 are essential elements affecting the binding process. Furthermore, Ser46 and related residues are probably important elements affecting the expansion and contraction of the SARS-CoV-2 Mpro binding cleft. The benchmark may guide further investigations using computational approaches.


2019 ◽  
Author(s):  
Filip Fratev ◽  
Suman Sirimulla

Recent improvements to free energy perturbation (FEP) calculations, especially FEP+, established their utility for pharmaceutical lead optimization. However, to date FEP has typically been helpful only when (1) high-quality X-ray data is available and (2) the target protein does not undergo significant conformational changes. Also, a lack of systematic studies on determining an adequate sampling time is often one of the primary limitations of FEP calculations. Herein, we propose a modified version of the FEP/REST (i.e., replica exchange with solute tempering) sampling protocol, based on systematic studies on several targets by probing a large number of permutations with different sampling schemes. Improved FEP+ binding affinity predictions for regular flexible-loop (F-loop) motions and considerable structural changes can be obtained by extending the pre-REST sampling time from 0.24 ns to 5 ns/λ and 2×10 ns/λ, respectively. We obtained much more precise ∆∆G calculations of the individual perturbations, including the sign of the transformations and less error. We extended the REST simulations from 5 ns to 8 ns to achieve reasonable free energy convergence. Implementing REST to the entire ligand as opposed to solely the perturbed region, and also some important flexible protein residues (pREST region) in the ligand binding domain (LBD), also considerably improved the FEP+ results in most of the studied cases. Preliminary molecular dynamics (MD) runs were useful for establishing the correct binding mode of the compounds and thus precise alignment for FEP+.


2019 ◽  
Vol 41 (7) ◽  
pp. 611-618 ◽  
Author(s):  
Son Tung Ngo ◽  
Trung Hai Nguyen ◽  
Nguyen Thanh Tung ◽  
Pham Cam Nam ◽  
Khanh B. Vu ◽  
...  

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Filip Fratev ◽  
Suman Sirimulla

Recent improvements to free energy perturbation (FEP) calculations, especially FEP+, established their utility for pharmaceutical lead optimization. Herein, we propose a modified version of the FEP/REST (i.e., replica exchange with solute tempering) sampling protocol, based on detailed studies on several targets by probing a large number of perturbations with different sampling schemes. Improved FEP+ binding affinity predictions for regular flexible-loop motions and considerable structural changes can be obtained by extending the prior to REST (pre-REST) sampling time from 0.24 ns/λ to 5 ns/λ and 2 × 10 ns/λ, respectively. With this new protocol, much more precise ∆∆G values of the individual perturbations, including the sign of the transformations and decreased error, were obtained. We extended the REST simulations from 5 ns to 8 ns to achieve reasonable free energy convergence. Implementing REST to the entire ligand as opposed to solely the perturbed region, and also some important flexible protein residues (pREST region) in the ligand binding domain (LBD), has considerably improved the FEP+ results in most of the studied cases. Preliminary molecular dynamics (MD) runs were useful for establishing the correct binding mode of the compounds and thus precise alignment for FEP+. Our improved protocol may further increase the FEP+ accuracy.


2014 ◽  
Vol 119 (3) ◽  
pp. 824-835 ◽  
Author(s):  
Dahlia A. Goldfeld ◽  
Robert Murphy ◽  
Byungchan Kim ◽  
Lingle Wang ◽  
Thijs Beuming ◽  
...  
