Predicting the Relative Binding Affinity for Reversible Covalent Inhibitors by Free Energy Perturbation Calculations

Author(s):  
Vinícius Bonatto ◽  
Anwar Shamim ◽  
Fernanda dos R. Rocho ◽  
Andrei Leitão ◽  
F. Javier Luque ◽  
...  
2019 ◽  
Vol 21 (44) ◽  
pp. 24723-24730 ◽  
Author(s):  
Jerônimo Lameira ◽  
Vinícius Bonatto ◽  
Lorenzo Cianni ◽  
Fernanda dos Reis Rocho ◽  
Andrei Leitão ◽  
...  

Free energy perturbation calculations using the covalent and noncovalent states can predict the binding affinity of covalent halogenated dipeptidyl nitrile inhibitors of human cathepsin L (hCatL).


2019 ◽  
Author(s):  
Qingyi Yang ◽  
Woodrow W. Burchett ◽  
Gregory S. Steeno ◽  
David L. Mobley ◽  
Xinjun Hou

Predicting the binding free energy of ligand-protein complexes has been a grand challenge in the field of computational chemistry since the early days of molecular modeling. Multiple computational methodologies exist to predict ligand binding affinities. Pathway-based Free Energy Perturbation (FEP) and Thermodynamic Integration (TI), as well as Linear Interaction Energy (LIE) and Molecular Mechanics-Poisson Boltzmann/Generalized Born Surface Area (MM-PBSA/GBSA), have been applied to a variety of biologically relevant problems and achieved different levels of predictive accuracy. Recent advancements in computer hardware and simulation algorithms of molecular dynamics and Monte Carlo sampling, as well as improved general force field parameters, have made FEP a principal approach for calculating free energy differences, especially when calculating host-guest binding affinity differences upon chemical modification.

Since the FEP-calculated binding free energy difference, denoted ddGFEP, only characterizes the difference in free energy between pairs of ligands or complexes, not the absolute binding free energy value of each individual host-guest system, denoted dG, we examine two rarely asked questions in FEP application:

1) Which values would be more appropriate as the prediction to assess the ligands prospectively: the calculated pairwise free energy differences, ddGFEP, or the estimated absolute binding energies, d^G, transformed from ddGFEP?
2) In the situation where only a limited number of ligand pairs can be calculated in FEP, can the perturbation pairs be optimally selected with respect to the reference ligand(s) to maximize the prediction precision?

These two questions underline the viability of an often-neglected assumption in pairwise comparisons: that the pairwise value is sufficient to make a quantitative and reliable characterization of an individual ligand's properties or activities. This implicit assumption would be true if there were no error in each pairwise calculation. Recent pair designs, such as multiple pathways or cycle closure analyses, provide estimates of the calculation error but do not address the statistical impact of the two questions above. The error impact is fully minimized by conducting an exhaustive study that obtains all C(N,2) = N(N-1)/2 pairs for a set of N molecules; more if there is directionality (dGi,j != dGj,i). Obviously, that study design is impractical and unnecessary. Thus, we desire to collect the right amount of data that is 1) feasibly attainable, 2) topologically sufficient, and 3) mathematically synthesizable, so that we can mitigate inherent calculation errors and have higher confidence in our conclusions.

The significance of the above questions can be illustrated by a motivating example shown in Figure 1 and Table 1, which considers two different perturbation graph designs for 20 ligands with the same number of FEP perturbation pairs, 19, and the same reference, Ligand 1. These two designs reached different conclusions in rank ordering ligand potencies due to errors inherent in the FEP-derived estimates. Based on design A, ligands 5, 7, 14, 15 would be selected as the best four (20%) picks since those d^G estimates are the most favorable. Design B would yield ligands 5, 12, 18, 19 as best for the same reason. Without knowing the true values, dGTrue, of the other 19 ligands, we lack a prospective metric to assess which design could be more precise even though, retrospectively, we know that both designs had reasonably good agreement with the true values, as measured through correlation and error metrics. However, the top picks from neither design were consistent with the true top four ligands, which are ligands 7, 10, 12, 18. Yet, if all of the C(20,2) = 190 pairs could have been calculated, as listed in the last column of Table 1, the best four ligands would have been correctly identified. Additionally, the other metrics included in Table 1 were significantly improved. However, as mentioned above, calculating all possible pairs, or even a significant fraction of all possible pairs, is unlikely in practice, especially when the number of molecules is large. Given this restriction, is it possible to objectively determine whether design A or B will give more precise predictions?

In this report, we investigated the performance of the calculated ddGFEP values compared to the pairwise differences in least-squares-derived d^G estimates, both analytically and through simulations. Based on our findings, we recommend applying weighted least squares to transform ddGFEP values into d^G estimates. We also investigated the factors that contribute to the precision of the d^G estimates, such as the total number of computed pairs, the selection of computed pairs, and the uncertainty in the computed ddGFEP values. The mean squared error (MSE) and Spearman's rank correlation are used as performance metrics.

To illustrate, we demonstrate how structural similarity can be included in the design and how it may affect prediction precision. As in the majority of reported FEP studies on binding affinity prediction, the ddGFEP pairs were selected based on chemical structure similarity. Pairs with small chemical differences are assumed to be more likely to have smaller errors in the ddGFEP calculation. Using the constructed mathematical system together with literature examples, we demonstrate that some pair-selection schemes (designs) are better than others. To minimize the prediction uncertainty, we recommend choosing a design optimality criterion that suits the practical application.
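
As an illustration of the weighted least squares step recommended in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes that each ddGFEP value for a pair (i, j) means dG_j - dG_i, that the reference ligand's dG is known and held fixed, and that each pair is weighted by the inverse of its squared standard error. The function name `wls_dg_estimates` and all numbers are hypothetical.

```python
import numpy as np

def wls_dg_estimates(n_ligands, pairs, ddg, sigma, ref_index=0, ref_dg=0.0):
    """Estimate per-ligand dG values from pairwise ddG values.

    pairs : list of (i, j); ddg[k] is assumed to mean dG_j - dG_i.
    sigma : per-pair standard errors; weights are 1 / sigma**2 (assumption).
    The reference ligand is pinned to ref_dg so the system has a unique solution.
    """
    pairs = np.asarray(pairs)
    ddg = np.asarray(ddg, dtype=float)
    weights = 1.0 / np.asarray(sigma, dtype=float) ** 2

    # Design (incidence) matrix of the perturbation graph:
    # one row per pair, -1 at ligand i and +1 at ligand j.
    design = np.zeros((len(pairs), n_ligands))
    rows = np.arange(len(pairs))
    design[rows, pairs[:, 0]] = -1.0
    design[rows, pairs[:, 1]] = 1.0

    # Move the known reference term to the right-hand side and drop its column.
    rhs = ddg - design[:, ref_index] * ref_dg
    free = [k for k in range(n_ligands) if k != ref_index]
    design_free = design[:, free]

    # Weighted least squares: scale each row by sqrt(weight), then solve
    # the resulting ordinary least squares problem.
    scale = np.sqrt(weights)
    solution, *_ = np.linalg.lstsq(design_free * scale[:, None], rhs * scale, rcond=None)

    dg = np.empty(n_ligands)
    dg[ref_index] = ref_dg
    dg[free] = solution
    return dg

# Hypothetical three-ligand cycle whose ddG values do not close exactly:
# 1.0 + 1.0 != 2.3, so the 0.3 closure error is spread over the three edges.
example_dg = wls_dg_estimates(
    n_ligands=3,
    pairs=[(0, 1), (1, 2), (0, 2)],
    ddg=[1.0, 1.0, 2.3],
    sigma=[1.0, 1.0, 1.0],
)
print(example_dg)
```

With equal weights this reduces to ordinary least squares; the perturbation graph must be connected to the reference ligand for every d^G estimate to be determined.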


2020 ◽  
Vol 60 (11) ◽  
pp. 5563-5579 ◽  
Author(s):  
Francesca Deflorian ◽  
Laura Perez-Benito ◽  
Eelke B Lenselink ◽  
Miles Congreve ◽  
Herman W. T. van Vlijmen ◽  
...  

2021 ◽  
Author(s):  
Alexander Wade ◽  
Agastya Bhati ◽  
Shunzhou Wan ◽  
Peter Coveney

The binding free energy between a ligand and its target protein is an essential quantity to know at all stages of the drug discovery pipeline. Assessing this value computationally can offer insight into where efforts should be focused in the pursuit of effective therapeutics to treat myriad diseases. In this work we examine the computation of alchemical relative binding free energies with an eye to assessing reproducibility across popular molecular dynamics packages and free energy estimators. The focus of this work is on 54 ligand transformations from a diverse set of protein targets: MCL1, PTP1B, TYK2, CDK2 and thrombin. These targets are studied with three popular molecular dynamics packages: OpenMM, NAMD2 and NAMD3. Trajectories collected with these packages are used to compare relative binding free energies calculated with thermodynamic integration and free energy perturbation methods. The resulting binding free energies show good agreement between molecular dynamics packages, with an average mean unsigned error between packages of 0.5 kcal/mol. The correlation between packages is very good, with the lowest Spearman's, Pearson's and Kendall's tau correlation coefficients between two packages being 0.91, 0.89 and 0.74, respectively. Agreement between thermodynamic integration and free energy perturbation is shown to be very good when using ensemble averaging.
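
The agreement metrics named in the abstract above (mean unsigned error, Pearson's, Spearman's and Kendall's tau coefficients) can be sketched as follows. This is only an illustration: the two arrays are made-up values standing in for relative binding free energies from two packages, not data from the study, and the rank-based functions assume no tied values.

```python
import numpy as np

def mean_unsigned_error(a, b):
    """Mean absolute difference between two sets of free energies."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean(np.abs(a - b)))

def pearson_r(a, b):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(a, b)[0, 1])

def ordinal_ranks(x):
    """Ranks 1..n by increasing value; assumes no ties (assumption)."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x)] = np.arange(1, len(x) + 1)
    return ranks

def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ordinal_ranks(a), ordinal_ranks(b))

def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / total pairs; assumes no ties."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    score = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            score += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return float(score / (n * (n - 1) / 2))

# Hypothetical relative binding free energies (kcal/mol) from two packages.
package_a = [-1.2, 0.3, 0.8, -0.5, 1.9]
package_b = [-1.0, 0.6, 0.5, -0.7, 2.1]

mue = mean_unsigned_error(package_a, package_b)
rho = spearman_rho(package_a, package_b)
tau = kendall_tau(package_a, package_b)
r = pearson_r(package_a, package_b)
print(f"MUE={mue:.2f} kcal/mol, Pearson={r:.2f}, Spearman={rho:.2f}, Kendall={tau:.2f}")
```

Kendall's tau is typically smaller in magnitude than Spearman's coefficient on the same data, which is in line with the ordering of the lowest values reported in the abstract.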



