MovableType: Four stages to improve binding free energy predictions

Get your ligands and proteins to fit just right by increasing target sampling.

Using MovableType to predict binding free energies

The MovableType (MT) method is a fast free energy method in the same vein as free energy perturbation (FEP), thermodynamic integration (TI), and Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA), but without the expense of molecular dynamics (MD).

Instead, the MT method couples two approaches to estimate the bound and unbound partition functions and ultimately the binding free energies:

  1. A novel local sampling approach (e.g., “blurring” or “smearing”) for numerically estimating the local conformational free energies
  2. A global sampling approach such as docking
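
To make the link between partition functions and binding free energy concrete, here is a minimal Python sketch of the textbook relationship: each ensemble free energy is -kT ln Σ exp(-E_i/kT), and the binding free energy is the complex term minus the target and ligand terms. This is not MT's implementation (the local “blurring” step is not reproduced), and the function names, the temperature, and the input energies are illustrative assumptions.

```python
import math

# Assumption: energies in kcal/mol at 298.15 K (gas constant in kcal/mol/K).
KT_KCAL_MOL = 0.001987204 * 298.15


def ensemble_free_energy(energies, kT=KT_KCAL_MOL):
    """Return -kT * ln(sum_i exp(-E_i / kT)) using a stable log-sum-exp."""
    if not energies:
        raise ValueError("at least one energy is required")
    scaled = [-e / kT for e in energies]
    top = max(scaled)  # factor out the largest term to avoid overflow
    log_z = top + math.log(sum(math.exp(s - top) for s in scaled))
    return -kT * log_z


def binding_free_energy(complex_e, target_e, ligand_e, kT=KT_KCAL_MOL):
    """dG_bind = G(complex) - G(target) - G(ligand) from pose energies."""
    return (
        ensemble_free_energy(complex_e, kT)
        - ensemble_free_energy(target_e, kT)
        - ensemble_free_energy(ligand_e, kT)
    )
```

With a single pose per state the result reduces to a plain energy difference; adding more low-energy poses lowers the corresponding ensemble free energy, which is why the quality of the supplied poses matters.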

Docking is fast and is often quite accurate. But there is no “free lunch” with MT and the rule of “garbage-in/garbage-out” still applies. To understand how to improve your results with MT, first we need to discuss the challenges in computer-aided drug design.

Challenges in Computer-Aided Drug Design

In computer-aided drug design, we are often limited to the structural characteristics of the model chosen by the experimentalists. Rotamers are chosen based on which best fit the density, but due to limitations in how well we can crystallographically resolve heavy atoms, it is difficult to be absolutely sure which rotamer is correct. For instance, oxygens, nitrogens, and carbons are almost indistinguishable—especially at the resolutions we generally have access to in structure-based drug discovery campaigns.

Adding to the challenge is the fact that protonation states are difficult to resolve without the use of QM/MM refinement and XModeScore, which further limits our reliance on experimental structures to nothing more than an initial guide. Finally, and most obviously, the poses in a crystal structure are only fixed “snapshots in time” and binding events include structural transformations (and hence probability distributions) which often can’t be accurately gleaned from crystal structures alone.

So how do we accurately and efficiently model molecular binding to design drugs that can become our best lead candidates? It comes down to increasing target as well as ligand sampling.

When to Consider Increasing Target Sampling with MovableType

Traditionally, MT docking was used almost exclusively for “global sampling,” and our results have shown that ensembles of docked poses yield binding affinities that are often quite accurate. But what happens when docking+MT doesn’t lead to accurate binding affinities?

This could be caused by something as simple as inaccurate binding poses or—perhaps more likely—a lack of good models to capture movement in the protein structure as a ligand binds or dissociates. 

In standard docking regimes, the target structure is kept rigid with little or no movement. But we know that in “real life,” amino acid (and ligand) rotamers flip upon binding, target loop regions shift as the structure “opens” and “closes”, and active site and ligand protons shift from one position to another as tautomers are sampled. Certainly, if binding modes (generated from docking) yield acceptable results for your purposes, then use them: they are quickly generated and often they are quite effective. But if the results are not “up to snuff” (or as predictive as you want), then you should consider more sampling. 

With this year’s additions to the MovableType workflow, you are able to provide bound-target poses as well as bound-ligand poses to qmechanic in order to include more target-level global sampling models (protein:ligand complexes). 

  • This is accomplished with the use of a target PDB file and a ligand mol2 file:
    % qmechanic target.pdb --ligand placed_ligand.mol2 \
    		--mtdock complex_snapshot_1.pdb complex_snapshot_2.pdb ….. \
    		--mtscore ensemble -v 2 --np 2 -p pdb
  • It can also be accomplished through the use of a selection in the PDB file (which in this example will select the LIG three-letter code in the A chain as the ligand):
    % qmechanic complex.pdb --ligand /A/LIG// \
    		--mtdock complex_snapshot_1.pdb complex_snapshot_2.pdb ….. \
    		--mtscore ensemble -v 2 --np 2 -p pdb
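
When the number of snapshots grows, it can help to assemble the command line programmatically. The sketch below is a hypothetical Python helper (not part of the MT distribution) that only rearranges the flags shown in the commands above; the parameter names `verbosity` and `processes` are guesses at what `-v` and `--np` mean, and the helper builds the argument list without running qmechanic.

```python
import shlex


def build_mt_ensemble_command(structure, ligand, complex_snapshots,
                              verbosity=2, processes=2):
    """Build a qmechanic argument list using only flags seen in the examples.

    `verbosity` and `processes` are assumed meanings of -v and --np.
    """
    if not complex_snapshots:
        raise ValueError("provide at least one complex snapshot")
    return [
        "qmechanic", structure,
        "--ligand", ligand,
        "--mtdock", *complex_snapshots,
        "--mtscore", "ensemble",
        "-v", str(verbosity),
        "--np", str(processes),
        "-p", "pdb",
    ]


cmd = build_mt_ensemble_command(
    "complex.pdb", "/A/LIG//",
    ["complex_snapshot_1.pdb", "complex_snapshot_2.pdb"],
)
print(shlex.join(cmd))  # printable form; pass `cmd` to subprocess.run to execute
```

Keeping the arguments as a list (rather than one string) avoids shell-quoting problems if snapshot file names contain spaces.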

     

Stages to Increase Target Sampling

To help you get your molecular modeling projects on track, we’ve mapped out four core setup methods you can progress through to get to the accurate binding free energies you need. Depending on your specific project, you might be able to get the results you need after the first stage (Rigid-Receptor/Flexible-Ligand Docking), or you may need to move through to the second, third, or fourth stage to properly simulate protein:ligand binding. While the computational cost does increase with each stage, it is well worth it to get accurate models that can save you time in the long run.

Stage 1: Rigid-receptor/flexible-ligand docking: no target sampling during docking

Rigid-receptor/flexible-ligand docking is the easiest and most efficient method to simulate protein:ligand landscape minima for treatment in MT. As noted above, MT isn’t a free lunch and some sort of landscape minimum or preferably minima need to be provided to MT for local sampling. In conventional MT calculations, we generate approximately 25 ligand poses for each ligand, minimize each docked ligand conformation within the active site while keeping the target rigid, and calculate the binding affinity for each ligand from this set of poses.

Benefits:

  • It’s fast. Given that there are no changes in the target upon binding, we can skip calculating the unbound target partition function and the bound-target partition function and focus on the intermolecular partition function.
  • When we view the structures, we can simply compare them to X-ray structures to get a feel for which could be filtered or left out of the calculation.

Limitations:

  • Targets don’t really act like this in the test tube or cell. Clearly, this approximation is great when it works, and it is always a good first try because it is so fast. But we should be prepared to do some additional sampling.
  • We have a single “shot on goal” so if the target structure is incorrect (with a poorly placed rotamer, or questionable loop region), there’s simply no way of knowing.

Stage 2: Flexible-Receptor/Flexible-Ligand Docking: [minimal] target sampling during docking

The most obvious modification to our rigid-receptor/flexible-ligand workflow is to use flexible-receptor/flexible-ligand docking (i.e., induced-fit docking). While the conventional protocol often works, sometimes it does not, and there are cases when flexible-receptor/flexible-ligand docking is more predictive.

The setup for induced-fit docking is usually analogous to the rigid-receptor protocol, but instead we simply “switch on” induced-fit docking which places the ligand in the starting “fixed” active site and refines the target:ligand to (hopefully) more accurately capture protein:ligand interactions.

The comparison between rigid-receptor and induced-fit docking is found in our recent publication in the Journal of Chemical Information and Modeling titled “MovableType Software for Fast Free Energy-Based Virtual Screening: Protocol Development, Deployment, Validation, and Assessment.”

Benefits:

  • While it isn’t quite as fast as rigid-receptor/flexible-ligand docking, it is still pretty fast.
  • Unlike rigid-receptor methods, induced-fit will yield minimized target:ligand interfaces that account for the interactions between these species.

Limitations:

  • Any calculated flexible-target/flexible-ligand models are completely dependent upon the starting conformation of the target.
  • These target:ligand poses are not only dependent upon the starting conformation, they likely won’t diverge appreciably from that starting location. Therefore, large target movements won’t be captured with this approach.

Stage 3: Modified target docking: increased target sampling during docking

So how do we address the limitations found in rigid-receptor docking and induced-fit docking? Given the limited radius of convergence of optimization, placement becomes critical and modifications to the target will circumvent this limit and lead to different pose sets.

Furthermore, since MT is able to use multiple target:ligand poses to estimate partition functions (and by extension binding affinities), you can use one or even all of the following techniques to increase the accuracy of your predictions. And because MT is fast, you can do this quickly and easily:

  • Cross docking: Generally, when we choose a target model for novel ligand docking, we choose the target which has a bound ligand which most closely matches our novel compound. One of the benefits of MT is that you can literally use more than one target structure in your score. If your target of interest includes more than one published bound-ligand structure, then dock your novel compounds to each available target and include them all in the MTScoreE calculation.
  • Residue rotamer sampling: Thanks to both disorder and our inability to differentiate between N’s, O’s, and C’s in X-ray structures, we don’t fully resolve rotamer states. Further, since these states are chosen based on how they fit in a crystalline form of the active site (instead of the one in solvent), it makes sense to try different rotamer states when running induced-fit docking.
  • Protonation state sampling: Tautomeric/protomeric forms are (generally) completely unknown from experimental models. In fact, more often than not, protons aren’t even included in the refinement process. Therefore, if you aren’t sure which state is correct, try several tautomer/protomer states. The newest MT packages (available summer 2021) support variable protonation states on both the ligand and in the active site.
  • Apo (unbound-target) structure sampling: Finally, the most recent additions to the MovableType package include the --apo command-line option. Traditionally, the holo target poses generated for rigid-receptor docking, induced-fit docking, and the modifications noted above are used for the unbound poses as well. However, we know this is an approximation because a target often moves upon binding: hinges open or close, rotamers flip, waters move in and out, and so on. With the --apo option, you can provide unbound-target poses to MT. These apo structures could come from apo X-ray, NMR, or cryo-EM structures, ligand-independent optimization, ligand-independent molecular dynamics, or some other mechanism of your choosing.
    % qmechanic complex.pdb --ligand /A/LIG// \
    		--apo target_snapshot_1.pdb target_snapshot_2.pdb …. \
    		--mtdock complex_snapshot_1.pdb complex_snapshot_2.pdb ….. \
    		--mtscore ensemble -v 2 --np 2 -p pdb
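
Because the techniques in this stage can be combined, the number of target models to dock against grows multiplicatively. The following Python sketch simply enumerates the combinations so you can see the cost before launching any docking runs; the function name, file names, and state labels are all hypothetical placeholders, and no docking or MT call is made.

```python
from itertools import product


def enumerate_target_variants(structures, rotamer_states, protonation_states):
    """Return one dict per (structure, rotamer, protonation) combination."""
    return [
        {"structure": s, "rotamer": r, "protonation": p}
        for s, r, p in product(structures, rotamer_states, protonation_states)
    ]


# Hypothetical labels for illustration only.
variants = enumerate_target_variants(
    structures=["holo_A.pdb", "holo_B.pdb"],           # cross-docking targets
    rotamer_states=["rotamer_1", "rotamer_2", "rotamer_3"],
    protonation_states=["HID", "HIE"],                 # e.g., histidine tautomers
)
print(len(variants))  # 2 * 3 * 2 = 12 target models to dock against
```

In practice you would likely prune this list (for example, sampling rotamers only for residues that contact the ligand) rather than dock against every combination.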

 

Stage 4: Molecular dynamics: increased target sampling beyond docking

Up to this point, each of the protocols still relies on docking (either rigid-receptor or induced-fit), and you can provide any number of target:ligand snapshots to DivCon as separate files (e.g., complex_snap_1.pdb, complex_snap_2.pdb, and so on). But there is no reason to stop there.

With the support for induced-fit (target) sampling added to the MovableType method, you can go beyond docking and include molecular dynamics snapshots in your protocol—either in place of or in addition to those produced using the above docking methods.

Now, with molecular dynamics-driven conformational distributions, you can use the techniques developed over the last 30+ years in academic and industrial laboratories to increase protein sampling. Of course dynamics will increase computational cost, but if that computational cost yields better results it may be worth it.
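
An MD trajectory contains far more frames than you would want to score, so a common step is to cluster the frames and keep one representative per cluster. The Python sketch below uses simple greedy “leader” clustering on coordinate RMSD as a stand-in; it is not the clustering used in any MT protocol, it assumes the frames are already superimposed and share the same atom order, and the function names and cutoff are illustrative.

```python
import math


def rmsd(frame_a, frame_b):
    """RMSD between two equally sized lists of (x, y, z); no superposition."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(frame_a, frame_b)
    )
    return math.sqrt(total / len(frame_a))


def pick_representatives(frames, cutoff):
    """Greedy leader clustering: return indices of representative frames.

    A frame starts a new cluster only if it is farther than `cutoff`
    from every representative chosen so far.
    """
    representatives = []
    for index, frame in enumerate(frames):
        if all(rmsd(frame, frames[rep]) > cutoff for rep in representatives):
            representatives.append(index)
    return representatives
```

The selected frames could then be written out as individual PDB files and supplied as snapshots in the same way as docked poses.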

For example, when considering the PFKFB3 set in the MCompChem “fep-benchmark” produced by Schindler et al., the R² improves from 0.003 for rigid-receptor docking plus MT to 0.46 when the same structures are subjected to 250 ns of MD (followed by clustering for snapshot selection)! For the CMET set from the same benchmark, the R² values remain similar with and without MD (0.6 to 0.7), but the slope improves from 0.1 to 0.4. So all in all, MD+MT is often able to “tease out” additional features that regular rigid-docking+MT may miss.
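
The correlation coefficient and the regression slope measure different things: two sets of predictions can rank compounds equally well while one compresses the affinity range far more than the other. The Python sketch below computes both from a least-squares fit of predicted against experimental values. The numbers are synthetic and chosen only to make the point; they are not taken from the benchmark, and regressing predicted on experimental is an assumption about how the slope above was defined.

```python
def linear_fit_stats(experimental, predicted):
    """Return (slope, r_squared) for a least-squares fit of predicted vs experimental."""
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally sized series with at least 2 points")
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Synthetic data (kcal/mol), for illustration only.
experimental = [-10.0, -9.0, -8.0, -7.0]
compressed = [-8.2, -8.1, -8.0, -7.9]   # perfectly ranked, but range squeezed
wider = [-8.6, -8.2, -7.8, -7.4]        # same ranking, larger spread

print(linear_fit_stats(experimental, compressed))
print(linear_fit_stats(experimental, wider))
```

Both synthetic series are perfectly correlated with experiment, yet their slopes differ by a factor of four, which is why a slope improvement can matter even when the correlation barely changes.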

Final Thoughts

Computer-aided drug design is a critical process for generating breakthrough therapeutics; however, it comes with several challenges and complexities, one of which is efficiently and accurately modeling your target:ligand complexes. While MovableType can quickly deliver binding predictions you can trust, it may require some optimization and finesse to get your simulations running properly. Using the four stages outlined here can help transform a “failed” model into an accurate molecular binding engine, and our team of experts is always here to help get you on a path to success!
