The impact of QM/MM X-ray and Cryo-EM refinement in structure-based drug discovery

Workshop Opportunity:
Are you interested in learning more about the technologies presented in this article? Will you be attending the American Crystallographic Association (ACA) conference this summer (2023) in Baltimore, Maryland? Sign up for “Workshop #4: Best Practices of Quantum-Mechanics (QM) driven Macromolecular Refinement” (when you register for the conference) and plan to attend an all-day, hands-on event focused on the application, use, and real-world impact of QM and QM/MM-based X-ray and Cryo-EM methods on drug discovery. More details to follow. Please feel free to Contact Us if you would like more information.

Thanks in large part to their speed and lower cost, virtual screening, docking, and scoring have become integral to the drug discovery process and critical tools in the structure-based drug design (SBDD) toolbox. However, these methods are critically dependent upon the availability of accurate protein:ligand models from X-ray crystallography and cryogenic electron microscopy (cryo-EM).

As a result, X-ray crystallography, and to a certain extent cryo-EM, have become relatively routine in SBDD, thanks to advances in data collection, processing, structure solution, and refinement automation. Yet these traditional SBDD tools have limitations that leave scientists asking, “How can we obtain more accurate experimental protein:ligand models in order to enhance our SBDD efforts?”

It turns out, there are many opportunities for improvement: non-crystallographers are often surprised to learn that conventional X-ray/cryo-EM methods have very little support for the non-bonded interactions – such as hydrogen bonds, electrostatics, pi-pi, and van der Waals – along with metal coordination, ions, covalently bound ligands, and other exotic chemical situations which typify drug discovery efforts.

Quantum mechanics (QM), molecular mechanics (MM), and mixed-QM/MM-based methods provide the tools required to better characterize and treat these sorts of situations, but more often than not, in conventional computer-aided drug design, these methods are used only after the structural biology effort is complete and the structure has been solved and written to a PDB file. QuantumBio software merges QM, MM, and QM/MM methods with experimental density – including through plugins to both PHENIX and BUSTER – in order to provide the “best of both worlds.” With these tools, we address all of these situations, and we also use these methods (coupled with statistical analysis) to examine the experimental data and determine protomer/tautomer states, binding mode rotamer states, and chirality.

In recent work, we have demonstrated that these deficiencies in conventional methods directly impact our ability to use these structures in SBDD. The results are discussed in detail in the open-access paper in the Journal of Computer-Aided Molecular Design.

The Structure-Based Drug Design Toolbox

Drug R&D relies on compound synthesis and rigorous screening. However, it is expensive and impractical for a company to make thousands (or millions) of compounds of unknown efficacy, as each compound costs time and resources. Therefore, pharmaceutical companies rely on crystallographers and computational chemists to identify promising drug designs prior to synthesis. To tackle this task, these organizations use a structural biology toolbox to refine crystal structures, which are then used in computational model building via computer-aided drug design (CADD) and to inform lead optimization (successful campaigns in turn increase the likelihood of synthesizing a compound that will be useful to the project).

The goal is to accurately identify those compounds that bind to the therapeutic target so they are prioritized for synthesis and bioassays. However, the current SBDD toolbox frequently leaves crystallographers and CADD users guessing structural properties during their structural refinement process. This translates to frustrated crystallographers and computational chemists, and millions of dollars and years of effort wasted both on compounds destined to fail and on those missed or incorrectly excluded.

The SBDD Toolbox Weak Link

In conventional crystallography, instead of relying on modern MM or QM functionals, practitioners use stereochemical restraints (target bond lengths, bond angles, and so on) and Crystallographic Information File (CIF) libraries to represent compounds of interest in the active site. CIFs are biased by what we believe the compounds look like, and the parameters within these files are generally based on the unbound ligand conformation, even though successful refinement depends on accurate bound (in situ) atom types, torsions, bond angles, and bond lengths.
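The restraint idea described above can be sketched in a few lines of Python. This is a minimal illustration of a generic harmonic restraint penalty (a sum of squared deviations from ideal values, weighted by an expected spread), not the target function of any particular refinement package. The function name, the torsion values, and the sigma are all hypothetical and are not taken from a real CIF.

```python
def restraint_penalty(restraints):
    """Sum of squared, sigma-weighted deviations from ideal values.

    Each restraint is a tuple (observed, ideal, sigma). This is a generic
    harmonic form used for illustration only; torsion periodicity and
    package-specific weighting are deliberately ignored.
    """
    total = 0.0
    for observed, ideal, sigma in restraints:
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        total += ((observed - ideal) / sigma) ** 2
    return total


# Hypothetical torsion restraints (degrees) that hold a ring system planar:
# ideal value 0 with sigma 5. The observed values are invented.
nearly_planar_ring = [(0.5, 0.0, 5.0), (-1.0, 0.0, 5.0)]
bent_ring = [(18.0, 0.0, 5.0), (-20.0, 0.0, 5.0)]

print("planar penalty:", restraint_penalty(nearly_planar_ring))
print("bent penalty:  ", restraint_penalty(bent_ring))
```

Under these assumed numbers the bent geometry carries a much larger penalty than the planar one, which illustrates why a restraint dictionary built around a planar ideal pulls the model toward planarity even when the density suggests otherwise.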

For example, consider the isoalloxazine ring for flavin adenine dinucleotide (FAD) from PDBid:1SIQ refined at 2.1 Å represented below. In this instance, the original structure (shown in yellow) adopts a completely planar conformation while the QM/MM refined structure (shown in green) is bent. Given the positive difference density shown for the original structure, we know that the bent conformation – which does not exhibit this difference density – is the likely position. However, since the CIF relies on fixed bond angles/torsions (and planarity), obtaining a bent conformation from conventional refinement would be difficult (especially if one wasn’t sure a priori if the bent conformation was correct).

Further, conventional refinement misses key protein:ligand intermolecular and intramolecular terms—electrostatics, polarization, charge transfer, and even van der Waals and hydrogen bonds—resulting in significant atomic coordinate uncertainties and structural errors.

Because of these missing terms, when computational chemists and medicinal chemists gain access to these models, they must make significant alterations to the protein:ligand structure (e.g. add protons, optimize the structure, dynamics, and so on) with little regard to whether those modifications are reflected in the actual experimental data. These models then become the structures that are used to generate binding affinity predictions or scores to explore critical protein:ligand interactions during lead optimization.

Since conventional refinement does not include support for many of the critical chemical interactions which drive SBDD, there are several types of errors that can arise:

  • False positives: these are interactions that the model shows to exist but which are weak or non-existent when treated with a higher order method (e.g. perhaps a situation in which a ligand binding mode is chosen by the practitioner to “fit” an expected H-bond but which falls apart when treated with QM, MM, or QM/MM).
  • False negatives: conversely, these are interactions that could be captured by the ligand in question if only the ligand were placed correctly in the active site. Often false negatives and false positives conspire to lead to binding modes which fit the experimental density but which don’t fit the chemical composition of the active site.
  • Incorrect geometry: incorrect bond lengths, angles, and torsions lead to higher-strain conformations that are unrealistic in practice.

False negatives and, in particular, false positives can be extremely expensive: if you think your ligand captures or addresses an interaction that it actually doesn’t, you might believe that your lead optimization effort for the associated residue is complete. With structures obtained through conventional refinement, that misplaced confidence means you could spend a substantial amount of time optimizing the wrong part of the ligand. When you have a true understanding of the interactions, you can make more informed decisions and ultimately arrive at a more promising drug candidate.

For example, consider the macrocycle DOL in PDBid:1MRL, refined at 2.8 Å. The yellow structure is the published model and the green structure is the one refined using QM/MM.

When we decompose the interactions within this model using MOE 2022, we observe at least two false positives in the published structure (the ASP_B38:DOL_O and TYR_A54:DOL_O hydrogen bonds), which the macrocycle doesn’t actually capture, and one false negative (the ASN_A92:DOL_O hydrogen bond), which the ligand does capture.

Life science companies using SBDD, CADD, and structural biology (X-ray crystallography and Cryo-EM) need to eliminate the guesswork in structural refinement to generate more accurate crystal structures and identify the most promising drug candidates.

Build a comp chem ↔ structural bio feedback loop

Ideally, you want a perfect correlation between your predicted and experimental binding affinities. We’ve created a new way for crystallographers and CADD users to work together to build better models and make better predictions. When we replace the highly approximate stereochemical gradients with a QM/MM (quantum mechanics/molecular mechanics) functional – creating a feedback loop between computational chemistry and structural biology – the two fields become closely aligned, in that the structures the crystallographer generates fit both the density and the chemistry.

Here’s how it works:

  1. The CIF, along with any other stereochemical restraints, is replaced with the QM/MM functional using your platform of choice (e.g., PHENIX or BUSTER) coupled with QuantumBio’s DivCon plugin – nothing else in the refinement toolbox changes.
  2. You perform X-ray/Cryo-EM refinement per your usual workflow, but now you’ll see clear density maps where structural issues exist within your target so you can explore and correct them prior to sending structures to computational chemists—or the computational chemists can do this refinement themselves.
  3. Following computational modeling, you can easily identify any affinity outliers, which will indicate where your target and ligand need further refinement, so you can further improve your structures and generate more accurate predictions.
  4. MOE Users: optionally, all of these features are available with an integrated X-ray crystallography plugin available for the Molecular Operating Environment (MOE) from Chemical Computing Group, Inc. within the DivCon package.
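Step 3 above, identifying affinity outliers, can be sketched as a simple residual check. This is only an illustration under stated assumptions: it fits an ordinary least-squares line between predicted scores and experimental affinities and flags ligands whose residual exceeds a chosen multiple of the residual spread. The function name, the threshold `k`, and the idea of using a linear fit are assumptions for illustration and do not describe QuantumBio's workflow or any MOE feature.

```python
from statistics import mean, pstdev


def flag_affinity_outliers(names, scores, affinities, k=2.0):
    """Return the names whose experimental affinity lies far from a
    least-squares line fitted through (score, affinity) pairs.

    A ligand is flagged when |residual| > k * (population std of residuals).
    """
    if not (len(names) == len(scores) == len(affinities)) or len(names) < 3:
        raise ValueError("need at least three equally sized inputs")

    mean_x, mean_y = mean(scores), mean(affinities)
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("scores must not all be identical")

    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(scores, affinities)) / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (slope * x + intercept)
                 for x, y in zip(scores, affinities)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point sits exactly on the line
    return [n for n, r in zip(names, residuals) if abs(r) > k * spread]
```

Ligands returned by a check like this are candidates for a second look at the model (binding mode, protonation, local geometry) rather than proof that the structure is wrong.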

Since QM/MM refinement addresses electrostatics, polarization, charge transfer, H-bonding, and more – and is executed on bound protein:ligand conformations – this process takes slightly longer than conventional refinement. However, it saves the many hours that would otherwise be spent on suboptimal compounds prioritized on the basis of inaccurate lead optimization efforts.

Through several validation sets, we’ve demonstrated that QM/MM-refined crystal structures generate significantly more accurate binding affinity predictions. For example, comparing QM/MM with conventional X-ray refinement on the CSAR validation set, we have shown:

  • The MolProbity score and clashscore (indicators of the overall quality of protein target structures) for QM/MM refined structures are, on average, 2× lower (better) than after conventional refinement.
  • These QM/MM refined structures exhibit on average a ~3-fold improvement in ligand strain versus conventionally refined structures.
  • QM/MM refinement leads to an appropriate shift and rotation of ligands (absent in conventional refinement), enabling us to properly correct the model.
  • Protons matter, and with XModeScore we’ve shown that QM/MM refinement lets us “see” them and explore these critical interactions, which are completely disregarded in conventional refinement methods.
  • Finally, QM/MM X-ray crystallographic refinement leads to significantly improved correlations between experimental binding affinity and computationally predicted GBVI/WSA scores compared to conventional methods.

QM/MM X-ray models are not only better representations of protein:ligand structures, but they are more chemically descriptive of the key interactions drug R&D teams need to understand for lead optimization and to develop efficacious therapeutics.

To illustrate, we’ve demonstrated in a recent publication that for the CDK2 target of the CSAR set – which includes 15 protein:ligand X-ray structures – the correlation between the predicted binding affinity (according to the GBVI/WSA score function in MOE) and the experimental binding affinity improves markedly, from R² = 0.25 in the conventionally refined structures to R² = 0.60 based on QM/MM re-refinement alone.

Then, when additional analyses (such as XModeScore – discussed below) are brought to bear to improve our understanding of protonation, the R² increases further to 0.73. These higher correlations clearly show that our better structures lead to better predictions (and conversely, if the predictions are poor, that likely means that the experimental model is suspect).
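For readers who want to reproduce this kind of comparison on their own data, the following is a minimal sketch of the correlation calculation. It assumes that the correlation values quoted above are squared Pearson correlation coefficients between predicted scores and experimental affinities; the function name is hypothetical, and no data from the publication is included.

```python
from statistics import mean


def r_squared(predicted, experimental):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equally sized sequences of length >= 2")

    mean_p, mean_e = mean(predicted), mean(experimental)
    s_pe = sum((p - mean_p) * (e - mean_e)
               for p, e in zip(predicted, experimental))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_ee = sum((e - mean_e) ** 2 for e in experimental)
    if s_pp == 0 or s_ee == 0:
        raise ValueError("correlation is undefined for constant input")

    return (s_pe ** 2) / (s_pp * s_ee)
```

Running the same function on scores computed from conventionally refined and from re-refined models of the same complexes gives a like-for-like comparison, since only the input structures differ.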

Tautomers, protomers, rotamers, and isomers

We now routinely use linear scaling quantum mechanics and mixed-QM/MM in X-ray crystallographic refinement to replace troublesome stereochemical restraint gradients used in conventional methods with much more accurate QM/MM gradients. The method leads to structures that exhibit lower ligand strain, improved understanding of key protein:ligand interactions, and better binding affinity predictions.

You can take refinement a step further with XModeScore, which facilitates proton and binding mode scoring. With XModeScore, different protomer, tautomer, isomer, and/or rotamer states or binding modes are generated and analyzed using a combination of QM and QM/MM-based refinement and X-ray or Cryo-EM density to determine which state(s) is(are) found within the active site. This method can be applied to both the ligand and the active site or any residue(s) of your choice. Therefore, rigorous refinement, functional, and statistical analyses lead to actionable intelligence: understanding of structure leads to better scoring results and ultimately better models and better predictions. In the data presented above, for example, we see that XModeScore improves the CDK2 X-ray models and our associated predictions using the industry standard GBVI/WSA score function.
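The general idea of ranking candidate states can be sketched as follows. This is not the XModeScore formula, which is not given in this article; it is a generic illustration that assumes each candidate state has been refined and assigned two lower-is-better numbers, a density misfit and a ligand strain, which are standardized across the candidates and combined. All names and values are hypothetical.

```python
from statistics import mean, pstdev


def rank_candidate_states(candidates):
    """Rank candidate states, best first.

    `candidates` maps a state name to (density_misfit, strain), where both
    metrics are assumed to be lower-is-better. Each metric is converted to
    a z-score across the candidates, and the negated sum is the score.
    """
    names = list(candidates)
    if len(names) < 2:
        raise ValueError("need at least two candidate states to compare")

    def z_scores(values):
        mu, sigma = mean(values), pstdev(values)
        return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

    z_misfit = z_scores([candidates[n][0] for n in names])
    z_strain = z_scores([candidates[n][1] for n in names])

    # Higher combined score means a better fit to density and lower strain.
    combined = {n: -(zm + zs) for n, zm, zs in zip(names, z_misfit, z_strain)}
    return sorted(names, key=lambda n: combined[n], reverse=True)


# Invented values for three hypothetical tautomers of one ligand.
example = {
    "tautomer_A": (1.0, 5.0),
    "tautomer_B": (3.0, 9.0),
    "tautomer_C": (2.0, 7.0),
}
print(rank_candidate_states(example))
```

Standardizing each metric before combining keeps one quantity from dominating simply because of its units, which is the main design choice in this sketch.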

Ready to level up your SBDD toolbox and generate more accurate binding affinity predictions? Contact us for a demo or attend our upcoming ACA2023 Workshop.

QuantumBio offers a powerful suite of innovative software products purpose-built for life sciences on cutting-edge science that utilizes the highest levels of theory available to achieve peak accuracy, performance, and versatility. Through our science-first, customer-centric approach, we make precision quantum mechanical approaches more user-friendly, cost-effective and easily accessible. We help pharmaceutical, biotech, and academic scientists improve their understanding of biochemical structure and function while enhancing the drug discovery process.