Guidelines: X-ray Refinement

Guidelines: X-ray Refinement

Summary

With its greater rigor and methodological robustness captured in the removal of various “empirical restraints,” QM-based refinement can take the intellectual guess work out of conventional refinement. As with any method, there are some usage guidelines that will help you to be more productive. When influencing the progress of your QM-based refinement, it is important to keep the following bullet points in mind:

  • Preparation is important. We have taken as many steps as possible to use MOE/Protonate3D and other tools to help in structure preparation. The concept of junk-in/junk-out is something to watch out for, and for this reason, we recommend the use of advanced tools such as those available within MOE or Maestro.
  • If your available processor count is on the lower end (e.g. 1-, 2-, or 4-cores), you should try to keep the buffer+core+ligand QM region as small as possible without trading too much in the way of accuracy. Use the tables below as a guideline.
  • However, if you have access to many processor-cores, you may want to make your buffer+core+ligand QM region larger in order to better capture the greater, thread-based parallelism available in DivCon6.
  • Finally, QM-based refinement is ultimately an optimization of a starting geometry. In some cases it may be advantageous to provide different starting geometries and run several, independent refinements. Upon completion of the refinement, the final ligand strain is also prepared and this data can be used as a metric in order to inform your choice between the various refinements.

Details

X-ray refinement is an indispensable tool in structure based drug discovery. Just about every pharmaceutical endeavor in the world makes use of x-ray structures in one way or another in order to inform the drug discovery process, and x-ray refinement is the method used to determine these structures. Unfortunately, the energy potential used in the conventional x-ray refinement process is fairly simple and lacks any recognition of a number of physical properties such of electrostatics and quantum effects. In order to make up for these limitations, conventional refinement methods require a significant amount of a priori knowledge of the ligand and target/ligand structure in the form of stereochemical restraints, link restraints, coordination, atom types, and so on (e.g. empirical restraints). Therefore, the effort required to refine a structure is quite high – especially for ligand structure (which is the bread and butter of the field), metal-containing active sites, bound species, and other exotic chemical situations. Quantum mechanics (QM) on the other hand does not suffer from any of these problems. QM-based x-ray refinement is demonstrably more rigorous and more methodologically robust then conventional methods, especially when it comes to ligand and active site structure. However, while the user is able to spend less of his or her “intellectual time” exploring various empirical restraints, the increased rigor and methodological robustness of QM-based refinement does not come without some cost in the form of additional CPU time. The purpose of this page is to discuss what you can do to minimize that cost while maximizing the benefit of using a more physically complete model.

DivCon is a linear scaling QM package for which the CPU time required to characterize a structure scales linearly with the number of atoms. For “single point” calculations, this is almost always the case. As shown in figure 1, the relationship is even fairly (though imperfectly) linear for x-ray refinement. The influences on scaling will be discussed below. Generally the smaller the number of atoms or residues the faster the calculation will perform. In order to treat large target:ligand structures, QuantumBio has provided a mechanism to focus the method on just the part(s) of the structure of interest for more rigorous treatment (e.g. the ligand or cofactor and the surrounding residues). This method is illustrated in the tables below in which the top row of each table lists the sizes of both the core QM region and the buffer region (e.g. denoted as core/buffer in Angstroms). The core region includes all residues around the ligand and the buffer region is made up of residues surrounding the core region. Both regions along with the ligand are treated quantum mechanically, however only the positions of the atoms within the core+ligand region are influenced by the QM calculation. So for these examples, the core/buffer sizes range from the larger end of 5.0A/2.5A to the smallest end of 0.0A/0.0A. For the 0.0A/0.0A cases, no additional atoms other then the ligand are included in the simulation. To provide a point of reference, the total numbers of atoms or residues are listed in the table.

Header: BETA-CRYPTOGEIN-CHOLESTEROL COMPLEX
Resolution: 1.45 Ã…

divcon6/PM6 np=2

1LRI/CLR L1 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 2.83 (1.83*) 2.56 (1.84*) 2.01 (1.49*) 1.32 (1.14*) 0.42 (0.30*) 0.20 (0.21*) 0.16 (0.17*)
Rwork 0.163 0.162 0.162 0.161 0.159 0.157 0.157
Rfree 0.175 0.175 0.175 0.175 0.174 0.173 0.173
QMAtomCount 1285 1139 1126 900 353 195 74
QMResidueCount 93 79 79 55 19 8 1
RMSD (from 5.0/2.5) 0 0.001 0.004 0.005 0.04 0.05 0.05
Strain Energy 7.80 7.69 7.34 6.73 4.60 3.20 2.41

* time for np=4

Header: CRYSTAL STRUCTURE OF HUMAN TR BETA LIGAND-BINDING DOMAIN COMPLEXED WITH A POTENT SUBTYPE-SELECTIVE THYROMIMETIC
Resolution: 2.20 Ã…

divcon6/PM6 np=2

1N46/LIG A 462 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 4.99 (3.38*) 3.68 (2.5*) 3.59 (2.44*) 2.48 (1.78*) 1.03 (1.03*) 0.41 (0.41*) 0.28 (0.29*)
Rwork 0.171 0.171 0.17 0.17 0.17 0.171 0.172
Rfree 0.194 0.197 0.194 0.196 0.197 0.195 0.196
QMAtomCount 1259 937 921 804 351 144 34
QMResidueCount 100 76 74 64 26 9 1
RMSD (from 5.0/2.5) 0 0.02 0.03 0.03 0.07 0.07 0.07
Strain Energy 4.36 3.86 4.24 3.80 5.30 2.99 1.59

* time for np=4

Header: BOVINE TRYPSIN-INHIBITOR COMPLEX
Resolution: 2.20 Ã…

divcon6/PM6 np=2

1K1J/LIG X 999 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 8.85 (5.52*) 3.78 (2.89*) 1.76 (1.63*) 0.97(0.87*) 0.44 (0.39*) 0.36 (0.35*) 0.29 (0.30*)
Rwork 0.149 0.148 0.148 0.146 0.143 0.141 0.142
Rfree 0.179 0.182 0.179 0.182 0.182 0.183 0.183
QMAtomCount 907 799 667 513 224 181 68
QMResidueCount 77 66 52 37 15 11 1
RMSD (from 5.0/2.5) 0 0.02 0.02 0.05 0.07 0.08 0.09
Strain Energy 35.13 36.10 34.09 32.54 33.37 32.86 24.47

* time for np=4

Header: KINETICS AND CRYSTAL STRUCTURE OF THE HERPES SIMPLEX VIRUS TYPE 1 THYMIDINE KINASE INTERACTING WITH (SOUTH)-METHANOCARBA-THYMIDINE
Resolution: 1.95 Ã…

divcon6/PM6 np=2

1OF1/LIG A 400 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 4.68 (3.17*) 2.81 (1.92*) 3.07 (2.20*) 2.11 (1.6*) 0.68 (0.65*) 0.52 (0.51*) 0.44 (0.43*)
Rwork 0.171 0.171 0.17 0.17 0.17 0.171 0.172
Rfree 0.194 0.197 0.194 0.196 0.197 0.195 0.196
QMAtomCount 1259 937 921 804 351 144 34
QMResidueCount  100 76 74 64  26  9  1
RMSD (from 5.0/2.5) 0 0.02 0.03 0.03 0.07 0.07 0.07
Strain Energy 4.36 3.86 4.24 3.80 5.30 2.99 1.59

* time for np=4

Header: Crystal structure of human adipocyte fatty acid binding protein in complex with a carboxylic acid ligand
Resolution: 2.00 Ã…

divcon6/PM6 np=2

1TOW/CRZ X 501 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 2.62 (1.76*) 2.07 (1.39*) 1.55 (1.07*) 0.74 (0.61*) 0.31 (0.29*) 0.13 (0.13*) 0.12 (0.12*)
Rwork 0.171 0.172 0.171 0.169 0.168 0.168 0.167
Rfree 0.2 0.199 0.201 0.204 0.208 0.209 0.209
QMAtomCount 1281 1135 1029 667 368 115 33
QMResidueCount 99 87 79 48 24 8 1
RMSD (from 5.0/2.5) 0 0.008 0.01 0.03 0.06 0.07 0.08
Strain Energy 6.79 7.23 6.89 6.06 6.83 4.18 3.95

* time for np=4

What Can You Do?

In almost all cases, the size of the QM region does indeed impact the speed of the calculation. As the examples illustrate, you can complete the refinement in minutes if you only include the ligand within the refinement (as in the 0.0/0.0 examples). If you make the region too small however, then you may miss potentially crucial protein/ligand interactions. As the size of the QM region shrinks, the structural root mean square deviation (RMSD) of the ligand vs. the 5.0A/2.5A structure generally gets larger suggesting that surrounding protein structure (e.g. active size residues, cofactors, etc) all also impact ligand structure. This observation is further supported by the fact that the ligand strain also decreases as the QM region size shrinks. At first blush this result may seem surprising. However, local ligand strain actually captures not only the ligand but also the influence of the environment on the ligand and the experimental data that Phenix uses in its refinement process. In this way, the conformation of the ligand interacting with the protein will be slightly different then the conformation the ligand may adopt outside of the active site. As the QM region gets smaller, chemically the ligand will “see” less and less of the protein, and therefore it will begin to adopt the local minimum conformation of the free ligand. For this reason, it is suggested that a larger region is used for X-ray refinement.

As depicted in Figure 1, the scaling is imperfect. In order to better understand why, consider the internal steps within the refinement process. When Phenix and DivCon are working in conjunction with one another, at each micro-cycle, Phenix creates a new set of coordinates based on a combination of gradients formed from the structure factors, the empirical restraints for the protein outside of the core+ligand QM region, and the QM gradients from the core+ligand region. These new coordinates are fed back into DivCon and again the buffer+core+ligand QM region is treated, and the gradients are again returned to Phenix for incorporation, and on and on until all of the macro- and micro-cycles are exhausted. Each call to DivCon at each micro-cycle can take one of three possible paths:

  • If the Phenix-provided structure in a micro-cycle is acceptable, then the QM calculation will quickly converge and provide gradients for the core+ligand region. In fact, if the structure for a current micro-cycle is similar to but different then the structure in the previous micro-cycle, then the previous density matrix is automatically used as a guess in order to further speed up the calculation.
  • If the Phenix-provided structure in a micro-cycle is effectively identical to the structure from the previous micro-cycle, which can happen as the refinement begins to converge, the new QM calculation is automatically skipped and the gradients from the previous QM calculation are sent back to Phenix. This cost-cutting feature can significantly decrease the amount of time required for refinement for a well-behaved system.
  • If the Phenix-provided structure in a micro-cycle is truly horrendous for some reason, the QM calculation can either completely fail or it can take a long time to converge. As the refinement proceeds, the Phenix-provided structures tend to get better so this happens less frequently. However, if there is a protonation or other significant structural problem, it can certainly happen more often which will lead to greater expense in the calculation.

All three of these possibilities can happen in a particular refinement and are generally beyond the control of the user. If you want to speed up the calculation, you could allow the conventional refinement to progress through the first macro-cycle; however, you should also expect to then have the same problems with the empirical restraints of the ligand:target site as you would have had during a full conventional refinement. Likewise, you can check and double-check the structure prior to refinement to verify that the tools you used to protonate and build the structure did their jobs effectively.

Another way to speed up the calculation is to increase the number of processors available to the calculation. In the tables above, both the quad- and the dual-processor calculations are presented for all of the DivCon6 results. DivCon6 is parallelized for multiple threads, and the softwre will automatically divide the calculation on a “residue-by-residue basis” according to the np=X request presented by the user. As the QM region grows, the scaling of the calculation will also increase because as the number of residues increases, the opportunity for parallelizing the system also increases. Again, parallel-scaling for single point DivCon calculations is linear up to 8- or even 16-cores where 8-cores will get you to your results almost 8x faster. For refinement, due to the same complexity noted above, we do not see perfect 2x scaling when moving from 2- to 4-cores. As depicted in Figure 2, we could theoretically see a 2x scaling if we include a much larger percentage of the system.

One more thing: DivCon5 vs. DivCon6

As depicted in the tables below, in the first two examples, dual-processor calculations involving both DivCon5 and DivCon6 are also compared with one another. DivCon6 is several times faster then DivCon5 – especially for larger QM regions. In addition to CPU time, DivCon6 is also built from the ground-up for PM6 (and beyond), which is a newer, more powerful semiempirical QM Hamiltonian that supports 70 elements. DivCon5 on the other hand is built for AM1 and PM3. Going forward, only DivCon6 will be available for the refinement effort.

 

Header: BETA-CRYPTOGEIN-CHOLESTEROL COMPLEX
Resolution: 1.45 Ã…

divcon6/PM6 np=2

1LRI/CLR L1 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 2.83 (1.83*) 2.56 (1.84*) 2.01 (1.49*) 1.32 (1.14*) 0.42 (0.30*) 0.20 (0.21*) 0.16 (0.17*)
Rwork 0.163 0.162 0.162 0.161 0.159 0.157 0.157
Rfree 0.175 0.175 0.175 0.175 0.174 0.173 0.173
QMAtomCount 1285 1139 1126 900 353 195 74
QMResidueCount 93 79 79 55 19 8 1
RMSD (from 5.0/2.5) 0 0.001 0.004 0.005 0.04 0.05 0.05
Strain Energy 7.80 7.69 7.34 6.73 4.60 3.20 2.41

divcon5/AM1 np=2

1LRI/CLR L1 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 16.35 12.86 11.74 5.87 1.69 0.20 0.19
Rwork 0.158 0.158 0.157 0.157 0.154 0.153 0.153
Rfree 0.173 0.175 0.174 0.175 0.174 0.173 0.174
QMAtomCount 1264 1099 1105 547 228 77 74
QMResidueCount 88 79 75 32 13 2 1
RMSD (from 5.0/2.5) 0 0.008 0.004 0.005 0.02 0.04 0.05
Strain Energy 8.14 7.05 8.45 7.17 5.54 4.92 4.09

* time for np=4

Header: CRYSTAL STRUCTURE OF HUMAN TR BETA LIGAND-BINDING DOMAIN COMPLEXED WITH A POTENT SUBTYPE-SELECTIVE THYROMIMETIC
Resolution: 2.20 Ã…

divcon6/PM6 np=2

1N46/LIG A 462 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 4.99 (3.38*) 3.68 (2.5*) 3.59 (2.44*) 2.48 (1.78*) 1.03 (1.03*) 0.41 (0.41*) 0.28 (0.29*)
Rwork 0.171 0.171 0.17 0.17 0.17 0.171 0.172
Rfree 0.194 0.197 0.194 0.196 0.197 0.195 0.196
QMAtomCount 1259 937 921 804 351 144 34
QMResidueCount 100 76 74 64 26 9 1
RMSD (from 5.0/2.5) 0 0.02 0.03 0.03 0.07 0.07 0.07
Strain Energy 4.36 3.86 4.24 3.80 5.30 2.99 1.59

divcon5/AM1 np=2

1N46/LIG A 462 5.0/2.5 4.0/2.5 3.0/2.5 2.5/2.5 2.0/2.5 0.0/2.5 0.0/0.0
Time,h 34.25 28.07 21.34 10.86 1.97 0.60 0.39
Rwork 0.181 0.1811 0.179 0.182 0.181 0.177 0.174
Rfree 0.232 0.233 0.237 0.237 0.239 0.238 0.235
QMAtomCount 1390 1241 1111 729 359 134 48
QMResidueCount 90 78 70 45 21 6 1
RMSD (from 5.0/2.5) 0 0.02 0.018 0.02 0.05 0.06 0.06
Strain Energy 5.75 6.28 4.48 6.22 6.86 5.48 4.32

* time for np=4

Phenix/DivCon Manual Contents