Phenix/DivCon Usage & Tutorial


The qbphenix command line arguments cover the standard functions and features available within the toolbox. If you are looking for XModeScore tools and options, please see the XModeScore options. Otherwise, the documentation below provides a starting point for X-ray crystallography using QM and QM/MM based methods.

(If you would like to be notified when these other functions are documented, email support@quantumbioinc.com to register your interest).

Method Summary

The Phenix package is used as a command-line based tool using the phenix.refine executable as the primary driver. The phenix-online.org Documentation site provides a great starting point for the use of PHENIX, and direct links for the tools of interest are provided below. Use of Phenix/DivCon involves the following key steps and a tutorial is also provided below to illustrate the use of these steps in an actual refinement. The method is itself discussed at length in the following paper (available as a free download from the associated link):

  • Borbulevych, O. Y., Plumley, J. A., Martin, R. I., Merz, K. M., Jr, & Westerhoff, L. M. (2014). Accurate macromolecular crystallographic refinement: incorporation of the linear scaling, semiempirical quantum-mechanics program DivCon into the PHENIX refinement package. Acta Crystallographica Section D, Biological Crystallography, 70(5), 1233–1247. http://doi.org/10.1107/S1399004714002260

Determination of the experimental data: A discussion of this subject is beyond the scope of the manual, and the interested reader is referred both to the PHENIX Documentation site and the myriad of field specific references on the subject. Going forward, it is assumed that you have an MTZ or other PHENIX-compatible structure factor format available.

Initial model building and ligand placement: If your desire is to re-refine a PDB structure, then much of this initial placement has been handled for you; however, the refinement is ultimately a gradient-based method very much akin to structure minimization. With that in mind, your starting geometry will drive your refinement, and huge changes to the structure from its starting geometry aren’t expected. Since the packaged PHENIX/DivCon process does not include docking, dynamics, or other placement algorithms as part of the refinement process, if you have alternate conformations or “docked” ligand poses you wish to explore, you should plan to run separate QM-based refinements for each of these poses. PHENIX provides a number of tools to help in this process as detailed in the Documentation. Finally, in addition to PHENIX tools, you may also use AFIT from OpenEye for initial placement of the ligand.
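As a rough illustration of running one refinement per pose, the sketch below prints (but does not execute) one qbphenix command per candidate structure, each in its own working directory. It reuses the option names that appear in the tutorials of this document; the file names pose_1.pdb, pose_2.pdb, and data-sf.cif, the run_ directory prefix, and the helper name plan_pose_runs are hypothetical, and the poses are assumed to sit in the current directory.

```shell
#!/usr/bin/env bash
# Dry-run sketch: plan one independent qbphenix refinement per candidate pose.
# All file names and the helper name are hypothetical placeholders.

plan_pose_runs() {
    # $1 = structure factor file, $2 = ligand selection, remaining args = pose PDB files
    local data="$1" selection="$2"; shift 2
    local pose name
    for pose in "$@"; do
        name="$(basename "$pose" .pdb)"
        # Each pose gets its own directory so the outputs of different runs never collide.
        printf 'mkdir -p %q && (cd %q && "$QBHOME/bin/qbphenix" --dataFile %q --pdbFile %q --selection %q --protonation ReadySet --qmMethod pm6 --region-radius 3.0 --buffer-radius 2.5 --Nproc 4)\n' \
            "run_$name" "run_$name" "../$data" "../$pose" "$selection"
    done
}

# Print the planned commands for review; nothing is executed here.
plan_pose_runs data-sf.cif "chain A resname NFM resid 401" pose_1.pdb pose_2.pdb
```

Comparing the resulting statistics across the run directories is left to the user; the radii and processor count shown are simply the values used in the tutorials.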

Structure Protonation: Once you have initial placement, protons will need to be added to the structure. Since quantum mechanics-based refinement is an all-atom method, reasonable protonation is required prior to commencing the simulation. Phenix provides all of the tools necessary to prepare the structure with the ReadySet! tool which is exhibited in the tutorial below in order to add the protons to the entire structure – including any waters. The ReadySet! package adds protons and makes sure that all atom names of added atoms correspond to the database PHENIX uses to set stereochemical restraints. Since the phenix.refine engine is being used to drive the refinement, we are limited to the error traps provided by PHENIX (such as atom names, and so on) and it is highly recommended the Documentation is followed for ReadySet!. Tools are also provided in order to run Protonate3D as implemented in MOE/batch from the Chemical Computing Group, Inc.

CIF File Preparation: Next, Phenix’s eLBOW program is used to create the CIF for the ligand. Since the ligand stereochemical parameters are replaced with QM gradients during the refinement at each microcycle, the quality of CIF parameters is less important when using QuantumBio’s Phenix plugin; however, Phenix itself does require that the CIF be provided as part of its error checking. Further, in the event that you wish to use the macro_cycle_to_skip= Phenix/DivCon command line option, the quality of the CIF may become more important since the CIF stereochemical restraints will be used early in the refinement. In any case, eLBOW does most of the work for you.

Execution of phenix.refine: The DivCon plug-in is designed to treat a QM region of the structure, and generally, this region is made up of the ligand and the surrounding receptor structure (e.g. the active site). It is in this core QM region that the QM gradients are substituted for the conventional stereochemical restraint gradients during the QM refinement. In addition to the QM region, you may optionally provide an additional buffer region in order to “insulate” the QM region from the conventional stereochemical restraint extremities of the protein/ligand complex. All of the atoms within this buffer region are also treated quantum mechanically; however, any QM gradients generated for this buffer region are thrown out at each microcycle. The QM region and buffer region dimensions are communicated to the DivCon package through the qblib_region_radius= and qblib_buffer_radius= command line options, respectively. The center of the QM region – most often the ligand – is defined using a basic language based on the PHENIX selection language, entered in the qblib_region_selection= command line option. Multiple selection regions – such as ligand copies, cofactors, etc. – can be applied in any refinement using the same selection language.
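The options described above can be collected into a single command line. The following is a minimal sketch that only assembles and prints the command so it can be reviewed; the file names and selection string are the ones used in Tutorial #1, and the helper name build_refine_cmd and the REFINE_CMD array are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble a phenix.refine command line with the DivCon (qblib) options.
# build_refine_cmd and REFINE_CMD are illustrative names, not part of Phenix or DivCon.

build_refine_cmd() {
    # $1 = model PDB, $2 = structure factors, $3 = ligand CIF, $4 = QM region selection
    # $5 = region radius, $6 = buffer radius, $7 = number of cores (defaults from Tutorial #1)
    local model="$1" data="$2" cif="$3" selection="$4"
    local region_radius="${5:-3.0}" buffer_radius="${6:-3.0}" nproc="${7:-4}"
    REFINE_CMD=(
        phenix.refine "$model" "$data" "$cif"
        qblib=True
        qblib_method=pm6
        "qblib_region_selection=$selection"
        "qblib_region_radius=$region_radius"
        "qblib_buffer_radius=$buffer_radius"
        "qblib_np=$nproc"
    )
}

build_refine_cmd 1IG3.updated.pdb 1ig3-sf.cif 1IG3.ligands.cif \
    "chain A and resname VIB and resid 502"

# Print the command with shell quoting so it can be checked before running.
printf '%q ' "${REFINE_CMD[@]}"; echo
# To execute for real (requires Phenix and DivCon in the environment):
# "${REFINE_CMD[@]}"
```

Keeping the command in an array preserves the spaces inside the selection string as one argument, which is easy to get wrong when the command is built as a plain string.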

Once the above steps have been performed, the refinement should progress without incident. If the qblib_buffer_radius= and qblib_region_radius= sizes are set appropriately, the entire QM-treated region should contain several hundred to roughly a thousand atoms. The refinement itself will run through any number of macro- and microcycles as determined by Phenix. All of these options can be manipulated as per the Phenix Documentation, and since phenix.refine is being used as the driver of the refinement, DivCon simply does as it is instructed. QuantumBio-developed Phenix bindings manage all of the communication between the two tool sets, and no further user-instantiated communication is required. Depending upon the speed of your processor, the size of the QM region, and the number of macrocycles requested, this simulation will probably take a few hours. Note: it is highly recommended that you run on multiple processor cores (2, 4, or more) using the qblib_np= command line option.

Execution of qbphenix workflow script: In order to improve the workflow of protonation, CIF generation, and subsequent QM refinement, the various steps noted above have been joined in a script found at $QBHOME/bin/qbphenix in the install package. Currently, the script uses standard Phenix tools for all of these functions. If other tools are preferred for certain steps – such as using OpenEye’s Babel for ligand protonation – the interested user is able to edit the Perl code found in the $QBHOME/perl/qbphenix.pl file. Tools are also provided in order to run Protonate3D as implemented in MOE/batch from the Chemical Computing Group, Inc.

Troubleshooting: In the event that the refinement does experience difficulty such as a problem with convergence or the like, this problem will be communicated to you through the phenix.refine log file. The plug-in has been designed to be resilient to occasional convergence problems, however very poor structures will cause larger, systemic problems in convergence and the refinement will then end with an error. The macro_cycle_to_skip=N command line option is provided in order to skip the initial N macrocycles. This option will instruct Phenix to use the conventional stereochemical restraints to perform an initial cleaning of the structure prior to QM-based refinement. This option should also significantly speed up the refinement since significant errors will be addressed using the conventional restraints. The macro_cycle_to_skip=N command line option is turned off by default and entering a value of N=1 may be prudent – especially for work early in the refinement cycle.
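One way to apply this advice is to retry a failed refinement with macro_cycle_to_skip=1 appended. The sketch below is only an illustration: the helper name refine_with_fallback is hypothetical, and it assumes that the refinement command exits with a non-zero status when it ends with an error, which should be confirmed against the phenix.refine log for your installation. A stub function stands in for the real command so the logic can be tried without Phenix.

```shell
#!/usr/bin/env bash
# Sketch: rerun a refinement with macro_cycle_to_skip=1 if the first attempt fails.
# Assumption: the refinement command returns a non-zero exit status on error.

refine_with_fallback() {
    # "$@" is the complete refinement command, e.g. phenix.refine model.pdb data.mtz ... qblib=True
    if "$@"; then
        return 0
    fi
    echo "Refinement failed; retrying with macro_cycle_to_skip=1" >&2
    "$@" macro_cycle_to_skip=1
}

# Stub standing in for phenix.refine: it only "succeeds" when the skip option is present.
fake_refine() {
    case " $* " in
        *" macro_cycle_to_skip=1 "*) echo "refined with: $*" ;;
        *) return 1 ;;
    esac
}

refine_with_fallback fake_refine model.pdb data.mtz qblib=True
```

In a real run it may be simpler to pass macro_cycle_to_skip=1 from the start, as suggested above, and to keep each attempt in its own directory so that output files from a failed run are not mixed with the retry.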


Detailed Tutorials

Tutorial #1 (Running Phenix/DivCon without the qbphenix wrapper): 1IG3

Beginning with the data available on the RCSB Protein Data Bank PDBid:1IG3 page, the following calculations were performed.

  1. Source the proper Phenix and DivCon versions (assuming Bash shell):
    source /path/to/phenix-1.9-xxxx/phenix_env.sh
    source /path/to/DivConDiscoverySuite/etc/qbenv.sh
  2. In a clean, empty directory, place the 1IG3.pdb (without hydrogens as downloaded from PDB) and 1ig3-sf.cif files for treatment in the next step.
  3. Execute phenix.ready_set to prepare and protonate structure:
    phenix.ready_set 1IG3.pdb add_h_to_water=true
     Key files generated by phenix.ready_set include:
        1IG3.updated.pdb 1IG3.ligands.cif
  4. Execute phenix.refine with the following commandline options:
    phenix.refine 1IG3.updated.pdb 1ig3-sf.cif 1IG3.ligands.cif qblib=True \
        qblib_method=pm6 qblib_region_selection="chain A and resname VIB and resid 502" \
        qblib_region_radius=3.0 qblib_buffer_radius=3.0 qblib_np=4
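Before starting the steps above, it can help to confirm that the environment scripts were sourced and that the input files are in place. The sketch below is one hedged way to do this; the helper name missing_items and its "cmd:" / "file:" argument convention are hypothetical, and the executable and file names are those used in this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: report which required tools or input files are missing before Tutorial #1.

missing_items() {
    # Args: any mix of "cmd:NAME" (must be found on PATH) and "file:PATH" (must exist).
    # Prints one line per missing item; prints nothing when everything is present.
    local item
    for item in "$@"; do
        case "$item" in
            cmd:*)  command -v "${item#cmd:}" >/dev/null 2>&1 || echo "$item" ;;
            file:*) [ -f "${item#file:}" ] || echo "$item" ;;
        esac
    done
}

missing="$(missing_items cmd:phenix.ready_set cmd:phenix.refine \
                         file:1IG3.pdb file:1ig3-sf.cif)"
if [ -n "$missing" ]; then
    echo "Not ready; missing:"
    echo "$missing"
else
    echo "Environment and inputs look ready."
fi
```

The same check can be repeated after phenix.ready_set with file:1IG3.updated.pdb and file:1IG3.ligands.cif to confirm that the key output files were written.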


Tutorial #2a (using qbphenix execution script and MOE on single ligand): 3IX1

Beginning with the data available on the RCSB Protein Data Bank PDBid:3IX1 page, the following calculations were performed.

  1. Source the proper Phenix and DivCon versions (assuming Bash shell):
    source /path/to/phenix-1.9-xxxx/phenix_env.sh
    source /path/to/DivConDiscoverySuite/etc/qbenv.sh
  2. In a clean, empty directory, place the 3IX1.pdb (without hydrogens as downloaded from PDB) and 3ix1-sf.cif files for treatment in the next step. Note that qbphenix can download the PDB and structure factor files automatically using the --pdbID command line option.
  3. Execute the QM region refinement on 4 processors in PHENIX with the following command (phenix.elbow can take several minutes to run):
    $QBHOME/bin/qbphenix --dataFile 3ix1-sf.cif --pdbFile 3IX1.pdb --selection "chain A resname NFM resid 401" \
        --phenixOptions "main.number_of_macro_cycles=1" --protonation MOE --qmMethod pm6 --region-radius 3.0 \
        --buffer-radius 2.5 --Nproc 4


Tutorial #2b (using qbphenix execution script and phenix.ready_set on single ligand): 3IX1

Beginning with the data available on the RCSB Protein Data Bank PDBid:3IX1 page, the following calculations were performed.

  1. Source the proper Phenix and DivCon versions (assuming Bash shell):
    source /path/to/phenix-1.9-xxxx/phenix_env.sh
    source /path/to/DivConDiscoverySuite/etc/qbenv.sh
  2. In a clean, empty directory, place the 3IX1.pdb (without hydrogens as downloaded from PDB) and 3ix1-sf.cif files for treatment in the next step. Note that qbphenix can download the PDB and structure factor files automatically using the --pdbID command line option.
  3. Execute the QM region refinement on 4 processors in PHENIX with the following command (phenix.elbow can take several minutes to run):
    $QBHOME/bin/qbphenix --dataFile 3ix1-sf.cif --pdbFile 3IX1.pdb --selection "chain A resname NFM resid 401" \
        --phenixOptions "main.number_of_macro_cycles=1" --protonation ReadySet --qmMethod pm6 --region-radius 3.0 \
        --buffer-radius 2.5 --Nproc 4


Tutorial #3 (using qbphenix execution script and MOE on all ligands): 3IX1

Beginning with the data available on the RCSB Protein Data Bank PDBid:3IX1 page, the following calculations were performed.

  1. Source the proper Phenix and DivCon versions (assuming Bash shell):
    source /path/to/phenix-dev-1.9-xxxx/phenix_env.sh
    source /path/to/DivConDiscoverySuite-b####/etc/qbenv.sh
  2. In a clean, empty directory, place the 3IX1.pdb (without hydrogens as downloaded from PDB) and 3ix1-sf.cif files for treatment in the next step. Note that qbphenix can download the PDB and structure factor files automatically using the --pdbID command line option.
  3. Execute the QM region refinement on 4 processors in PHENIX with the following command (phenix.elbow can take several minutes to run):
    $QBHOME/bin/qbphenix --dataFile 3ix1-sf.cif --pdbFile 3IX1.pdb --selection "resname NFM" \
        --phenixOptions "main.number_of_macro_cycles=1" --protonation MOE --qmMethod pm6 --region-radius 3.0 \
        --buffer-radius 2.5 --Nproc 4


Tutorial #4 (using qbphenix execution script and MOE): 1LRI

Beginning with the data available on the RCSB Protein Data Bank PDBid:1LRI page, the following calculations were performed.

  1. Source the proper Phenix and DivCon versions (assuming Bash shell):
    source /path/to/phenix-1.9-xxxx/phenix_env.sh
    source /path/to/DivConDiscoverySuite/etc/qbenv.sh
  2. In a clean, empty directory, place the 1LRI.pdb (without hydrogens as downloaded from PDB) and 1lri-sf.cif files for treatment in the next step. Note that qbphenix can download the PDB and structure factor files automatically using the --pdbID command line option.
  3. Execute the QM region refinement on 4 processors in PHENIX with the following command (phenix.elbow can take several minutes to run):
    $QBHOME/bin/qbphenix --dataFile 1lri-sf.cif --pdbFile 1LRI.pdb --selection "chain A resname CLR resid 99" \
        --phenixOptions "main.number_of_macro_cycles=1" --protonation MOE --qmMethod pm6 --region-radius 3.0 \
        --buffer-radius 2.5 --Nproc 4


Tutorial #5: ONIOM (QM/MM) Based X-ray Refinement and Phenix Clash Score Analysis

In Fall 2016, QuantumBio added QM/MM or ONIOM-based X-ray refinement support to the DivCon Discovery Suite. This functionality combines the AMBER force field (parameter years 1998-2014) with either the AM1, PM3, or PM6 semiempirical QM Hamiltonian. The qbphenix script is able to use this functionality with a “flip of a [command line] switch.” The ONIOM implementation in the Suite is fully automated in the sense that all atom typing, region selection, capping, and finally QM-based parameter assignment for non-standard residues and ligands are performed automatically within the qmechanic software. Support for multiple QM regions and even truncated residues is included as well.

In addition to these functional improvements, the qbphenix script automatically performs several before/after-refinement analyses on the structures. You can use these analyses, provided in the standard output of the script, to measure the improvement of the structure after QM- and QM/MM-based X-ray refinement. These analyses include the following:

  • Ligand Strain (SE) – generally, the ligand strain should decrease, often significantly, after more accurate X-ray refinement. The higher the strain, the greater the chance that something has been placed incorrectly within the active site. High strain can also indicate incorrect protonation states, bad chemical interactions, and so on. In our work, a decrease of several-fold is not unheard of.
  • ZDD – ZDD or the Z-score of the difference density of the ligand(s) is a measure of the accuracy of the model (ligand XYZ coordinates) vs. the experimental density. As with ligand strain, the lower the ZDD the better. However, with QM/MM methods, the ZDD will not always decrease. This is because with more advanced energy functionals, a ligand can be pushed into and out of density based upon the chemistry in the active site. If the ZDD increases, it is usually an indicator that you have provided an incorrect binding mode or protonation state. Running XModeScore prior to running X-ray refinement can help answer this question.
  • MOEScore – This score, which is only provided when MOE is available on the command line, is a measure of the binding affinity as calculated by MOE for each ligand within the selection. This value may go up or down and directionality is not in itself an indicator of improvement. What is a measure of improvement is how well a set of newly refined structures match experimental binding affinity. This improvement can be observed by looking at a group of MOEScores for a set of related structures (e.g. a congeneric series or other similar series) and comparing these scores with the corresponding experimental binding affinities.
  • ClashScore – as reported by phenix.molprobity, the ClashScore provides an all atom contact analysis of the structure. The lower the score the better the model. QM/MM-based X-ray refinement usually leads to much lower strain and the ClashScore reflects this improvement. Additional information is available on the Phenix website.

Contact support@quantumbioinc.com for examples of the use of these metrics. For this tutorial, example input is available for download.

  1. Source the proper Phenix and DivCon versions (assuming Bash shell):
    source /path/to/phenix-dev-1.11-xxxx/phenix_env.sh
    source /path/to/DivConDiscoverySuite-7.1.0-b####/etc/qbenv.sh
  2. In a clean, empty directory, place the 1NAV.pdb (without hydrogens as downloaded from PDB) and 1nav-sf.cif files for treatment in the next step. Note that qbphenix can download the PDB and structure factor files automatically using the --pdbID command line option.
  3. Execute the QM region refinement on 4 processors in PHENIX with the following command:
    $QBHOME/bin/qbphenix --pdbID 1NAV --mmMethod amberff14sb --qmMethod pm6 --selection "resname IH5" --np 4 \
        --region-radius 3.0 --buffer-radius 0.0 --protonation MOE >& OUT.screen
    cat OUT.screen 
    Command Line Options are:
    <<<
    --pdbID 1NAV 
    --protonation MOE 
    --mmMethod amberff14sb 
    --qmMethod pm6 
    --selection resname IH5 
    --np 4 
    --buffer-radius 0.0 
    --region-radius 3.0 
    <<<
    
    1nav ... 1nav.pdb.gz ... 1nav-sf.cif.gz
    http://files.rcsb.org/download/1nav.pdb.gz
    http://files.rcsb.org/download/1nav-sf.cif.gz
    ....
    Process files
    Protonating the structure using MOE ...........
    
    ....
    
    wall clock time: 3987.95 s
    
    Start R-work = 0.2373, R-free = 0.2416
    Final R-work = 0.2073, R-free = 0.2520
    
    Calculating QBPHENIX Refinement Statistics ........
    
    =========================== QM Refinement: Summary ============================
    Ligand     SE    SE    MOE Score  MOE Score  ZDD    ZDD  ClashScore  ClashScore
             Start  Final    Start      Final   Start  Final    Start       Final
    IH5_A_600  36.97   8.6   -6.41511  -7.79769   7.2    0.7     11.0       1.0
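The summary table can also be read programmatically. The sketch below extracts the before/after values from the captured output and reports the ratio of starting to final ligand strain. It assumes the nine-column row layout shown in the example output (that is, with the MOE Score columns present); the helper name summarize_qm_table is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pull the before/after numbers out of the qbphenix summary table.

summarize_qm_table() {
    # $1 = captured qbphenix output (OUT.screen in this tutorial)
    awk '
        /QM Refinement: Summary/ { in_summary = 1; next }
        # Data rows have 9 columns in the example layout; header rows have 11 and 8.
        in_summary && NF == 9 && $3 + 0 != 0 {
            printf "%s: SE %.2f -> %.2f (ratio %.2f), ZDD %.1f -> %.1f, ClashScore %.1f -> %.1f\n",
                   $1, $2, $3, $2 / $3, $6, $7, $8, $9
        }
    ' "$1"
}

# Demo on the table from the example output.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
=========================== QM Refinement: Summary ============================
Ligand     SE    SE    MOE Score  MOE Score  ZDD    ZDD  ClashScore  ClashScore
         Start  Final    Start      Final   Start  Final    Start       Final
IH5_A_600  36.97   8.6   -6.41511  -7.79769   7.2    0.7     11.0       1.0
EOF
summarize_qm_table "$sample"
```

For the example row this reports a strain ratio of about 4.3, consistent with the several-fold decrease in ligand strain described above. If MOE is not available the MOE Score columns are absent, so the column indices would need to be adjusted.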
    
    

     


Tutorial #6 (using qbphenix execution script and MOE on covalently bound ligand): 3NCK

Beginning with the data available on the RCSB Protein Data Bank PDBid:3NCK page, the following calculations were performed. This example is provided to underscore the importance of checking your work. Modern software, such as MOE and DivCon, uses advanced perception algorithms in order to determine protonation states, charge, and so on. In some cases, such as this one, the starting structure is so distorted that the perception algorithm cannot correctly determine a key piece of information, or the investigator may wish to try alternatives that the perception algorithm may miss. In this case, MOE does not correctly determine that there should be a bond between the SER and the NFF, and therefore two additional protons are added. You may delete these protons within a visualizer, or you can edit the resulting PDB file as shown below:

  1. Source the proper Phenix and DivCon versions (assuming Bash shell):
    source /path/to/phenix-1.9-xxxx/phenix_env.sh
    source /path/to/DivConDiscoverySuite/etc/qbenv.sh
  2. In a clean, empty directory, place the 3NCK.pdb (without hydrogens as downloaded from PDB) and 3nck-sf.cif files for treatment in the next step. Note that qbphenix can download the PDB and structure factor files automatically using the --pdbID command line option.
  3. Prepare the QM region refinement on 4 processors in PHENIX with the following command (phenix.elbow can take several minutes to run):
    $QBHOME/bin/qbphenix --dataFile 3nck-sf.cif --pdbFile 3NCK.pdb --selection "resname NFF" \
        --phenixOptions "main.number_of_macro_cycles=1" --protonation MOE --qmMethod pm6 --region-radius 3.0 \
        --buffer-radius 2.5 --Nproc 4 --scriptName run
  4. Edit the resulting (e.g. new) 3NCK.pdb file in a text editor and remove the following two hydrogen atoms:
    HETATM 3842  H1  NFF A   1     -11.551 -12.749   5.647  1.00 41.25           H
    HETATM 3844  H3  NFF A   1     -11.646 -10.395   3.784  1.00 40.92           H
  5. ./run
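The manual edit in step 4 can also be scripted. The sketch below removes the two hydrogens by matching the fixed PDB columns (atom name, residue name, chain, residue number) rather than the atom serial numbers, which may differ between runs. The helper name strip_extra_h is hypothetical, the H2 record in the demo is a made-up placeholder, and the output is written to a separate file so the original is left untouched.

```shell
#!/usr/bin/env bash
# Sketch: drop the two extra NFF hydrogens (H1 and H3 on chain A, residue 1)
# by matching fixed PDB columns instead of editing the file by hand.

strip_extra_h() {
    # $1 = input PDB, $2 = output PDB (the input is not modified)
    awk '
        /^HETATM/ {
            name  = substr($0, 13, 4); gsub(/ /, "", name)   # atom name, columns 13-16
            res   = substr($0, 18, 3)                        # residue name, columns 18-20
            chain = substr($0, 22, 1)                        # chain ID, column 22
            seq   = substr($0, 23, 4) + 0                    # residue number, columns 23-26
            if (res == "NFF" && chain == "A" && seq == 1 && (name == "H1" || name == "H3"))
                next
        }
        { print }
    ' "$1" > "$2"
}

# Demo on a three-line excerpt; the H2 record is a made-up placeholder.
workdir="$(mktemp -d)"
cat > "$workdir/in.pdb" <<'EOF'
HETATM 3842  H1  NFF A   1     -11.551 -12.749   5.647  1.00 41.25           H
HETATM 3843  H2  NFF A   1     -10.000 -11.000   4.000  1.00 40.00           H
HETATM 3844  H3  NFF A   1     -11.646 -10.395   3.784  1.00 40.92           H
EOF
strip_extra_h "$workdir/in.pdb" "$workdir/out.pdb"
cat "$workdir/out.pdb"
```

After checking the result in a visualizer, the cleaned file would need to take the place of the 3NCK.pdb that the generated run script reads.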

 


Tutorial #7 (Running Phenix/DivCon on internal repository on queueing system-managed cluster)

The qbphenix script is able to treat a directory (e.g. a local or internal repository) full of PDB and MTZ files. The provided refinementSetup.pl wrapper script takes both the list of structures, with their included ligand(s), and the phenix_template.pbs file as input. You should edit the phenix_template.pbs file as required for your environment. The refinementSetup.pl script may also need to be edited in order to correctly submit jobs to your PBS, SGE, or other queueing environment. Contact support@quantumbioinc.com for help. For this tutorial, example input is available for download.

  1. Set repository directory in phenix_template.pbs and edit the file as required for your environment (e.g. SGE, etc). For example:
     repoDIR=${PWD}/example/repo
    Also fill in the path information both to phenix and to the DivCon Discovery Suite. The phenix_template.pbs is documented to note these requirements.
  2. In addition to the batch script template file, the refinementSetup.pl script requires a list file to communicate the PDB base names within the repository you wish to treat along with the 3-letter ligand code for the ligand(s) of interest.
    1LRI,CLR
    1FK7,RCL
  3. Finally, once both of these input files have been prepared, the refinementSetup.pl script is called to process the input and qsub all of the jobs to your queueing system. If your system uses an executable other than qsub to submit jobs, then you will need to edit the script accordingly.
    $QBHOME/scripts/refinementSetup.pl ./list ./phenix_template.pbs
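Since every line of the list file becomes a submitted job, it may be worth checking the file before calling refinementSetup.pl. The sketch below assumes exactly one "BASENAME,LIG" pair per line with a 3-character ligand code, as in the example above; the helper name check_list_file is hypothetical, and the real script may accept other layouts.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the list file before handing it to refinementSetup.pl.
# Assumes one "BASENAME,LIG" entry per line with a 3-character ligand code.

check_list_file() {
    # $1 = list file; returns 0 when every non-blank line looks like BASENAME,LIG
    local line_no=0 base lig status=0
    while IFS=, read -r base lig || [ -n "$base" ]; do
        line_no=$((line_no + 1))
        if [ -z "$base" ] && [ -z "$lig" ]; then continue; fi   # skip blank lines
        if [[ ! "$base" =~ ^[A-Za-z0-9_.-]+$ || ! "$lig" =~ ^[A-Za-z0-9]{3}$ ]]; then
            echo "line $line_no: expected BASENAME,LIG but found '$base,$lig'" >&2
            status=1
        fi
    done < "$1"
    return "$status"
}

# Demo with the two entries from step 2.
list="$(mktemp)"
printf '1LRI,CLR\n1FK7,RCL\n' > "$list"
if check_list_file "$list"; then
    echo "list file looks consistent"
fi
```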


Tutorial #8: Running Phenix/DivCon coupled with 3DRISM to determine novel crystal water locations (BETA)

PHENIX/DivCon can be coupled with MOE/batch, 3DRISM, and Coot in order to determine novel crystal water sites using a combination of the 3DRISM computational chemistry method, QM/MM crystallographic refinement, and conventional sigma-based location determination. At its core, the method uses 3DRISM to “filter out” the noise in the conventional method. Please note: this approach requires newer versions of MOE/batch (which include 3DRISM) and a version of Coot produced in the last year. If you have any difficulty with this BETA module, contact support@quantumbioinc.com. To run the software, follow these steps:

  1. Source the proper Phenix and DivCon versions (assuming Bash shell). Again, recent versions of MOE/batch and Coot must also be found on the command line and errors will be thrown if they are not found.
    source /path/to/phenix-1.9-xxxx/phenix_env.sh
    source /path/to/DivConDiscoverySuite/etc/qbenv.sh
  2. Make the initial preparation of the structure. This generally includes protonation, and the command below generates three files: 2WO7+H.pdb (protonated PDB), 2WO7.cif, and 2WO7.mtz. As in previous tutorials, the selection is the 3-letter code of the ligand, and the --pdbid option automatically downloads the 2WO7.pdb file from the RCSB. You may use --pdbfile if you have your own file. This step also performs a quick refinement of the fully prepared structure using conventional phenix.refine in order to generate an X-ray difference map (files 2WO7+H_refine_001.mtz and 2WO7+H_refine_001.pdb, which are used in the next stage).
    $QBHOME/bin/qbphenix --pdbId 2WO7 --protonation MOE --resname ASV --autoBuildMoe --nproc 4 --ncycles 1 --qbOff
  3. Run the following PHENIX command to make a binary CCP4 map. This calculation produces the difference density CCP4 file 2WO7+H_refine_001_mFo-DFc.ccp4.
    phenix.mtz2map  2WO7+H_refine_001.pdb 2WO7+H_refine_001.mtz
  4. Run qbphenix to generate water positions and add them to an output PDB file using the following command.
    $QBHOME/bin/qbphenix --pdbFile 2WO7+H_refine_001.pdb --dataFile 2WO7+H_refine_001_mFo-DFc.ccp4 \
        --rismWater --xSigma 2.4 --rSigma 8.0 --dir RismTest
     Where:
       • --rismWater: request to find water positions using Phenix/DivCon.
       • --xSigma 2.4: choose a sigma to filter out X-ray density (default 2.5).
       • --rSigma 8.0: choose a sigma to filter out RISM density (default 9.0).
       • --dir RismTest: run the job in the folder ‘RismTest’.
    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
    $                                                                	$
    $                    	PHENIX JOB MESSAGE                      	$
    $                                                                	$
    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
    
     Making Binary RISM file 2WO7+H_refine_001-output.rism using rism3d program...
              	*** IT MAY TAKE UP TO 5-10 HOURS ***
    
    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

Upon conclusion of the simulation (which may take several hours due to the expense of 3DRISM), the file ‘2WO7_refine_001-withAddedWaters.pdb’ will have the found water set.


