Abstract: We describe a novel knowledge-based protein-ligand scoring function that employs a new definition for the reference state, allowing us to relate a statistical potential to a Lennard-Jones (LJ) potential. In this way, the LJ potential parameters were generated from protein-ligand complex structural data contained in the Protein Data Bank (PDB). Forty-nine types of atomic pairwise interactions were derived using this method, which we call the knowledge-based and empirical combined scoring algorithm (KECSA). Two validation benchmarks were introduced to test the performance of KECSA. The first validation benchmark included two test sets that address the training set and enthalpy/entropy of KECSA. The second validation benchmark included two large-scale and five small-scale test sets, used to compare the reproducibility of KECSA with that of two empirical scoring functions previously developed in our laboratory (LISA and LISA+), as well as with other well-known scoring methods. Validation results illustrate that KECSA shows improved performance in all test sets when compared with other scoring methods, especially in its ability to minimize the root mean square error (RMSE). LISA and LISA+ displayed similar performance when the correlation coefficient and Kendall τ were used as the metrics of quality for some of the small test sets. Further pathways for improvement that would allow KECSA to be more sensitive to subtle changes in ligand structure are discussed.
Authors: Z. Zheng and K. M. Merz, Jr.
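
The three quality metrics named in the abstract (RMSE, the correlation coefficient, and Kendall τ) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names are hypothetical, the correlation coefficient is assumed to be Pearson's r, Kendall τ is computed in its simplest form (τ-a, with tied pairs counting as neither concordant nor discordant), and the numbers in the usage example are made up for illustration only.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed here to be the 'correlation coefficient')."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1      # pair ranked in the same order by both lists
            elif product < 0:
                score -= 1      # pair ranked in opposite order
    return score / (n * (n - 1) / 2)


# Hypothetical predicted and experimental binding affinities (illustrative only).
predicted = [5.0, 6.0, 7.0, 9.0]
experimental = [5.5, 6.5, 6.0, 9.5]
print(rmse(predicted, experimental))
print(pearson_r(predicted, experimental))
print(kendall_tau_a(predicted, experimental))
```

RMSE measures the absolute error of the predicted values, whereas the correlation coefficient and Kendall τ measure only how well the predictions track or rank the experimental values, so a scoring function can do well on one kind of metric and not the other.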