Residue naming issue in protein preparation #291

junsuhas · 2024-12-18T05:20:21Z

Hello!
Thank you for updating Meeko.
I find it very useful.

I have two questions about errors that the updated protein preparation raises when the information in a PDB file is slightly incorrect.

For protein preparation, I used mk_prepare_receptor.py -p -a.

The two problems are very similar.

first

An error occurs when a residue that should be named HIP is mistakenly listed as HIS.
Could Meeko be more flexible about this?

example1

ATOM      1  N   HIS A  80     -19.182   3.555 -50.520  1.00 40.54      A    N
ATOM      2  CA  HIS A  80     -20.420   4.309 -50.356  1.00 38.93      A    C
ATOM      3  C   HIS A  80     -20.163   5.653 -49.665  1.00 33.31      A    C
ATOM      4  O   HIS A  80     -19.466   5.716 -48.663  1.00 31.17      A    O
ATOM      5  CB  HIS A  80     -21.418   3.486 -49.548  1.00 43.69      A    C
ATOM      6  CG  HIS A  80     -22.760   4.130 -49.417  1.00 47.85      A    C
ATOM      7  ND1 HIS A  80     -23.040   5.072 -48.449  1.00 49.15      A    N+
ATOM      8  CD2 HIS A  80     -23.899   3.975 -50.135  1.00 49.65      A    C
ATOM      9  CE1 HIS A  80     -24.294   5.469 -48.577  1.00 50.66      A    C
ATOM     10  NE2 HIS A  80     -24.838   4.818 -49.591  1.00 50.72      A    N
ATOM     11  HA  HIS A  80     -20.804   4.494 -51.227  1.00 46.71      A    H
ATOM     12  N   MET A  81     -20.727   6.731 -50.200  1.00 31.47      A    N
ATOM     13  CA  MET A  81     -20.401   8.057 -49.695  1.00 31.31      A    C
ATOM     14  C   MET A  81     -20.977   8.304 -48.302  1.00 26.17      A    C
ATOM     15  O   MET A  81     -22.055   7.826 -47.949  1.00 26.47      A    O
ATOM     16  CB  MET A  81     -20.953   9.131 -50.637  1.00 38.05      A    C
ATOM     17  CG  MET A  81     -20.241   9.228 -51.981  1.00 44.13      A    C
ATOM     18  SD  MET A  81     -20.621  10.770 -52.840  1.00 49.59      A    S
ATOM     19  CE  MET A  81     -19.560  11.912 -51.948  1.00 50.16      A    C
ATOM     20  H   MET A  81     -21.294   6.720 -50.847  1.00 37.76      A    H
ATOM     21  HA  MET A  81     -19.434   8.124 -49.652  1.00 37.57      A    H
ATOM     22  HB2 MET A  81     -21.886   8.934 -50.815  1.00 45.65      A    H
ATOM     23  HB3 MET A  81     -20.873   9.994 -50.202  1.00 45.65      A    H
ATOM     24  HG2 MET A  81     -19.282   9.190 -51.837  1.00 52.95      A    H
ATOM     25  HG3 MET A  81     -20.521   8.490 -52.544  1.00 52.95      A    H
ATOM     26  HE1 MET A  81     -19.660  12.798 -52.329  1.00 60.19      A    H
ATOM     27  HE2 MET A  81     -19.822  11.921 -51.014  1.00 60.19      A    H
ATOM     28  HE3 MET A  81     -18.640  11.618 -52.031  1.00 60.19      A    H

second

If a residue name that Meeko does not recognize appears, an error is raised.

For HIC, preparation proceeds with a newly built template, but for GL3 it fails with an unknown-residue error.
I would like GL3 to be ignored or atomized.

example2

ATOM  14072  CA  GL3 D 490      -4.796  -9.409   7.119  1.00  8.29      D    C
ATOM  14073  N   GL3 D 490      -6.056  -9.828   6.521  1.00  9.18      D    N
ATOM  14074  C   GL3 D 490      -4.010 -10.484   7.809  1.00  8.71      D    C
ATOM  14075  S   GL3 D 490      -2.854 -10.062   8.727  1.00  9.13      D    S
ATOM  14076  N   HIC D 491      -4.082 -11.712   7.249  1.00  8.34      D    N
ATOM  14077  CA  HIC D 491      -3.436 -12.866   7.852  1.00  8.80      D    C
ATOM  14078  C   HIC D 491      -1.928 -12.871   7.591  1.00 10.10      D    C
ATOM  14079  O   HIC D 491      -1.172 -13.168   8.506  1.00  9.98      D    O
ATOM  14080  CB  HIC D 491      -3.986 -14.118   7.156  1.00  8.46      D    C
ATOM  14081  CG  HIC D 491      -3.462 -15.443   7.706  1.00  8.97      D    C
ATOM  14082  ND1 HIC D 491      -4.227 -16.441   8.219  1.00  9.51      D    N
ATOM  14083  CD2 HIC D 491      -2.153 -15.905   7.731  1.00  8.48      D    C
ATOM  14084  CE1 HIC D 491      -3.427 -17.468   8.559  1.00  8.58      D    C
ATOM  14085  NE2 HIC D 491      -2.167 -17.162   8.225  1.00  8.99      D    N
ATOM  14086  CZ  HIC D 491      -0.968 -18.020   8.468  1.00  9.05      D    C
ATOM  14087  N   ASP D 492      -1.485 -12.582   6.332  1.00  8.38      D    N
ATOM  14088  CA  ASP D 492      -0.069 -12.701   6.005  1.00  8.38      D    C
ATOM  14089  C   ASP D 492       0.811 -11.478   6.260  1.00 10.61      D    C
ATOM  14090  O   ASP D 492       1.936 -11.434   5.764  1.00 13.12      D    O
ATOM  14091  CB  ASP D 492       0.130 -13.278   4.576  1.00  9.43      D    C
ATOM  14092  CG  ASP D 492      -0.571 -12.549   3.461  1.00 12.39      D    C
ATOM  14093  OD1 ASP D 492      -1.763 -12.227   3.628  1.00 11.98      D    O
ATOM  14094  OD2 ASP D 492       0.072 -12.307   2.413  1.00 13.51      D    O
ATOM  14095  N   LEU D 493       0.391 -10.529   7.123  1.00  8.19      D    N
ATOM  14096  CA  LEU D 493       1.276  -9.412   7.498  1.00  8.27      D    C
ATOM  14097  C   LEU D 493       2.599 -10.015   8.047  1.00  8.06      D    C
ATOM  14098  O   LEU D 493       3.674  -9.762   7.511  1.00  8.07      D    O
ATOM  14099  CB  LEU D 493       0.638  -8.553   8.617  1.00  8.02      D    C
ATOM  14100  CG  LEU D 493       1.630  -7.784   9.573  1.00  8.64      D    C
ATOM  14101  CD1 LEU D 493       2.327  -6.607   8.854  1.00  7.94      D    C
ATOM  14102  CD2 LEU D 493       0.913  -7.277  10.837  1.00  7.84      D    C
ATOM  14103  N   GLN D 494       2.489 -10.839   9.088  1.00  8.03      D    N

Thank you for taking the time to look at my question.
I would also appreciate knowing if you plan to update that part or just ignore it and leave it as is.


rwxayheee · 2024-12-18T06:03:00Z

Hi @junsuhas

For the first example:

RuntimeError: for residue_key='A:80', 3 have passed: ['HIE', 'HID', 'HIP'] and tied for fewest missing H: HIE HID

This is because the protonation or tautomeric state of the histidine residue is unspecified, and Meeko brings it to your attention because it doesn't want to make an assumption. This histidine can be HIE, HID or HIP (not just HIP), and the decision must be made in the input structure. Meeko currently does not have an internal mechanism to enumerate or evaluate macromolecule protonation states. To assign and evaluate protonation states, popular choices are reduce and PDB2PQR. Both are up to date, can be installed in a Python environment, and offer command-line scripts and/or Python usage, just like Meeko.
If you're interested:

In our Colab notebook examples and some tutorials, we use reduce2.py as part of the workflow. From reduce2.py, you will get a processed PDB with a neutral HIS tautomer after optimization and evaluation.
In an ongoing development effort, we are trying to interface with PDB2PQR by parsing the PQR file. From PDB2PQR, you may get a processed PDB or PQR with a neutral or charged HIS after optimization and evaluation.
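If you have already decided on a protonation state by other means, making "the decision in the input structure" can be as simple as renaming the residue in the PDB file. The sketch below is not part of Meeko: `rename_residue` is a hypothetical helper that edits the fixed-width residue name field (columns 18-20) of ATOM/HETATM records. It assumes that Meeko selects the template from an explicit residue name such as HIP, which this thread implies but does not state outright, and the hydrogens present in the file still need to be consistent with the chosen state.

```python
def rename_residue(pdb_lines, chain, resnum, new_name):
    """Return a copy of pdb_lines with one residue renamed (e.g. HIS -> HIP).

    Hypothetical helper, not a Meeko function. Relies on the fixed-width
    PDB layout: residue name in columns 18-20, chain ID in column 22,
    residue number in columns 23-26.
    """
    if len(new_name) != 3:
        raise ValueError("expected a 3-character residue name")
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if (is_atom and len(line) >= 26
                and line[21] == chain
                and int(line[22:26]) == resnum):
            # Swap only the residue name field; everything else is untouched.
            line = line[:17] + new_name + line[20:]
        out.append(line)
    return out


# First record of example1 from this issue.
pdb = ["ATOM      1  N   HIS A  80     -19.182   3.555 -50.520  1.00 40.54      A    N"]
print(rename_residue(pdb, "A", 80, "HIP")[0])
```

Tools such as reduce or PDB2PQR remain the better choice when the protonation state is not already known, since renaming alone does not evaluate anything.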

For the second example:

Error: Creation of data structure for receptor failed.

Details:
Template generation failed for unknown residues: {'GL3'}, which appear to be linking fragments. 
Generation of chemical templates with modified backbones, which involves guessing of linker positions and types, are not currently supported. 
Recommendations:
1. (to parameterize the residues) Use --add_templates to pass the additional templates with valid linker_labels, 
2. (to skip the residues) Use --delete_residues to ignore them. Residues will be deleted from the prepared receptor.

This is because GL3 is present in the system as a linking fragment, but it doesn't have a standard protein-like or nucleic-acid-like backbone. Meeko currently only processes nonstandard residues automatically when they have a standard backbone, to make sure the chemistry (atom and bond types at the end points) is predictable.
If you have these peculiar fragments, we encourage you to build the template by following the template building guide in the documentation:
https://meeko.readthedocs.io/en/release/py_build_temp.html#example-usage

Please let us know what you think and if you have any further questions. Thanks!

junsuhas · 2024-12-18T06:11:36Z

Thanks for your response!

diogomart · 2024-12-18T19:20:39Z

The residue matching algorithm depends on whether or not the input residue has any hydrogens. For histidine, if there are zero input hydrogens, it will default to HIE. If there is any hydrogen, even bound to carbon as in the example, then the algorithm needs to identify one template that has the least number of missing hydrogens. Since there aren't any hydrogens on the sidechain (only on the backbone), HID and HIE tie for fewest missing Hs and the error is raised.
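To make the tie concrete, here is a minimal sketch of the selection rule as described above. It is not Meeko's implementation: the hydrogen sets use standard Amber-style atom names for a mid-chain histidine, and the rule that a template "passes" when it contains every input hydrogen is an assumption made for illustration.

```python
# Hydrogens shared by all three histidine templates (simplified, mid-chain).
COMMON_H = {"H", "HA", "HB2", "HB3", "HD2", "HE1"}
TEMPLATES = {
    "HIE": COMMON_H | {"HE2"},         # proton on NE2
    "HID": COMMON_H | {"HD1"},         # proton on ND1
    "HIP": COMMON_H | {"HD1", "HE2"},  # both protons, charged
}


def pick_histidine_template(input_h_names):
    """Pick a template by fewest missing hydrogens; raise on a tie."""
    input_h = set(input_h_names)
    if not input_h:
        # Default described in this thread for residues with zero hydrogens.
        return "HIE"
    # Assumption: a template passes if it has every hydrogen seen in the input.
    passed = {name: hs for name, hs in TEMPLATES.items() if input_h <= hs}
    if not passed:
        raise RuntimeError("no template matches the input hydrogens")
    missing = {name: len(hs - input_h) for name, hs in passed.items()}
    fewest = min(missing.values())
    best = [name for name, n in missing.items() if n == fewest]
    if len(best) > 1:
        raise RuntimeError("tied for fewest missing H: " + " ".join(best))
    return best[0]


# Residue A:80 in example1 carries only HA, so HIE and HID tie.
try:
    pick_histidine_template(["HA"])
except RuntimeError as err:
    print(err)
```

With only HA in the input, HIE and HID each lack six hydrogens and HIP lacks seven, which reproduces the tie. Adding HD1 or HE2 to the input breaks the tie, and removing all hydrogens falls back to HIE.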

frgoe003 · 2025-02-09T08:36:27Z

Hey @rwxayheee @diogomart, I know this issue is already closed but I encounter the same for some structures. I noticed that even with --allow_bad_res it still raises the error. Is there a way to automatically delete residues in case this error is raised?

rwxayheee · 2025-02-10T21:52:51Z

Hi @frgoe003

I had thought about that too. I implemented the template generation but excluded the linking fragments for several reasons. One of them is that automatically removing a linking fragment affects the embedding state of the residue to which the fragment is attached. Doing this can potentially leave that residue matched to a different (?) residue template, for example if the fragment is linked to a Cys or Lys.

There's currently no automatic way to delete linking fragments, and the intention was to urge users to pause and manually correct the structures. There are workarounds for automated pipelines, though, because we allow the deletion of linking fragments with --delete_residues. Here's what I might do:

1- extract the residue ID of the linking fragment from the log output
2- re-run the structure processing with the --delete_residues flag
At this point, the structure will pass the structure check. Still, this changes the linking atom's type, so please consider the following:
3- for good practice, mark the linking atom in the remaining (not deleted) residue as a blunt end with --blunt_ends
(to be honest, I'm not sure whether this will work with the current templates; if you have specific systems in mind, I would like to try them myself)
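Steps 1 and 2 could be scripted roughly as follows. This is only a sketch, not a Meeko feature: `unknown_residue_names` and `residue_ids` are hypothetical helpers. The error message shown earlier in this thread reports residue names rather than IDs, so the sketch looks the names up in the PDB file and builds `chain:resnum` keys in the same style as `residue_key='A:80'`. Whether --delete_residues accepts exactly this format is an assumption, so please check the script's help output before relying on it.

```python
import re


def unknown_residue_names(log_text):
    """Extract names from a message like "...unknown residues: {'GL3'}, ..."."""
    match = re.search(r"unknown residues: \{([^}]*)\}", log_text)
    if match is None:
        return set()
    return set(re.findall(r"'([^']+)'", match.group(1)))


def residue_ids(pdb_lines, names):
    """Return ordered, de-duplicated 'chain:resnum' keys for the given names."""
    keys = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        if line[17:20].strip() in names:
            key = f"{line[21]}:{line[22:26].strip()}"
            if key not in keys:
                keys.append(key)
    return keys


log = ("Template generation failed for unknown residues: {'GL3'}, "
       "which appear to be linking fragments.")
pdb = [
    "ATOM  14072  CA  GL3 D 490      -4.796  -9.409   7.119  1.00  8.29      D    C",
    "ATOM  14076  N   HIC D 491      -4.082 -11.712   7.249  1.00  8.34      D    N",
]
print(residue_ids(pdb, unknown_residue_names(log)))
```

The resulting keys can then be passed to a second run with --delete_residues, after a manual look at which neighboring residues lose a bond.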

The ideal situation would be to allow the incorporation of linking fragments as well as the graceful deletion of residues with minimal disruption to the structure. I've wanted to implement this, but it may require a different approach to end capping and template matching. Currently, we rely on pre-registration of the protonation states and embedding forms of all residues, but that isn't feasible for large databases. Generating the template and checking the linking patterns becomes a sort of 'chicken-and-egg' problem when a new pattern is encountered. Because of this, a pause for manual inspection is always expected.

junsuhas closed this as completed Dec 18, 2024
