Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polymer from PQR #285

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from
Open

Conversation

rwxayheee
Copy link
Contributor

@rwxayheee rwxayheee commented Dec 15, 2024

This is for #264. PQR is a file format similar to PDB and contains Q (charge) and R (radius). The major programs that uses PQR are PDB2PQR and APBS. Being able to parse PQR would allow us to directly get the computed charges for Polymers with available models in PDB2PQR (PARSE, AMBER, etc.).

This PR is adapted from the PQR parser in PDB2PQR and the existing (RDKit-based) method in Meeko to create Polymer from PDB. Specifically, the partial charge will be first parsed into atoms in a monomer's raw_rdkit_mol. Next, to transfer it to the monomer's padded_mol and molsetup, a get_atomprop_from_raw option is added to function monomer.parameterize. This needs to be a dict, with keys of atom property name and values of corresponding default values. This additional function may be re-used to pass on other properties. Radius from PQR is also parsed, but currently not transferred to raw_rdkit_mol, padded_mol or molsetup.

If more people find it useful, we can add option --read_pqr, and expose option --charge_model to mk_prepare_receptor.py. Might need #277 to merge first before further edits on the command line scripts.

@rwxayheee rwxayheee changed the base branch from release to develop December 15, 2024 04:58
@rwxayheee rwxayheee changed the title Polymer from PQR; mk_prepare_receptor.py --read_pqr --charge_model Polymer from PQR Dec 15, 2024
@rwxayheee rwxayheee marked this pull request as ready for review December 15, 2024 06:39
@rwxayheee rwxayheee requested a review from diogomart December 15, 2024 06:39
@rwxayheee
Copy link
Contributor Author

Python Usage at 29b1464

Input:
1FAS_dry.pqr.txt

from meeko import Polymer
from meeko import ResidueChemTemplates
from meeko import MoleculePreparation
from meeko import PDBQTWriterLegacy

pqr_fn = "1FAS_dry.pqr"
templates = ResidueChemTemplates.create_from_defaults()
with open(pqr_fn, "r") as f:
    pqr_string = f.read()

mk_prep = MoleculePreparation(charge_model="read", charge_atom_prop="PQRCharge")
pol1 = Polymer.from_pqr_string(pqr_string, templates, mk_prep)

pdbqt_name1 = "1FAS_charge_from_pqr.pdbqt"
pdbqt_rigid = PDBQTWriterLegacy.write_from_polymer(pol1)[0]
with open(pdbqt_name1, "w") as f:
    f.write(pdbqt_rigid)

mk_prep = MoleculePreparation()
pol2 = Polymer.from_pqr_string(pqr_string, templates, mk_prep)

pdbqt_name2 = "1FAS_charge_default.pdbqt"
pdbqt_rigid = PDBQTWriterLegacy.write_from_polymer(pol2)[0]
with open(pdbqt_name2, "w") as f:
    f.write(pdbqt_rigid)

Output:
1FAS_charge_from_pqr.pdbqt.txt
1FAS_charge_default.pdbqt.txt

@rwxayheee rwxayheee marked this pull request as draft December 16, 2024 23:12
@rwxayheee
Copy link
Contributor Author

rwxayheee commented Jan 9, 2025

This comment is outdated. Please see updated version here: #285 (comment)

@rwxayheee rwxayheee marked this pull request as ready for review January 9, 2025 02:57
@rwxayheee rwxayheee changed the base branch from develop to release January 15, 2025 22:39
@rwxayheee rwxayheee changed the base branch from release to develop January 15, 2025 22:39
@rwxayheee
Copy link
Contributor Author

rwxayheee commented Jan 15, 2025

To do:

  • add test for PQR -> PDBQT

@rwxayheee rwxayheee marked this pull request as draft January 15, 2025 23:35
@rwxayheee rwxayheee marked this pull request as ready for review January 16, 2025 03:12
@rwxayheee
Copy link
Contributor Author

rwxayheee commented Jan 16, 2025

Added a test for PQR to PDBQT.
Not planning to add more modifications to handle situations like #310 in this PR. That issue is more about handling of single-atom residues, and the respective fixes should probably go to PR #259, which has fixes for template generation of single-atom residues.

@diogomart
Copy link
Contributor

The API looks great, but in mk_prepare_receptor.py, the option --read_pqr, --charge_model, and --mk_config aren't playing nicely with each other:

  1. --mk_config is ignored if using --read_pqr when instantiating mk_prep = MoleculePreparation(...
  2. the help message for --charge_model is missing the read option
  3. in mk_prepare_ligand.py, the charge model isn't automatically set to read when reading MOL2 files. Thus, we probably shouldn't do it here either for --read_pqr. However, if we decide to automatically set it to read when reading PQR, then it is OK to do it only if charge_model isn't specified in either --mk_config or in --charge_model. For example, if I set charge_model to espaloma in a file passed to --mk_config and use --read_pqr, then I expect the charges to be espaloma, not those from PQR.
  4. --charge_model must override the charge_model option in --mk_config, but leave the other options untouched. As it is, passing --charge_model destroys all configuration from --mk_config.
  5. the easiest is to remove --charge_model from command line and have the user write it in the --mk_config file.

@rwxayheee
Copy link
Contributor Author

rwxayheee commented Jan 27, 2025

For example, if I set charge_model to espaloma in a file passed to --mk_config and use --read_pqr, then I expect the charges to be espaloma, not those from PQR.

I let read_pqr override mk_config (same way other arguments override config inputs) because I considered read_pqr more of a combination of arguments (like a certain config), not just input option. See the help message:

reads PQR with assigned partial charges, and does not use ProDy

I did not want to expose charge_model read to command line. read_pqr is currently the only way to set charge_model to read, and read from a specific atom property, for a polymer from command line.

But now I understand the concerns you have. I'll think a bit more about this. Thanks again for the review!

@rwxayheee
Copy link
Contributor Author

For example, if I set charge_model to espaloma in a file passed to --mk_config and use --read_pqr, then I expect the charges to be espaloma, not those from PQR.

When I designed the function I did not want to support this. PQR file has some limitations (no element column, no alt locations, etc.), and the original PDB is always preferred.

I tried to explain this in the help message, maybe it's not clear. There was a message that states the --charge_model and other options will be ignored:

            mk_prep = MoleculePreparation(charge_model="read", charge_atom_prop="PQRCharge")
            print("Reading structures and partial charges from PQR. ") 
            print("Options --charge_model, --default_altloc and --wanted_altloc will be ignored. ")

--charge_model must override the charge_model option in --mk_config, but leave the other options untouched. As it is, passing --charge_model destroys all configuration from --mk_config.

The changes in this PR shouldn't override other config options. If so, I didn't intend to. I will do a more thorough check, since it's from a while ago

messages when --read_pqr is used,
corrections for MoleculePreparation from mk_config
@rwxayheee
Copy link
Contributor Author

rwxayheee commented Jan 28, 2025

962767a has the following changes:

  • Argument --charge_model
    config_group.add_argument(
        "--charge_model",
        choices=("gasteiger", "espaloma", "zero", "read"),
        help="default is gasteiger, 'zero' sets all zeros, 'read' requires --read_pqr",
        default=None,
    )

read is now exposed, and hereby explicitly doing --charge_model read is required if charges from PQR are to be read. Otherwise, the charge model always defaults to gasteiger.
Programmatically, the default value is None here for --charge_model. But if this is None (mean unspecified from command line input), the default of MoleculePreparation() will eventually be gasteiger or an assigned value from config, if provided.

  • Ways to initialize mk_prep
    # read mk_config if provided
    if args.mk_config is not None:
        with open(args.mk_config) as f:
            mk_config = json.load(f)
    else:
        mk_config = {}
    
    # update config by inputs from arguments
    if args.charge_model is not None: 
        mk_config["charge_model"]=args.charge_model
        if mk_config["charge_model"]=="read": 
            if args.read_pqr is None:
                print("Error: --charge_model read requires --read_pqr")
                sys.exit(2)
            mk_config["charge_atom_prop"]="PQRCharge"

    # initialize MoleculePreparation with config
    mk_prep = MoleculePreparation.from_config(mk_config)

--charge_model is the only argument in this command line script to possibly override mk_config. When it's not None, it overrides mk_config. When it's read, the only meaningful setting now is mk_prep.charge_atom_prop="PQRCharge". If we're parsing other types of macromolecule input, this code block can be further expanded.

for charge_model = espaloma that must present during initialization
@rwxayheee
Copy link
Contributor Author

rwxayheee commented Jan 28, 2025

Please see an updated version below, for a display of help message and usage:

Description in help message

  --read_pdb PDB_FILENAME
                        reads PDB, not PDBQT, and does not use ProDy
  --read_pqr PQR_FILENAME
                        reads PQR and does not use ProDy
.
.
.
  --charge_model {gasteiger,espaloma,zero,read}
                        default is gasteiger, 'zero' sets all zeros, 'read' requires --read_pqr

Example usage 1: Parsing PQR, and reads charges from PQR

[input]
1FAS_dry.pqr.txt

mk_prepare_receptor.py --read_pqr 1FAS_dry.pqr --charge_model read -o 1FAS_pqr -p
Reading a PQR file. The following options or configurations will be ignored: 
  - default_altloc
  - wanted_altloc
Reading structures and partial charges from PQR. 

Files written:
1FAS_pqr.pdbqt <-- static (i.e., rigid) receptor input file

[output]
1FAS_pqr.pdbqt.txt

Example usage 2: Parsing PQR, but reads structure only

[input]
1FAS_dry.pqr.txt

mk_prepare_receptor.py --read_pqr 1FAS_dry.pqr -o 1FAS_pqr_default -p 
Reading a PQR file. The following options or configurations will be ignored: 
  - default_altloc
  - wanted_altloc
Only reading structures from PQR. 
Charge model of choice: gasteiger

Files written:
1FAS_pqr_default.pdbqt <-- static (i.e., rigid) receptor input file

[output]
1FAS_pqr_default.pdbqt.txt

Example usage 3: Using alternate charge model

[input]
3ZD5.pdb.txt

mk_prepare_receptor.py --charge_model espaloma -i 3ZD5.pdb -o 3ZD5_espaloma -p
mk_prepare_receptor.py --charge_model zero -i 3ZD5.pdb -o 3ZD5_zero -p

[output]
3ZD5_espaloma.pdbqt.txt
3ZD5_zero.pdbqt.txt

@rwxayheee
Copy link
Contributor Author

rwxayheee commented Jan 28, 2025

@diogomart Would 962767a be able to address the issues you pointed out in the previous comment (in #285 (comment))?

Sorry this is not the first time I got very confused by how config works and what needs to be set before initializing MoleculePreparation. We discussed this when I was working on mk_prepare_ligand to parse charges from MOL2, but I'm unfamiliar with the config options. I just totally forgot that there are other settings of MoleculePreparation not included in this command line script.

@diogomart
Copy link
Contributor

No worries. Yes, those issues are addressed, thanks. About the charges: the first residue is -NH3+ terminated and thus its net charge is +1 in the test PQR, however it's being made blunt because #301 isn't merged yet and the charge redistribution is modifying the charges resulting in a net charge of zero. We may want to turn off the charge redistribution for charge_model=read in

charges = rectify_charges(charges, net_charge, decimals=3)

@rwxayheee
Copy link
Contributor Author

Hi @diogomart
Yes, I can update the test if #301 is merged first, or pull #301 into this PR and fix the test. Let me know which way you think is better.
Could you explain why rectify_charges can be skipped for charge_model=read? It's needed if ignored atoms have nonzero charges, and if atoms (extra hydrogens) are added after input and before parameterization, such as by template assignment.

@diogomart
Copy link
Contributor

diogomart commented Jan 28, 2025

rectify_charges modifies partial charges to make their sum equal to the net formal charge of the molecule, and to make sure partial charges sum to an integer (not e.g. 0.002) when written with a given number of decimals precision, such as 3 decimals in PDBQT format. When Gasteiger or Espaloma charges are assigned, they are assigned to the padded molecule, and rectify_charges makes sure the partial charges of actual (non-padding) atoms sum to the net formal charge of the actual molecule.

On a second thought, we can keep rectify_charges. However, we should make the template matching more strict: no missing or excess atoms would be allowed, including missing or excess Hs. Consider this example: PQR has charges for HID, and user sets template to HIE, charges would be incorrect.

@rwxayheee
Copy link
Contributor Author

Hi @diogomart
ok, would it be a good idea for me to pull #301 into this?

However, we should make the template matching more strict: no missing or excess atoms would be allowed, including missing or excess Hs.

Previously stricter matching was difficult with CYS, which can be blunt, cross-linked or deprotonated. I can do some more rewrite of the template matching. But if you have a specific plan of how to do this, please feel free to take over

@rwxayheee
Copy link
Contributor Author

rwxayheee commented Jan 28, 2025

Consider this example: PQR has charged for HID, and the user sets the template to HIE; charges would be incorrect.

It might be because PQR isn't just a structure input. It's more like a partially parameterized polymer. I feel like if we start to support polymer json as input (I did in #277 but I gave up), there will be a similar problem: whether we want to allow updating single residue's or a few atoms' properties on a parameterized polymer, and how to verify the validity of doing that.

I gave up on #277 because of the same reason. I couldn't think of a way to reasonably re-parameterize a part of a parameterized polymer.

Programmatically, to make the net charges match, rectify_charges is capable of re-parameterizing a very small part of it, if any further edition/template assignment has taken place before parameterize. But you're right that they are not chemically correct.

In the command line script where there's a streamlined preparation (parameterize is called only once), we can set a combination of flags to forbid template assignment when PQR is the input. Yes, making template matching stricter would be helpful and neccessary.

In API, or in the future when integrated with a builder, there isn't a good way to prevent this usage:
input (PQR or parameterized polymer) -> edit/assignment/make exceptions, etc. -> parameterize

When there's an incoming assignment or edit, the monomer needs to be re-parameterized. I understand that if the charge model is read and there's no internal mechanism to compute with the same charge model, then the updated charges after the edit can become invalid or inconsistent with the original charge model.
It seems a bit difficult to keep track of the source of the charges :'( I don't have a very good plan of how to do this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants