v0.6.1

forlilab · Nov 18, 2024 · 07a38f8
2 parents 7880da1 + aa8d4df
commit 07a38f8

Showing 33 changed files with 303 additions and 207 deletions.
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,7 +1,15 @@
 include README.md
 include LICENSE
 include test/*
+include test/*
+include test/flexibility_data/*
+include test/macrocycle_data/*
+include test/polymer_data/*
+include test/rdkitmol_from_docking_data/*
+include test/small_cycle_data/*
 include example/*
+include example/tutorial1/*
 include meeko/data/*
+include meeko/data/params/*
 include meeko/tmp/*
-include meeko/utils/*
+include meeko/utils/*
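A note on the added subdirectory lines: in a ``MANIFEST.in`` ``include`` pattern, ``*`` matches within a single path component and does not descend into subdirectories. This is presumably why ``include test/*`` alone was not enough to pick up files under directories such as ``test/polymer_data/``. The sketch below mimics that rule with a regular expression. It is an illustration only, not the setuptools implementation, and the helper ``include_matches`` and the file names are hypothetical.

```python
import re


def include_matches(pattern: str, path: str) -> bool:
    """Return True if `path` matches a MANIFEST.in-style include pattern.

    Simplified on purpose: only `*` is handled, and it never crosses a `/`.
    """
    # Escape the literal pieces; each `*` stands for "anything except a slash".
    regex = "[^/]*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, path) is not None


# Hypothetical file names, for illustration only.
print(include_matches("test/*", "test/test_example.py"))
print(include_matches("test/*", "test/polymer_data/example.pdb"))
print(include_matches("test/polymer_data/*", "test/polymer_data/example.pdb"))
```

Only the first and third calls report a match, which mirrors the need for one ``include`` line per data directory.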
diff --git a/README.md b/README.md
@@ -1,8 +1,8 @@
 # Meeko: interface for AutoDock
 
 [![API stability](https://img.shields.io/badge/stable%20API-no-orange)](https://shields.io/)
-[![PyPI version fury.io](https://img.shields.io/badge/version-0.6.0-green.svg)](https://pypi.python.org/pypi/meeko/)
-[![Documentation Status](https://readthedocs.org/projects/meeko/badge/?version=readthedocs)](https://meeko.readthedocs.io/en/readthedocs/?badge=readthedocs)
+[![PyPI version fury.io](https://img.shields.io/badge/version-0.6.1-green.svg)](https://pypi.python.org/pypi/meeko/)
+[![Documentation Status](https://readthedocs.org/projects/meeko/badge/?version=release)](https://meeko.readthedocs.io/en/release/?badge=release)
 
 Meeko prepares the input for AutoDock and processes its output.
 It is developed alongside AutoDock-GPU and AutoDock-Vina.
@@ -16,7 +16,7 @@ at [Scripps Research](https://www.scripps.edu/).
 
 ## Documentation
 
-The docs are hosted on [meeko.readthedocs.io](meeko.readthedocs.io)
+The docs are hosted on [meeko.readthedocs.io](https://meeko.readthedocs.io/en/release)
 
 
 ## Reporting bugs
@@ -42,7 +42,7 @@ pip install meeko
 
 Meeko exposes a Python API to enable scripting. Here we share very minimal examples
 using the command line scripts just to give context.
-Please visit the [meeko.readthedocs.io](meeko.readthedocs.io) for more information.
+Please visit the [meeko.readthedocs.io](https://meeko.readthedocs.io/en/release) for more information.
 
 Parameterizing a ligand and writing a PDBQT file:
 ```bash

diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -1 +1,2 @@
 sphinx-book-theme
+sphinx-design
diff --git a/docs/source/cli_rec_prep.rst b/docs/source/cli_rec_prep.rst
@@ -23,7 +23,7 @@ Write flags
 ~~~~~~~~~~~
 
 The option flags starting with ``--write`` in  ``mk_prepare_receptor`` can
-be used both with an argument to specify the outpuf filename: 
+be used both with an argument to specify the output filename: 
 
 .. code-block:: bash
 
@@ -72,7 +72,7 @@ The arguments involving assignment of residues to properties:
 
     -s, --reactive_name_specific <residue:atom>
 
-use the residue selection lanaguge described above, followed by an equal sign (``=``) as the delimiter and the assigned value, which could be the name of a residue template, the atom index for the blunt end, the wanted altloc ID, or the atom name of the reactive atom. Each residue selection is comibned with the most recent assignment that precedes it, resulting in a further expanded list of residue-assignment pairs. 
+use the residue selection language described above, followed by an equal sign (``=``) as the delimiter and the assigned value, which could be the name of a residue template, the atom index for the blunt end, the wanted altloc ID, or the atom name of the reactive atom. Each residue selection is combined with the most recent assignment that precedes it, resulting in a further expanded list of residue-assignment pairs.
 
 For an input like ``"A:5,7=CYX,A:19A,B:17=HID``, this assignment language represents: ``residues (number) 5 in Chain A are set to (template name) CYX`` and ``residue (number) 19 A in Chain A, and residue (number) 17 in Chain B are set to (template name) HID``. 
 

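To make the expansion in the ``"A:5,7=CYX,A:19A,B:17=HID"`` example concrete, here is a minimal sketch of one way to read that string. It is not Meeko's parser: the function name ``expand_assignments`` is hypothetical, and it assumes that a bare residue number (such as ``7``) inherits the chain of the selection before it, and that every selection takes the value of the next ``=`` that follows it, as the worked example suggests.

```python
def expand_assignments(spec: str) -> list:
    """Expand an assignment string into (chain, residue, value) triples.

    Hypothetical illustration of the assignment language, not Meeko code.
    """
    pairs = []
    pending = []   # selections still waiting for an assigned value
    chain = None   # chain ID carried over to bare residue numbers
    for token in spec.split(","):
        selection, has_value, value = token.partition("=")
        if ":" in selection:
            chain, residue = selection.split(":", 1)
        else:
            residue = selection  # assumption: reuse the previous chain ID
        if chain is None:
            raise ValueError(f"no chain ID given for residue {residue!r}")
        pending.append((chain, residue))
        if has_value:
            # Close the group: every pending selection gets this value.
            pairs.extend((c, r, value) for c, r in pending)
            pending.clear()
    if pending:
        raise ValueError(f"selections without an assignment: {pending}")
    return pairs


for chain, residue, value in expand_assignments("A:5,7=CYX,A:19A,B:17=HID"):
    print(chain, residue, value)
```

Under these assumptions, residues 5 and 7 in chain A are paired with ``CYX``, while residue 19A in chain A and residue 17 in chain B are paired with ``HID``.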
diff --git a/docs/source/colab_examples.rst b/docs/source/colab_examples.rst
@@ -69,14 +69,14 @@ The basic docking example is developed to showcase the usage of **import additio
 
 `Run on Colab! <https://colab.research.google.com/drive/1tzQoguVQDCguOaLSsGvQuL57ry_PY3UG?usp=sharing>`_
 
-The reactive docking example is based on reactive docking method that has been developed for high-throughput virtual screenings of reactive species. In this example, a small molecule substrate (Adenosine monophosphate, PDB token AMP) is targeting at the catalytic histidine residue of a hollow protein structure of bacteria RNA 3' cyclase (PDB token 3KGD) to generate the near-attack conformation for the formation of the phosphoamide bond. A docked pose that closely resembles the original position of the ligand is expected among the top-ranked poses. 
+The reactive docking example is based on reactive docking method that has been developed for high-throughput virtual screenings of reactive species. In this example, a small molecule substrate (Adenosine monophosphate, PDB token AMP) is targeting the catalytic histidine residue of a hollow protein structure of bacteria RNA 3' cyclase (PDB token 3KGD) to generate the near-attack conformation for the formation of the phosphoamide bond. A docked pose that closely resembles the original position of the ligand is expected among the top-ranked poses. 
 
 
 [AutoDock-GPU] Tethered Docking
 ---------------
 
 `Run on Colab! <https://colab.research.google.com/drive/1tf9xOgn6u8eDTeFJtc8GCEGRX-8aR9Bo?usp=sharing>`_
 
-The covalent docking example is based on the two-point attractor and flexible side chain method. In this example, a small molecule substrate (Adenosine monophosphate, PDB token AMP) is attached onto the catalytic histidine residue of a hollow protein structure of bacteria RNA 3' cyclase (PDB token 3KGD) to reproduce the covalent intermediate complex structure. A docked pose that closely resembles the original position of the ligand is expected among the top-ranked poses. 
+The covalent docking example is based on the two-point attractor and flexible side chain method. In this example, a small molecule substrate (Adenosine monophosphate, PDB token AMP) is attached to the catalytic histidine residue of a hollow protein structure of bacteria RNA 3' cyclase (PDB token 3KGD) to reproduce the covalent intermediate complex structure. A docked pose that closely resembles the original position of the ligand is expected among the top-ranked poses. 
 
 
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -22,6 +22,7 @@
     'sphinx.ext.autodoc',
     'sphinx.ext.napoleon',
     'sphinx.ext.intersphinx',
+    'sphinx_design',
 ]
 
 html_logo = "images/raccoon.png"

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -33,17 +33,35 @@ convert docking output files  ``mk_export.py``.
 Running a docking
 -----------------
 
-To run a docking, more packages are required besides Meeko:
+To run a docking, more packages are required besides Meeko. At a minimum,
+either Vina or AutoDock-GPU is needed to run the actual docking.
+
+.. grid:: 3
+
+    .. grid-item-card:: AutoDock-GPU
+        :link: https://github.com/ccsb-scripps/AutoDock-GPU
+
+        Docking for GPUs. Implements the AutoDock4.2 scoring function.
+        Command line executable only.
+
+    .. grid-item-card:: AutoDock-Vina
+        :link: https://autodock-vina.readthedocs.io/
+
+        Docking on CPUs.
+        Implements Vina and AutoDock4.2 scoring functions.
+        Has a Python API and command line executable.
+
+    .. grid-item-card:: Ringtail
+        :link: https://github.com/forlilab/ringtail
+
+        Store and analyze virtual screening with SQLite.
+        Has a Python API and command line scripts.
 
- * AutoDock-Vina
- * AutoDock-GPU
- * Ringtail
 
 Check the tutorials page to learn about using meeko with these other packages
 to run molecular docking and virtual screening.
 
 
-
 .. toctree::
    :maxdepth: 3
    :hidden:

diff --git a/docs/source/lig_overview.rst b/docs/source/lig_overview.rst
@@ -14,28 +14,30 @@ Use of RDKit
 Meeko takes as input an RDKit molecule that has 3D positions and
 all hydrogens as real atoms, and creates an object called MoleculeSetup
 that stores all the parameters, such as atom types, partial charges, or
-rotatable bonds. Adding hydrogens and 3D positions is not performed by Meeko.
+rotatable bonds. RDKit checks the valence of atoms, making it difficult
+to have molecules with incorrect bond orders or formal charges.
+Adding hydrogens and 3D positions is not performed by Meeko.
 
 Parameterization using SMARTS patterns
 --------------------------------------
 
 Many of the parameters are assigned using SMARTS strings, which are a compact
-and versatile query language for identigying chemical substructures. This makes
+and versatile query language for identifying chemical substructures. This makes
 it easier to define custom atom types and other parameters. The SMARTS
 that define the AutoDock parameters are included in Meeko and set as the
-default, This goal of this feature is to facilitate the implementation of
+default. The goal of this feature is to facilitate the implementation of
 custom docking workflows.
 
 The MoleculePreparation class stores all the configuration required to
 parameterize a ligand, and contains the methods that take an RDKit molecule
 as input, and return a parameterized MoleculeSetup. The MoleculePreparation
-offers several option to control how molecules are parameterized, and many of
-which are exposed as command line options in ``mk_prepare_ligand.py``.
+offers several options to control how molecules are parameterized, and most
+are exposed as command line options in ``mk_prepare_ligand.py``.
 
 Preparing the input for docking
 -------------------------------
 
 AutoDock-GPU and Vina use the PDBQT format for input molecules.
 PDBQT files, or strings in Python, are produced from a parameterized instance
-of MoleculeSetup. The methods for that live in the PDBQTWriterLegacy class.
+of MoleculeSetup. The methods to write PDBQT are in the PDBQTWriterLegacy class.
 PDBQT strings can be passed directly to Vina using its Python API.
diff --git a/docs/source/lig_prep_advanced.rst b/docs/source/lig_prep_advanced.rst
@@ -150,7 +150,7 @@ rotatable, which often leads to unreasonable geometries, but is necessary to
 visit both amide rotamers during docking.
 
 Here, we configure Meeko to make single bonds in some conjugated systems rigid,
-as defined byt the SMARTS ``"C=CC=C"``, and rigidify all amide bonds matched
+as defined by the SMARTS ``"C=CC=C"``, and rigidify all amide bonds matched
 by ``"[CX3](=O)[NX3]"``, which includes tertiary amides but not thioamides or
 amidines:
 

diff --git a/docs/source/lig_prep_basic.rst b/docs/source/lig_prep_basic.rst
@@ -26,7 +26,7 @@ Writing a single PDBQT file:
 
 If the ``-o`` option is omitted, the output filename will be the same as the
 input but with ``.pdbqt`` extension. Option ``-`` prints the PDBQT
-string to standard output instead of writting to a file.
+string to standard output instead of writing to a file.
 
 Writing multiple PDBQT files from an SD file with multiple molecules:
 
@@ -67,6 +67,6 @@ AutoDock-Vina, or passed directly to Vina within Python using Vina's Python API,
 and avoiding writing PDBQT files to the filesystem.
 
 Note that calling ``mk_prep`` returns a list of molecule setups.
-As of v0.6.0, this list constains only one element  unless ``mk_prep`` is
+As of v0.6.0, this list contains only one element unless ``mk_prep`` is
 configured for reactive docking, which is not the case in this example. This is
 why we are considering the first (and only) molecule setup in the list.
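The default output name described for ``mk_prepare_ligand.py`` (same as the input, but with a ``.pdbqt`` extension when ``-o`` is omitted) can be sketched with ``pathlib``. This only mirrors the documented behavior; the helper ``default_pdbqt_name`` and the example file names are hypothetical, and Meeko's own implementation may differ.

```python
from pathlib import PurePosixPath


def default_pdbqt_name(input_filename: str) -> str:
    """Swap the input file's extension for .pdbqt (illustrative helper)."""
    return str(PurePosixPath(input_filename).with_suffix(".pdbqt"))


# Hypothetical input names, for illustration only.
print(default_pdbqt_name("molecule.sdf"))
print(default_pdbqt_name("ligands/molecule.mol2"))
```

Only the final extension is replaced, so the directory part of the input path is preserved.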
diff --git a/docs/source/py_build_temp.rst b/docs/source/py_build_temp.rst
@@ -5,9 +5,9 @@ Advanced information about templates
 
 The interpretation of the valence (bonds) and formal charge of atoms is an essential step when parsing a PDB/CIF file, and the accuracy of residue mapping is crucial to the creation of a macrobiomolecule system. In Meeko, the input residue names are used as keys and the chemical templates are retrieved accordingly based on :ref:`templates <templates>`. 
 
-For the command line script for receptor preparatoin, ``mk_prepare_receptor.py``, there are three major ways of obtaining such templates: 
+For the command line script for receptor preparation, ``mk_prepare_receptor.py``, there are three major ways of obtaining such templates: 
 
-**(1) Loading from the default JSON file:** ``Meeko/meeko/data/residue_chem_templates.json``
+**(1) Load from the default JSON file:** ``Meeko/meeko/data/residue_chem_templates.json``
 
 This is the default residue template set curated by us, including: 
 
@@ -30,22 +30,22 @@ This is the default residue template set curated by us, including:
     ├── parmBSC1.lib
     └── all_modrna08.lib
 
-(c) residues or ligands in CCD (Chemical Component Dictionary) that have conflicting names with the above residues. 
+(c) the ligands in the CCD (Chemical Component Dictionary) that have conflicting names with the above residues.
 
-**(2) Loading by ``--add_template`` from an additional JSON file:** (example) ``Meeko/meeko/data/NAKB_templates.json``
+**(2) Load from an additional JSON file** by argument ``--add_templates``, e.g., ``--add_templates Meeko/meeko/data/NAKB_templates.json``
 
 This is an optional add-on template set generated by us, based on the curated set of modified nucleotides by Nucleic Acid Knowledgebase (NAKB). 
 
-**(3) Fetching from PDB by CCD name on the run**
+**(3) Fetch from PDB by CCD name during runtime**
 
 When an unknown residue is encountered, ``mk_prepare_receptor.py`` attempts to resolve its chemical identity by fetching a definition CIF file from PDB (Protein Data Bank) and generates chemical templates of all possible embedding forms of it when there are inter-residue bonds. Currently, this is an automated yet relatively simple process that only supports noncovalent ligands and residues with unmodified backbones. 
 
-Here, we present a quick guide of building potentially complicated residue templates on your own using the ``meeko.chemtempgen`` submodule. In this example, we will be working with residue ``CRO``, a naturally occuring fluorophore in green fluorescent proteins formed by condensation of three consecutive residues Ser-Tyr-Gly. 
+Here, we present a quick guide to building potentially complicated residue templates using the ``meeko.chemtempgen`` submodule. In this example, we will prepare a residue with modified backbone: ``CRO``, a naturally occurring fluorophore in green fluorescent proteins, formed by condensation of three consecutive residues Ser-Tyr-Gly. 
 
 Example usage
 -------------
 
-Before we start, we will import the required modules and optionally, suppress excess rdkit loggings that may occur during the editing of molecular structures. Then we will create a ``ChemicalComponent`` from a definition CIF file, which will be obtained by ``fetch_from_pdb`` (Internet connection is required). 
+Before we start, we will import the required modules and, optionally, suppress excess rdkit loggings that may occur during the editing of molecular structures. Then, we will create a ``ChemicalComponent`` from a definition CIF file, which will be obtained by ``fetch_from_pdb`` (Internet connection is required). 
 
 .. code-block:: python
 
@@ -92,7 +92,7 @@ The created ``ChemicalComponent`` object, ``CRO_from_cif``, has a corresponding
    :width: 60%
    :align: center
 
-As we may see from the picture above, in order to forge ``CRO`` into a linking embedded fragment in a protein, some atoms need to be removed. In this example, we will simply do so by specifying the atom names. ``make_embedded`` calls function ``embed`` on the duplicated object ``cc``, which takes ``embed_allowed_smarts`` as the editable zone and removes atoms matching the names in ``leaving_names``. Here, the ``embed_allowed_smarts`` is chosen to be the SMARTS of altered backbone in residue ``CRO``. Note that by default, ``embed`` removes associated hydrogens for convenience. Therefore, in this case, ``leaving_names = {"H2", "OXT"}`` removes atoms ``H2``, ``OXT`` as well as the bonded hydrogen, ``HXT``. The same task could be alternatively done by the equivalent SMARTS pattern. 
+As we may see from the picture above, in order to forge ``CRO`` into a linking embedded fragment in a protein, some atoms need to be removed. In this example, we will simply do so by specifying the atom names. ``make_embedded`` calls function ``embed`` on the duplicated object ``cc``, which takes ``embed_allowed_smarts`` as the editable zone and removes atoms matching the names in ``leaving_names``. Here, the ``embed_allowed_smarts`` is chosen to be the SMARTS of altered backbone in residue ``CRO``. Note that by default, ``embed`` removes associated hydrogens for convenience. Therefore, in this case, ``leaving_names = {"H2", "OXT"}`` removes atoms ``H2``, ``OXT`` as well as the bonded hydrogen, ``HXT``. The equivalent SMARTS pattern could alternatively do the same task. 
 
 .. code-block:: python
 
@@ -108,7 +108,7 @@ As we may see from the picture above, in order to forge ``CRO`` into a linking e
    :width: 60%
    :align: center
 
-Looking at the structure of the edited picture, we will see that the unneccessary atoms have gone and the hydrogens at the broken (blunt) ends become implict, which is exactly needed to generate the Smiles string for the chemical template. Function ``make_pretty_smiles`` makes the Smiles string with all Hs explicit for the template's RDKit molecule. Last but not least, we will determin the ``link_labels`` which specifies how ``CRO`` should be connected to other residues. Here, we will use the pattern from a built-in recipe, ``AA_recipe.pattern_to_label_mapping_standard``, which also applies to all other standard amino acid residues: ``{'[NX3h1]': 'N-term', '[CX3h1]': 'C-term'}``. Opionally, we can run a ``ResidueTemplate_check`` to see potential problems with the generated template. 
+Looking at the structure of the edited picture, we will see that the unnecessary atoms have gone and the hydrogens at the broken (blunt) ends become implicit, which is exactly needed to generate the Smiles string for the chemical template. Function ``make_pretty_smiles`` makes the Smiles string with all Hs explicit for the template's RDKit molecule. Last but not least, we will determine the ``link_labels`` which specifies how ``CRO`` should be connected to other residues. Here, we will use the pattern from a built-in recipe, ``AA_recipe.pattern_to_label_mapping_standard``, which also applies to all other standard amino acid residues: ``{'[NX3h1]': 'N-term', '[CX3h1]': 'C-term'}``. Optionally, we can run a ``ResidueTemplate_check`` to see potential problems with the generated template. 
 
 .. code-block:: python
 
@@ -165,7 +165,7 @@ To make the N-terminal embedding variant of ``CRO``:
     cc_N.resname += "_N"
     export_chem_templates_to_json([cc_N])
 
-In the chained procedure above, we have removed ``OXT`` and protonated ``N1``, which is done by ``make_capped`` that adds hydrogen(s) to matching atom(s) with specified ``capping_names`` within the region of ``allowed_smarts``. The expected outout from ``export_chem_templates_to_json`` is: 
+In the chained procedure above, we have removed ``OXT`` and protonated ``N1``, which is done by ``make_capped`` that adds hydrogen(s) to matching atom(s) with specified ``capping_names`` within the region of ``allowed_smarts``. The expected output from ``export_chem_templates_to_json`` is: 
 
 .. code-block:: bash
 

diff --git a/docs/source/rec_overview.rst b/docs/source/rec_overview.rst
@@ -5,9 +5,9 @@ Polymers and monomers
 ---------------------
 
 Receptors are represented as a collection of subunits where each subunit is
-treated as an individual molecule. THe same code path for ligand parameterization
-is also used herein. The Python class for receptors is called
+treated as an individual molecule. The Python class for receptors is called
 ``Polymer`` and the class for subunits is ``Monomer``.
+The code for ligand parameterization is also used for the receptor.
 
 During most of the v0.6.0 development, the name of the ``Polymer`` class was
 ``LinkedRDKitChorizo``. Many of the commit messages, as well as GitHub issues and pull