Writing Your Own DocManager
This page details how to write your own DocManager class, which allows you to replicate CRUD operations from MongoDB to another indexing system.
The first step to creating your own DocManager is to create a new Python module, my_doc_manager.py, and define a class called DocManager. All DocManagers inherit from DocManagerBase:
# filename: my_doc_manager.py
from mongo_connector.doc_managers.doc_manager_base import DocManagerBase


class DocManager(DocManagerBase):
    """DocManager that connects to MyIndexingSystem"""
    # methods will go here
This class needs to define the following methods (bulk_upsert and handle_command are optional):
- __init__(self, url, **kwargs)
  This is the constructor; use it to do any setup your client needs in order to communicate with the target system. The only required parameter is url, the endpoint the DocManager should target. It is given by the -t option on the command line, or by the targetURL field for this DocManager in Mongo Connector's config file. There are also several standard parameters your DocManager can take advantage of:
  - auto_commit_interval is the time, in seconds, between the DocManager's attempts to commit any outstanding changes to the indexing system. A value of None indicates that mongo-connector should not attempt to commit changes automatically.
  - unique_key gives the unique key the DocManager should use for documents in the target system. The default is _id, the same unique key MongoDB uses.
  - chunk_size specifies the maximum number of documents to insert in a single batch.
  If the user defines anything in the args section for this DocManager, it is passed to the constructor as keyword arguments. Just document which additional arguments your DocManager accepts, and users can provide values for them in the config file.
- stop(self)
  This method is called to stop the DocManager. If you started any threads, for example to handle auto commit, this is the place to join() them.
- upsert(self, doc, namespace, timestamp)
  This should upsert (i.e., insert or overwrite) the document provided in the doc parameter. doc is the full document to be upserted. namespace and timestamp are the namespace (i.e., database + '.' + collection) and the Timestamp of the oplog record that caused this event.
- bulk_upsert(self, docs, namespace, timestamp)
  This is used to insert documents in bulk into the target system during a collection dump. This method is optional; if it is not provided, mongo-connector falls back to calling upsert on each document serially. However, inserting documents in bulk is much more efficient. namespace and timestamp are the same as above.
- remove(self, document_id, namespace, timestamp)
  This should remove the document with the id document_id from the external system. namespace and timestamp are the same as above.
- search(self, start_ts, end_ts)
  This should return an iterator over all documents that were last modified between start_ts and end_ts. Your DocManager implementation is responsible for storing the information needed to answer this query. This method is called when a MongoDB rollback occurs.
- commit(self)
  This method should commit any outstanding changes to the target system.
- get_last_doc(self)
  This should return the document most recently modified in the target system.
- handle_command(self, doc, namespace, timestamp)
  This optional method processes an arbitrary database command in the oplog. doc is the command document itself. namespace is the original namespace the command was performed on, without any namespace mappings applied. This is contrary to how the other methods work, and it gives your DocManager maximum flexibility in how to deal with the command.
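To see how the methods above can fit together, here is a minimal in-memory sketch. It is not taken from mongo-connector: a plain dict stands in for the target system, writes are buffered until commit(), and the helpers _entry and _buffer and the attributes _committed and _pending are hypothetical names. search() and get_last_doc() return small dicts with _id, ns, and _ts keys; that shape is an assumption, so check doc_manager_simulator.py for what the rollback logic actually expects. The sketch falls back to object as its base class when mongo-connector is not installed, and handle_command only handles a drop command, assuming the usual "<db>.$cmd" namespace for oplog commands.

```python
import threading

try:
    from mongo_connector.doc_managers.doc_manager_base import DocManagerBase
except ImportError:  # lets the sketch run without mongo-connector installed
    DocManagerBase = object


class DocManager(DocManagerBase):
    """In-memory DocManager sketch; a dict stands in for the target system."""

    def __init__(self, url, auto_commit_interval=None, unique_key="_id",
                 chunk_size=500, **kwargs):
        self.url = url                    # endpoint from -t / targetURL (unused here)
        self.unique_key = unique_key      # field name used in the target system
        self.chunk_size = chunk_size
        self.auto_commit_interval = auto_commit_interval
        self.extra_args = kwargs          # anything from the config file's "args"
        self._committed = {}              # MongoDB _id -> {"doc", "ns", "_ts"}
        self._pending = {}                # same, but None marks a pending delete
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._thread = None
        # None disables auto commit; only a positive interval starts a thread.
        if auto_commit_interval is not None and auto_commit_interval > 0:
            self._thread = threading.Thread(target=self._auto_commit, daemon=True)
            self._thread.start()

    def _auto_commit(self):
        # Event.wait() returns False on timeout and True once stop() sets it.
        while not self._stopped.wait(self.auto_commit_interval):
            self.commit()

    def stop(self):
        self._stopped.set()
        if self._thread is not None:
            self._thread.join()

    def _entry(self, doc, namespace, timestamp):
        doc = dict(doc)
        doc_id = doc.pop("_id")
        doc[self.unique_key] = doc_id     # keep MongoDB's _id under the target key
        return doc_id, {"doc": doc, "ns": namespace, "_ts": timestamp}

    def _buffer(self, pairs):
        with self._lock:
            self._pending.update(pairs)

    def upsert(self, doc, namespace, timestamp):
        self._buffer([self._entry(doc, namespace, timestamp)])

    def bulk_upsert(self, docs, namespace, timestamp):
        batch = []                        # flushed every chunk_size documents
        for doc in docs:
            batch.append(self._entry(doc, namespace, timestamp))
            if len(batch) >= self.chunk_size:
                self._buffer(batch)
                batch = []
        if batch:
            self._buffer(batch)

    def remove(self, document_id, namespace, timestamp):
        self._buffer([(document_id, None)])

    def commit(self):
        with self._lock:
            for doc_id, entry in self._pending.items():
                if entry is None:
                    self._committed.pop(doc_id, None)
                else:
                    self._committed[doc_id] = entry
            self._pending.clear()

    def search(self, start_ts, end_ts):
        self.commit()                     # make pending writes visible first
        with self._lock:
            items = list(self._committed.items())
        for doc_id, e in items:
            if start_ts <= e["_ts"] <= end_ts:
                yield {"_id": doc_id, "ns": e["ns"], "_ts": e["_ts"]}

    def get_last_doc(self):
        self.commit()
        with self._lock:
            if not self._committed:
                return None
            doc_id, e = max(self._committed.items(), key=lambda kv: kv[1]["_ts"])
        return {"_id": doc_id, "ns": e["ns"], "_ts": e["_ts"]}

    def handle_command(self, doc, namespace, timestamp):
        # namespace is the raw "<db>.$cmd" namespace, without mappings applied.
        if "drop" in doc:
            dropped = namespace.split(".", 1)[0] + "." + doc["drop"]
            self.commit()
            with self._lock:
                for doc_id in [k for k, e in self._committed.items() if e["ns"] == dropped]:
                    del self._committed[doc_id]
```

Because commits are only simulated here, search() and get_last_doc() flush pending writes before reading; a real indexing system might need an explicit refresh instead.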
It might be helpful to see an example implementation of a DocManager. For this, we recommend taking a look at doc_manager_simulator.py, which is used in the test suite to mock replicating CRUD operations.
You don't need to make a pull request to this project in order for users to be able to use your DocManager. You can distribute your DocManager on PyPI separately as its own package, and users can install it alongside Mongo Connector, referencing it on the command line like any of the built-in DocManagers. All you need to do is:
- Create a project directory structure that mirrors that of the built-in DocManagers, like this:
  project/mongo_connector/__init__.py
  project/mongo_connector/doc_managers/__init__.py
  project/mongo_connector/doc_managers/your_custom_doc_manager.py
- Put the following in both mongo_connector/__init__.py and mongo_connector/doc_managers/__init__.py:
  from pkgutil import extend_path
  __path__ = extend_path(__path__, __name__)
- If you want to distribute this on PyPI, you'll probably want to add a README.rst and a setup.py to install your mongo_connector package.
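The directory structure and __init__.py contents from the steps above can be generated with a short script. scaffold() is a hypothetical helper, not part of mongo-connector: it writes the two extend_path __init__.py files and a placeholder DocManager module, which you would then replace with your DocManagerBase subclass. It does not create the README.rst or setup.py.

```python
import os

# Both __init__.py files get these two lines so that mongo_connector and
# mongo_connector.doc_managers can be split across several sys.path entries.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)

# Hypothetical placeholder; replace it with your DocManagerBase subclass.
MODULE_SOURCE = (
    '"""Custom DocManager (placeholder)."""\n'
    "\n"
    "\n"
    "class DocManager(object):\n"
    "    pass\n"
)


def scaffold(project_dir, module_name):
    """Create project_dir/mongo_connector/doc_managers/<module_name>.py and
    the two __init__.py files; return the path of the DocManager module."""
    pkg_dir = os.path.join(project_dir, "mongo_connector")
    dm_dir = os.path.join(pkg_dir, "doc_managers")
    os.makedirs(dm_dir, exist_ok=True)
    for init_dir in (pkg_dir, dm_dir):
        with open(os.path.join(init_dir, "__init__.py"), "w") as f:
            f.write(INIT_SOURCE)
    module_path = os.path.join(dm_dir, module_name + ".py")
    if not os.path.exists(module_path):  # never overwrite an existing module
        with open(module_path, "w") as f:
            f.write(MODULE_SOURCE)
    return module_path
```

For example, scaffold("project", "your_custom_doc_manager") creates the three files listed above. When two directories laid out this way are both on sys.path, extend_path merges their portions, so modules from either one can be imported under mongo_connector.doc_managers.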
That's it! Install your DocManager and test it with mongo-connector -d your_custom_doc_manager. Leave off the ".py" suffix; Mongo Connector finds the correct file from the module name alone.
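Before running mongo-connector, you could check that your module resolves from its bare name. load_doc_manager() below is a hypothetical pre-flight helper, not part of mongo-connector; it assumes the name is looked up inside the mongo_connector.doc_managers package, as the directory structure above implies, and that the module defines a class named DocManager, as described at the top of this page.

```python
import importlib


def load_doc_manager(name):
    """Import mongo_connector.doc_managers.<name> and return its DocManager class."""
    if name.endswith(".py"):
        # The command line expects the module name, not the file name.
        raise ValueError("pass the module name without the .py suffix")
    module = importlib.import_module("mongo_connector.doc_managers." + name)
    doc_manager = getattr(module, "DocManager", None)
    if doc_manager is None:
        raise AttributeError(name + " does not define a DocManager class")
    return doc_manager
```

Calling load_doc_manager("your_custom_doc_manager") after installing your package should return your class; an ImportError or AttributeError here points at a packaging or naming problem rather than at the target system.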