Usage with Solr
Mongo Connector stores metadata in each document to help handle rollbacks. To support this metadata, you'll need to add the following fields to your schema.xml:
<field name="_ts" type="long" indexed="true" stored="true" />
<field name="ns" type="string" indexed="true" stored="true"/>
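To make the role of these two fields concrete, here is a minimal Python sketch of how a replicated document might be annotated. It assumes that ns holds the source namespace (database.collection) and that _ts packs the oplog entry's BSON timestamp (seconds plus increment) into a single 64-bit integer, which fits the long type declared above. The function names are hypothetical and are not mongo-connector's API.

```python
from typing import Any, Dict


def bson_ts_to_long(seconds: int, increment: int) -> int:
    """Pack a BSON timestamp (seconds, increment) into one 64-bit integer.

    Assumption: seconds occupy the high 32 bits and the increment the low
    32 bits, so a later operation always yields a larger number.
    """
    return (seconds << 32) + increment


def add_metadata(doc: Dict[str, Any], namespace: str,
                 seconds: int, increment: int) -> Dict[str, Any]:
    """Return a copy of doc carrying hypothetical ns and _ts metadata."""
    annotated = dict(doc)
    annotated["ns"] = namespace  # e.g. "database.collection"
    annotated["_ts"] = bson_ts_to_long(seconds, increment)
    return annotated
```

Because the packed value increases monotonically with the oplog position, a single integer comparison is enough to order two writes.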
Mongo Connector can replicate to the Solr search engine using the Solr DocManager. The most basic usage is the following:
mongo-connector -m localhost:27017 -t http://localhost:8983/solr -d solr_doc_manager
Older usage (before the 2.0 release):
mongo-connector -m localhost:27017 -t http://localhost:8983/solr -d <your-doc-manager-folder>/solr_doc_manager.py
This assumes that a MongoDB replica set is running on port 27017 and that Solr is running on port 8983, both on the local machine.
Please refer to the Apache documentation for configuring Solr and SolrCloud.
Mongo Connector automatically "flattens" MongoDB documents. Fields within sub-documents can be referenced by their dot-separated path within the document. Likewise, array fields are unrolled, so that each element is accessible by the field's original name, followed by a "." and the element's index within the array. For example, this document:
{
"subdoc": {
"a": 1,
"b": 2,
"c": 3,
"array": [
{"name": "elmo"},
{"name": "oscar"}
]
}
}
will become the following in Solr:
{
"subdoc.a": 1,
"subdoc.b": 2,
"subdoc.c": 3,
"subdoc.array.0.name": "elmo",
"subdoc.array.1.name": "oscar"
}
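The flattening rule is naturally recursive. The following Python sketch reproduces the example above; it illustrates the rule as described here and is not mongo-connector's own code (the name flatten is hypothetical).

```python
from typing import Any, Dict, Union


def flatten(doc: Union[Dict[str, Any], list], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into dot-separated keys.

    Sub-document keys are joined with ".", and each list element uses its
    index within the array as the next path component.
    """
    flat: Dict[str, Any] = {}
    pairs = doc.items() if isinstance(doc, dict) else enumerate(doc)
    for key, value in pairs:
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            # Recurse into sub-documents and arrays, extending the path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat
```

Calling flatten on the sub-document example yields exactly the five dotted keys shown above.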
Additionally, Mongo Connector comes with an example schema.xml file that can help you get started integrating MongoDB with Solr search. Solr reads schema.xml to find field types, the fields that documents may have, the primary key, and more. Mongo Connector tries to obtain the Solr schema through the LukeRequestHandler, at the special URI admin/luke/?show=schema&wt=json appended to the base Solr URL. So, in the example above, Mongo Connector will obtain the schema by sending a GET request to http://localhost:8983/solr/admin/luke/?show=schema&wt=json.
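A minimal Python sketch of that lookup, using only the standard library. It assumes the Luke response nests field definitions under schema, then fields; the helper names are hypothetical, and fetch_declared_fields needs a running Solr instance.

```python
import json
import urllib.request
from typing import Set

LUKE_SUFFIX = "admin/luke/?show=schema&wt=json"


def luke_schema_url(solr_base_url: str) -> str:
    """Append the LukeRequestHandler path to the base Solr URL."""
    return solr_base_url.rstrip("/") + "/" + LUKE_SUFFIX


def declared_fields(luke_response: dict) -> Set[str]:
    """Collect field names from a parsed Luke show=schema response.

    Assumption: field definitions live under schema -> fields.
    """
    return set(luke_response.get("schema", {}).get("fields", {}))


def fetch_declared_fields(solr_base_url: str) -> Set[str]:
    """GET the Luke endpoint and return the declared field names."""
    with urllib.request.urlopen(luke_schema_url(solr_base_url)) as resp:
        return declared_fields(json.load(resp))
```

Stripping a trailing slash first means http://localhost:8983/solr and http://localhost:8983/solr/ produce the same endpoint.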
Mongo Connector drops fields from MongoDB documents that aren't declared in your Solr core's schema, so that Solr doesn't throw exceptions and reject those documents. If you don't define the fields you want in schema.xml and reload the Solr core, Mongo Connector will merrily continue stripping the offending fields from your MongoDB documents. You can check what Solr thinks your core's schema is by visiting the aforementioned endpoint in your browser.
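Conceptually, this amounts to filtering each flattened document against the declared field names. The sketch below assumes a set of declared names (for example, as collected from the Luke endpoint) plus optional Solr dynamic-field patterns such as *_s, matched here with fnmatch. Mongo Connector's actual matching logic may differ, and strip_undeclared is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, Set


def strip_undeclared(doc: Dict[str, Any], fields: Set[str],
                     dynamic_patterns: Iterable[str] = ()) -> Dict[str, Any]:
    """Keep only keys that are declared fields or match a dynamic pattern."""
    patterns = tuple(dynamic_patterns)  # allow generators to be reused per key
    kept: Dict[str, Any] = {}
    for name, value in doc.items():
        if name in fields or any(fnmatchcase(name, p) for p in patterns):
            kept[name] = value
    return kept
```

Any key that fails both checks is silently discarded, which is why an undeclared field never reaches Solr.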
MongoDB generally uses a field called _id to store unique keys in documents, while Solr by default uses id for the same purpose. In both databases this field is mandatory, so submitting a MongoDB document unchanged to Solr while Solr's unique key is still id will result in an exception from Solr, and the document will not be inserted. For Mongo Connector to replicate to Solr successfully, Solr needs to see the expected unique key in each document. There are two ways to do this:
- Mongo Connector can translate _id to id when operations are replicated to Solr if you specify the option --unique-key=id to mongo-connector. The new id field will hold a string-ified version of what was stored in the _id field.
- You can switch Solr's unique key to _id instead of id. If you're working from the schema.xml provided as part of Mongo Connector, this is already done for you! Otherwise, you can accomplish this by editing the schema.xml file and replacing the line:
<uniqueKey>id</uniqueKey>
with the line:
<uniqueKey>_id</uniqueKey>
You'll also need to add a field definition for this key. Inside the <fields></fields> tags, you should insert:
<field name="_id" type="string" indexed="true" stored="true" />
Finally, you'll need to reload your Solr core.
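The first option boils down to a rename plus a string conversion. Here is a minimal Python sketch, assuming documents arrive as plain dicts; with a real BSON ObjectId, str() yields its 24-character hex form. translate_unique_key is a hypothetical helper, not part of mongo-connector.

```python
from typing import Any, Dict


def translate_unique_key(doc: Dict[str, Any], unique_key: str = "id") -> Dict[str, Any]:
    """Move MongoDB's _id into unique_key, stored as a string."""
    translated = {k: v for k, v in doc.items() if k != "_id"}
    translated[unique_key] = str(doc["_id"])
    return translated
```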
Mongo Connector does not force a commit on every write operation; rather, a Solr administrator should configure commit behavior in solrconfig.xml. This generally increases overall performance, since not every operation has to be flushed to disk immediately.
Mongo Connector also provides the --auto-commit-interval option to override any commit settings in solrconfig.xml, though configuring solrconfig.xml should be preferred if possible. This option takes a number as its argument: the maximum number of seconds allowed before a write must be committed. An argument of 0 means that every write operation is committed immediately:
# commit every write immediately
mongo-connector --auto-commit-interval=0 -d solr_doc_manager -t http://localhost:8983/solr
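To illustrate what "maximum number of seconds before a write must be committed" means, here is a small Python model of that deadline. CommitTracker is hypothetical and is not how Mongo Connector or Solr actually schedule commits; it only shows that the interval bounds how long the oldest uncommitted write may wait, and that 0 makes a commit due as soon as any write is pending.

```python
from typing import Optional


class CommitTracker:
    """Track when pending writes must be committed.

    interval is the maximum number of seconds a write may stay
    uncommitted; 0 means every write is due for commit immediately.
    """

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.oldest_pending: Optional[float] = None

    def record_write(self, now: float) -> None:
        # Only the oldest uncommitted write determines the deadline.
        if self.oldest_pending is None:
            self.oldest_pending = now

    def commit_due(self, now: float) -> bool:
        if self.oldest_pending is None:
            return False
        return now - self.oldest_pending >= self.interval

    def mark_committed(self) -> None:
        self.oldest_pending = None
```

Tracking only the oldest pending write keeps the check constant-time no matter how many writes accumulate between commits.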