Running your own copy

Rob Speer edited this page Mar 28, 2014 · 29 revisions

The ConceptNet 5 server comes in three pieces: the main index in Apache Solr, a REST API that's served from Python, and a Web interface on top of that API.

Getting the Solr search data

You'll need a Solr representation of the data in ConceptNet. You can get it either by running the build process on your own computer or by downloading the prebuilt Solr data from http://conceptnet5.media.mit.edu/downloads/current/. (Look for the file with "solr" in its name.)

Setting up the index

Our Solr environment is packaged up at:

http://conceptnet5.media.mit.edu/downloads/20120501/conceptnet5-solr-config.tar.gz

(That has an old version number in the URL, but the Solr configuration hasn't changed.)

You should be able to unpack that and run "java -jar start.jar" to get a server, and then use the included "import-solr-json.sh" to load the ConceptNet 5 data that you download separately from:

http://conceptnet5.media.mit.edu/downloads/current/

NOTE: This may give you an index that doesn't fit in memory and spends an unreasonable amount of time swapping to disk. The machines we run it on are dedicated servers with 64 GB of RAM. Before that, we ran it in two shards, each on an Amazon EC2 m2.xlarge instance with 17 GB of RAM. (This was expensive and is not recommended.)
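
Once the server is up and the import has finished, it can be useful to confirm that the index actually contains documents. The following is a minimal sketch in Python, not part of the ConceptNet distribution. It assumes that Solr is listening on its usual default port, 8983, and that the index is reachable at the path /solr with the standard /select handler. The packaged configuration may use a different port or core path, so adjust SOLR_BASE to match. The names SOLR_BASE, build_select_url, and count_documents are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr's default port (8983) and a single index served at /solr.
# Change this if the packaged configuration uses another port or core path.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(query="*:*", rows=0, base=SOLR_BASE):
    """Build a URL for Solr's /select handler that asks for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/select?{params}"


def count_documents(base=SOLR_BASE, timeout=10):
    """Return how many documents the index holds (needs a running server)."""
    # rows=0 asks only for the match count, not for the documents themselves.
    with urlopen(build_select_url(base=base), timeout=timeout) as response:
        payload = json.load(response)
    return payload["response"]["numFound"]
```

With the server running, calling count_documents() should return a large positive number. A result of 0 would suggest that the import step did not load anything.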

Running the API server

This is where you need the conceptnet5 Python code, so begin by checking out the Git repository at https://github.com/commonsense/conceptnet5.

The REST API is in conceptnet5/conceptnet_api.wsgi. This can be run using a Python WSGI server or Apache's mod_wsgi. An example of running it in Gunicorn is included in gunicorn.sh.
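
For a quick local test without Gunicorn or Apache, a .wsgi file can also be loaded by hand and served with the development server in Python's standard library. This is only a sketch under one assumption: that the file follows the mod_wsgi convention of exposing a module-level callable named application. The helper names load_wsgi_application and serve, the module name "wsgi_app", and port 8000 are all hypothetical choices for this example. The wsgiref server is single-threaded and not meant for production use.

```python
import importlib.util
from importlib.machinery import SourceFileLoader
from wsgiref.simple_server import make_server


def load_wsgi_application(path):
    """Load a .wsgi file as a Python module and return its WSGI callable.

    Assumption: the file defines a module-level callable named
    `application`, as mod_wsgi expects by default.
    """
    # SourceFileLoader reads the file as Python source regardless of extension.
    loader = SourceFileLoader("wsgi_app", path)
    spec = importlib.util.spec_from_loader(loader.name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module.application


def serve(path, host="127.0.0.1", port=8000):
    """Serve the application with the standard library's development server."""
    app = load_wsgi_application(path)
    with make_server(host, port, app) as httpd:
        print(f"Serving {path} on http://{host}:{port}/")
        httpd.serve_forever()
```

From the root of the repository checkout, serve("conceptnet5/conceptnet_api.wsgi") would start the API this way. The conceptnet5 package and its dependencies need to be importable, and the Solr server needs to be running.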

Running the Web interface

This is also a WSGI file, in conceptnet5/web_interface/conceptnet_web.wsgi. You run it the same way as the API server, just pointing it to that .wsgi file instead.