This is an example Frontera project that combines Scrapy's quotes example with Frontera's cluster setup.
First, open seven terminal windows (tmux works well for this): one for the backend services, two for the DB workers, two for the strategy workers, and two for the spiders.
Run the following commands to start HBase, Kafka, and ZooKeeper and to create the HBase namespace:
docker-compose -f docker-compose-backend.yml up -d
echo "waiting for hbase startup (30 seconds)"
sleep 30
echo "create_namespace 'crawler'" | docker exec -i docker_hbase /hbase/bin/hbase shell
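The fixed 30-second sleep may be too short on a slow machine and needlessly long on a fast one. A minimal sketch of a polling alternative follows. retry_until and hbase_ready are hypothetical helper names, and the probe assumes that the HBase version in the image supports the shell's non-interactive -n flag and exits non-zero while the master is still starting; verify both against your image before relying on it.

```shell
# retry_until TRIES DELAY CMD...
# Run CMD until it succeeds, at most TRIES times (at least once),
# sleeping DELAY seconds between attempts. Returns 1 if CMD never succeeds.
retry_until() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$tries" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical readiness probe (assumption): succeeds once HBase answers
# a status request through the non-interactive shell.
hbase_ready() {
    echo "status" | docker exec -i docker_hbase /hbase/bin/hbase shell -n >/dev/null 2>&1
}
```

With these defined, "sleep 30" could be replaced by "retry_until 30 2 hbase_ready", which polls every 2 seconds for up to 30 attempts before the namespace is created.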
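Start two DB workers, each in its own terminal window. The first one only generates batches (--no-incoming disables spider log processing):
python -m frontera.worker.db --config frontier.dbworker --no-incoming
The second one only processes the spider log (--no-batches disables batch generation):
python -m frontera.worker.db --no-batches --config frontier.dbworker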
Start two strategy workers, one per partition, each in its own terminal window:
python -m frontera.worker.strategy --config frontier.sworker --partition-id 0
python -m frontera.worker.strategy --config frontier.sworker --partition-id 1
Finally, start the two spiders, one per partition, each in its own terminal window:
scrapy crawl quotes -L INFO -s SPIDER_PARTITION_ID=0
scrapy crawl quotes -L INFO -s SPIDER_PARTITION_ID=1
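If you use tmux, the six worker and spider processes above can be started from one script instead of by hand. The following is only a sketch: frontera_commands and launch_in_tmux are hypothetical helper names, the session name "frontera" is an arbitrary choice, and it assumes the backend from the first step is already running and that tmux is installed.

```shell
# Print the six commands from this README, one per line, in start order:
# two DB workers, two strategy workers, two spiders.
frontera_commands() {
    echo "python -m frontera.worker.db --config frontier.dbworker --no-incoming"
    echo "python -m frontera.worker.db --no-batches --config frontier.dbworker"
    for i in 0 1; do
        echo "python -m frontera.worker.strategy --config frontier.sworker --partition-id $i"
    done
    for i in 0 1; do
        echo "scrapy crawl quotes -L INFO -s SPIDER_PARTITION_ID=$i"
    done
}

# Start each command in its own window of a detached tmux session.
# The default session name "frontera" is an assumption, not part of the project.
launch_in_tmux() {
    session=${1:-frontera}
    tmux new-session -d -s "$session"
    frontera_commands | while IFS= read -r cmd; do
        # </dev/null keeps tmux from consuming the command list on stdin
        tmux new-window -t "$session:" "$cmd" </dev/null
    done
}
```

Run "launch_in_tmux" from the project directory, then "tmux attach -t frontera" to inspect the windows. A window closes as soon as its command exits, so a worker that fails at startup will simply disappear; run that command by hand to see its error.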