Usage:
- Get a server with Ubuntu 20.04, 16 GB of memory and 8 CPUs.
- SSH into it.
- Run:
    tmux new
  so that you can disconnect and come back later with:
    tmux ls
    tmux attach
- Run:
    curl -o- https://raw.githubusercontent.com/ambanum/TOSBack-CGUs-bridge/master/prepare.sh | bash
- Wait for about 10 minutes
- Add the SSH key to the GitHub account https://github.com/TosbackCgusBridge-Bot.
- While still in the tmux session, run:
    cd TOSBack-CGUs-bridge
    sh ./prepare2.sh
    export DATABASE_URL=...
    sh ./run.sh
- You can now disconnect from your tmux session and come back several hours later (note to self: started 15:30).
- Check out the import-123456789 and rebased-123456789 branches for:
This script was run once, in October 2020. The result is posted in a comment on #2.
A report of what we were able to import from each of the 1711 tosback2 crawl files was generated with report-21.js; the result is in report-21.txt.
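To confirm that a server matches the requirements in the first usage step (Ubuntu 20.04, 16 GB of memory, 8 CPUs) before running prepare.sh, a small preflight script like the one below may help. It is a hypothetical sketch, not part of the TOSBack-CGUs-bridge repository. The function names (meets_minimums, total_mem_gb, cpu_count, os_id, preflight_check) are made up for illustration. It assumes a Linux host with /proc/meminfo and /etc/os-release, as on Ubuntu. It only prints warnings and never aborts the setup.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of the TOSBack-CGUs-bridge repo).
# Thresholds are copied from the Usage steps: Ubuntu 20.04, 16 GB, 8 CPUs.

MIN_MEM_GB=16
MIN_CPUS=8
WANT_OS="ubuntu 20.04"

# Succeeds (status 0) when memory in GB ($1) and CPU count ($2) meet the minimums.
meets_minimums() {
  [ "$1" -ge "$MIN_MEM_GB" ] && [ "$2" -ge "$MIN_CPUS" ]
}

# Total memory from /proc/meminfo (in kB), rounded to the nearest GiB.
# Prints 0 when /proc/meminfo is unavailable (e.g. not on Linux).
total_mem_gb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
  else
    echo 0
  fi
}

# Number of online CPUs; nproc ships with Ubuntu, getconf is a fallback.
cpu_count() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Prints "<ID> <VERSION_ID>" from /etc/os-release, e.g. "ubuntu 20.04".
# Sourcing happens in a subshell so its variables do not leak.
os_id() {
  if [ -r /etc/os-release ]; then
    (. /etc/os-release && echo "${ID:-unknown} ${VERSION_ID:-}")
  else
    echo "unknown"
  fi
}

# Prints what was found and warns (without aborting) on any mismatch.
preflight_check() {
  local os mem_gb cpus
  os=$(os_id)
  mem_gb=$(total_mem_gb)
  cpus=$(cpu_count)
  echo "OS: $os | memory: ${mem_gb} GB | CPUs: $cpus"
  if [ "$os" != "$WANT_OS" ]; then
    echo "warning: expected $WANT_OS"
  fi
  if ! meets_minimums "$mem_gb" "$cpus"; then
    echo "warning: expected at least ${MIN_MEM_GB} GB of memory and ${MIN_CPUS} CPUs"
  fi
}

preflight_check
```

Run it with bash right after SSHing into the server. Memory is rounded to the nearest GB because the kernel reserves part of the RAM, so MemTotal on a "16 GB" machine is somewhat lower than 16 GiB; a warning on a borderline machine is only a hint, not a hard failure.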