Official code for the SkipPipe project
This code uses the following two repositories:
- simplellm: for building the models, loading datasets, tokenizers, etc.
- DecCom: for communication between devices
You can install both by cloning each repository and running `pip install .` inside it, or by running the setup.sh script provided here.
You also need to install the packages listed in requirements.txt with `pip install -r requirements.txt`.
Start training with
./run.sh [FIRST DEVICE] [LAST DEVICE] [SETTING] [SAMPLES PER MICROBATCH]
This starts all nodes from FIRST DEVICE to LAST DEVICE on this machine, using SAMPLES PER MICROBATCH samples per microbatch and the given SETTING:
- random: DT-FM Skip
- ca-partial: SkipPipe with TC2
- non-ca-partial: SkipPipe without TC2
- baseline: DT-FM
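As a quick reference for the arguments, here is a minimal sketch of a hypothetical dry-run helper, skippipe_cmd, which is not part of this repository. It assumes device indices are plain integers. It only checks that SETTING is one of the four documented values and that SAMPLES PER MICROBATCH is a positive integer, then prints the run.sh command instead of executing it.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of the SkipPipe repository).
# Validates the arguments and prints the run.sh command without running it.
skippipe_cmd() {
  if [ "$#" -ne 4 ]; then
    echo "usage: skippipe_cmd FIRST_DEVICE LAST_DEVICE SETTING SAMPLES_PER_MICROBATCH" >&2
    return 1
  fi
  # Accept only the four settings documented in this README.
  case "$3" in
    random|ca-partial|non-ca-partial|baseline) ;;
    *) echo "unknown setting: $3" >&2; return 1 ;;
  esac
  # Assumption: samples per microbatch must be a positive integer.
  case "$4" in
    ''|*[!0-9]*) echo "samples per microbatch must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$4" -eq 0 ]; then
    echo "samples per microbatch must be a positive integer" >&2
    return 1
  fi
  echo "./run.sh $1 $2 $3 $4"
}

# Illustrative values: nodes 0 to 3, SkipPipe with TC2, 4 samples per microbatch.
skippipe_cmd 0 3 ca-partial 4
# prints: ./run.sh 0 3 ca-partial 4
```

Once the printed command looks right, run it directly. Unknown settings such as "skip" are rejected before anything is launched.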