Run the build script to produce optimized CPU and GPU Docker images with the current version of Sockeye:
python3 sockeye_contrib/docker/build.py (cpu|gpu)
- The "cpu" version includes support for int8 inference with intgemm and full MKL support.
- The "gpu" version includes support for distributed training with Horovod and NCCL.
Using the Docker image greatly simplifies distributed training.
See the Horovod instructions for setting up hosts.
This is an example of running on CPUs across 2 hosts. In the commands below:
- `COMMIT` is the Sockeye commit.
- `HOST2` is the address of the secondary host.
- `/mnt/share/ssh` is an SSH directory set up following the Horovod instructions above.
- `/mnt/share` is a general shared directory that all workers will access to read training data and write model files.
On each secondary host, start a Docker container running sshd. Horovod/OpenMPI will connect to these hosts to launch workers.
docker run --rm -i --network=host -v /mnt/share/ssh:/home/ec2-user/.ssh -v /mnt/share:/mnt/share sockeye-gpu \
bash -c "/usr/sbin/sshd -p 12345; sleep infinity"
On the primary host, prepare the training data.
docker run --rm -i -v /mnt/share:/mnt/share --user ec2-user:ec2-user sockeye-gpu \
python3 -m sockeye.prepare_data \
--source /mnt/share/data/train.src \
--target /mnt/share/data/train.trg \
--output /mnt/share/data/prepared_train
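`sockeye.prepare_data` expects line-aligned parallel files: line N of the source file must correspond to line N of the target file. For reference, a toy corpus (illustrative paths and content, not part of the example above) could be created like this:

```shell
# Create a tiny line-aligned parallel corpus: one sentence per line,
# line N of the source corresponds to line N of the target.
mkdir -p data
printf 'ein test\nnoch ein test\n' > data/train.src
printf 'a test\nanother test\n'    > data/train.trg

# Source and target must have the same number of lines.
wc -l data/train.src data/train.trg
```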
Start Sockeye training with horovodrun.
docker run --rm -i --network=host -v /mnt/share/ssh:/home/ec2-user/.ssh -v /mnt/share:/mnt/share --user ec2-user:ec2-user sockeye-gpu \
horovodrun -np 2 -H localhost:1,HOST2:1 -p 12345 python3 -m sockeye.train \
--prepared-data /mnt/share/data/prepared_train \
--validation-source /mnt/share/data/dev.src \
--validation-target /mnt/share/data/dev.trg \
--output /mnt/share/data/model \
--lock-dir /mnt/share/lock \
--use-cpu \
--horovod
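Conceptually, `--horovod` runs one data-parallel worker per entry in `-H` (two here); each step, every worker computes gradients on its own batch and Horovod averages them with an allreduce so all model copies stay in sync. A minimal NumPy sketch of that averaging step (not Sockeye or Horovod code):

```python
import numpy as np

# Two workers compute gradients on different data shards.
grads = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]

# Horovod's allreduce (average) gives every worker the same mean gradient,
# so their parameter updates are identical and the replicas stay synchronized.
avg = sum(grads) / len(grads)
print(avg)  # [2. 4.]
```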
A normal Sockeye model (trained as float32, with or without AMP) can be quantized at runtime for int8 inference.
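The core idea of int8 quantization, in a simplified sketch (the actual intgemm kernels are more sophisticated): scale each float32 weight matrix so its largest absolute value maps to the int8 limit 127, round to int8, and undo the scale after the integer matrix product. A rough NumPy illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # a float32 weight matrix

# Per-tensor scale: map the largest absolute weight to the int8 limit 127.
scale = 127.0 / np.abs(w).max()
w_int8 = np.round(w * scale).astype(np.int8)

# Dequantize to inspect the approximation error introduced by rounding.
w_approx = w_int8.astype(np.float32) / scale
print(np.abs(w - w_approx).max())  # bounded by 0.5 / scale
```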
In the following example, `LEXICON` is a top-k lexicon (see the fast_align documentation and `sockeye.lexicon create`; k=200 works well in practice) and `NCPUS` is the number of physical CPU cores on the host running Sockeye.
docker run --rm -i -v $PWD:/work -w /work sockeye-cpu \
python3 -m sockeye.translate \
    --use-cpu \
    --omp-num-threads NCPUS \
    --dtype int8 \
    --input test.src \
    --restrict-lexicon LEXICON \
    --models model \
    --output test.out
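In outline, `--restrict-lexicon` shrinks the output vocabulary per input: the decoder only scores the union of the top-k target words for each source word, plus special symbols, which is a large part of the CPU speedup. A toy illustration with a hypothetical lexicon (not Sockeye's actual data structures):

```python
# Hypothetical top-k lexicon: source word -> k most likely target words.
lexicon = {
    "hund": ["dog", "hound"],
    "katze": ["cat", "kitty"],
}
ALWAYS = {"<s>", "</s>", "<unk>"}  # special symbols are always allowed


def candidate_vocab(source_tokens):
    """Union of top-k translations for every source token, plus specials."""
    vocab = set(ALWAYS)
    for tok in source_tokens:
        vocab.update(lexicon.get(tok, []))
    return vocab


print(sorted(candidate_vocab(["hund", "katze"])))
```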