Sockeye Docker Images

Run the build script to produce optimized CPU and GPU Docker images with the current version of Sockeye:

python3 sockeye_contrib/docker/build.py (cpu|gpu)
  • The "cpu" version includes support for int8 inference with intgemm and full MKL support.
  • The "gpu" version includes support for distributed training with Horovod and NCCL.
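As a quick sanity check after building, you can run Sockeye inside the resulting image. This is a hypothetical sketch that assumes the built image is tagged sockeye-cpu, matching the image names used in the examples below:

```shell
# Build the CPU variant of the image
python3 sockeye_contrib/docker/build.py cpu

# Verify that Sockeye is importable inside the container
# (image tag "sockeye-cpu" is an assumption based on the examples below)
docker run --rm -i sockeye-cpu python3 -c "import sockeye; print(sockeye.__version__)"
```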

Example: Distributed Training with Horovod

Using the Docker image greatly simplifies distributed training.

Host Setup

See the Horovod documentation for instructions on setting up hosts.

Running

This example runs training on CPUs across two hosts.

  • COMMIT is the Sockeye commit used to build the Docker images
  • HOST2 is the address of the secondary host
  • /mnt/share/ssh is an SSH directory set up following the Horovod instructions above.
  • /mnt/share is a general shared directory that all workers will access to read training data and write model files.
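For concreteness, the placeholders above can be set as shell variables before running the commands below. The values here are hypothetical examples; substitute your own:

```shell
# Placeholder values -- all of these are examples, not real defaults
COMMIT=abc1234            # Sockeye commit used to build/tag the images
HOST2=10.0.0.2            # address of the secondary host
SSH_DIR=/mnt/share/ssh    # shared SSH directory (per the Horovod instructions)
SHARE_DIR=/mnt/share      # shared directory for training data and model files
```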

Secondary Host(s)

On each secondary host, start a Docker container running sshd. Horovod/OpenMPI will connect to these hosts to launch workers.

docker run --rm -i --network=host -v /mnt/share/ssh:/home/ec2-user/.ssh -v /mnt/share:/mnt/share sockeye-gpu \
    bash -c "/usr/sbin/sshd -p 12345; sleep infinity"
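Before launching training, it can help to confirm from the primary host that each secondary host's sshd container is reachable. A minimal check, assuming the shared key in /mnt/share/ssh authorizes the ec2-user account:

```shell
# Should print the secondary host's hostname if sshd on port 12345 is reachable
ssh -p 12345 -o StrictHostKeyChecking=no ec2-user@HOST2 hostname
```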

Primary Host

On the primary host, prepare the training data.

docker run --rm -i -v /mnt/share:/mnt/share --user ec2-user:ec2-user sockeye-gpu \
    python3 -m sockeye.prepare_data \
        --source /mnt/share/data/train.src \
        --target /mnt/share/data/train.trg \
        --output /mnt/share/data/prepared_train

Start Sockeye training with horovodrun.

docker run --rm -i --network=host -v /mnt/share/ssh:/home/ec2-user/.ssh -v /mnt/share:/mnt/share --user ec2-user:ec2-user sockeye-gpu \
    horovodrun -np 2 -H localhost:1,HOST2:1 -p 12345 python3 -m sockeye.train \
        --prepared-data /mnt/share/data/prepared_train \
        --validation-source /mnt/share/data/dev.src \
        --validation-target /mnt/share/data/dev.trg \
        --output /mnt/share/data/model \
        --lock-dir /mnt/share/lock \
        --use-cpu \
        --horovod

Example: Fast Int8 Inference

A normal Sockeye model (trained as float32, with or without AMP) can be quantized at runtime for int8 inference. In the following example, LEXICON is a top-k lexicon (see the fast_align documentation and sockeye.lexicon create; k=200 works well in practice) and NCPUS is the number of physical CPU cores on the host running Sockeye.

docker run --rm -i -v $PWD:/work -w /work sockeye-cpu \
    python3 -m sockeye.translate \
        --use-cpu \
        --omp-num-threads NCPUS \
        --dtype int8 \
        --input test.src \
        --restrict-lexicon LEXICON \
        --models model \
        --output test.out
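Creating the top-k lexicon itself can be sketched as follows. The input and output paths here are hypothetical, and the flag names should be checked against sockeye.lexicon create for your Sockeye version; the input is a probabilistic lexical table such as fast_align produces:

```shell
# Build a top-200 lexicon from a fast_align-style lexical table (lex.prob is a
# hypothetical path); "model" is the trained Sockeye model directory
docker run --rm -i -v $PWD:/work -w /work sockeye-cpu \
    python3 -m sockeye.lexicon create \
        --input lex.prob \
        --model model \
        -k 200 \
        --output lexicon
```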