Docker improvements #16

peterhuene · 2025-02-19T17:59:36Z

A few fixes required for wdl-engine's Crankshaft backend:

* Use a oneshot channel to signal that the task has started, rather than a callback. Originally I thought a callback would be sufficient, but wdl-engine would need to synchronize the underlying oneshot channel it was going to send on. It's easier to remove the indirection and pass the channel's sender directly into Crankshaft.
* Implement resource limits for the Docker backend.
* Fix the HostConfig derivation in the Docker backend so that it uses the correct CPU field.

This changes the `Backend` trait to take a oneshot `Sender` rather than a callback as the caller only cares about when the first task execution has started. This simplifies some integration with `wdl-engine`.
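As a rough illustration of the change described above, the sketch below passes a channel sender straight into the function that runs a task, instead of a "task started" callback. It is not Crankshaft's code: `run_task` and `spawn_task` are hypothetical names, and `std::sync::mpsc` stands in for whichever async oneshot channel the real `Backend` trait uses, so that the example needs only the standard library.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical stand-in for a backend's run method. It receives the sender
/// directly, so the caller does not have to wrap (and synchronize) a channel
/// inside a callback.
fn run_task(started: Option<Sender<()>>) -> i32 {
    // ... the container would be created and started here ...

    if let Some(tx) = started {
        // Notify the caller exactly once. If the receiver has already been
        // dropped, nobody is waiting, so the send error can be ignored.
        let _ = tx.send(());
    }

    // ... the task would be awaited here; 0 is a placeholder exit code ...
    0
}

/// Hypothetical caller side: create the channel, hand the sender to the
/// task, and keep the receiver to learn when execution has begun.
fn spawn_task() -> (Receiver<()>, thread::JoinHandle<i32>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || run_task(Some(tx)));
    (rx, handle)
}
```

The caller blocks on `rx.recv()` (or awaits the receiver, with an async oneshot) to learn that the first execution has started, then joins the handle for the result.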

Adds CPU and memory limits representation to `Resources` and `Defaults`. In the Docker backend, these are now passed through to service creation for Swarm support. This commit also fixes the `Into<HostConfig>` implementation for `Resources` so that it sets the proper fields in the request.

Fixed the following:

* non-zero exit codes were showing up as signals due to not properly formatting a wait exit status.
* the default entry point should be specified when creating the container to override any entry point specified in the image, as the full command is being provided for a task.
* `bollard` turns non-zero exit codes from an exited container into a wait error; we need to handle that error and treat it as a successful wait with a non-zero exit code.
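The last fix above can be sketched as follows. This is a minimal illustration rather than the PR's code: `WaitError` is a hypothetical local enum that models the shape of the wait error reported by `bollard` (simplified to carry only the exit code), and `exit_code_from_wait` is an invented helper name.

```rust
/// Hypothetical, simplified model of the errors a container wait can yield.
#[derive(Debug, PartialEq)]
enum WaitError {
    /// The container exited with a non-zero code, reported as an error.
    ContainerWait { code: i64 },
    /// Any other failure (connection lost, container removed, and so on).
    Other(String),
}

/// Treat "the container exited non-zero" as a successful wait that carries
/// the exit code; every other error is still propagated to the caller.
fn exit_code_from_wait(result: Result<i64, WaitError>) -> Result<i64, WaitError> {
    match result {
        Ok(code) => Ok(code),
        Err(WaitError::ContainerWait { code }) => Ok(code),
        Err(other) => Err(other),
    }
}
```

With this normalization, the caller can report the task's exit code uniformly and reserve the error path for genuine failures of the wait itself.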

The Docker `memory_reservation` setting acts as a memory soft limit used in OOM conditions. Crankshaft was treating it like a minimum requirement for memory, which only makes sense when Docker is operating in a swarm. The fix is to remove setting the option.
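Taken together, the resource-limit changes might look like the sketch below: hard limits are passed through to the container's host configuration, and the memory reservation is left unset. All type and field names on the `Resources` side are hypothetical. `HostLimits` mirrors only three fields of Docker's HostConfig (`NanoCpus`, `Memory`, `MemoryReservation`). The PR does not say which CPU field it switched to, so using `NanoCpus` here is an assumption.

```rust
/// Hypothetical subset of Docker's HostConfig, limited to the fields
/// relevant to this discussion.
#[derive(Debug, Default, PartialEq)]
struct HostLimits {
    /// CPU limit in units of 1e-9 CPUs (Docker's `NanoCpus`).
    nano_cpus: Option<i64>,
    /// Hard memory limit in bytes (Docker's `Memory`).
    memory: Option<i64>,
    /// Soft memory limit in bytes (Docker's `MemoryReservation`).
    memory_reservation: Option<i64>,
}

/// Hypothetical task resources: optional hard limits for CPU and memory.
struct Resources {
    cpu_limit: Option<f64>,
    ram_limit_gib: Option<f64>,
}

const BYTES_PER_GIB: f64 = 1024.0 * 1024.0 * 1024.0;

fn host_limits(resources: &Resources) -> HostLimits {
    HostLimits {
        // 1.0 CPU corresponds to 1_000_000_000 nano-CPUs.
        nano_cpus: resources.cpu_limit.map(|cpus| (cpus * 1e9) as i64),
        memory: resources.ram_limit_gib.map(|gib| (gib * BYTES_PER_GIB) as i64),
        // Deliberately unset: outside of a swarm this is a soft limit, not a
        // minimum memory requirement.
        memory_reservation: None,
    }
}
```

For example, a task limited to 1.5 CPUs and 2 GiB of memory maps to `nano_cpus = 1_500_000_000` and `memory = 2_147_483_648`, with no reservation.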

claymcleod

Looks good after we address these comments.

* fix: use a oneshot channel instead of a callback.

  This changes the `Backend` trait to take a oneshot `Sender` rather than a callback as the caller only cares about when the first task execution has started. This simplifies some integration with `wdl-engine`.

* feat: implement CPU and memory resource limits for the Docker backend.

  Adds CPU and memory limits representation to `Resources` and `Defaults`. In the Docker backend, these are now passed through to service creation for Swarm support. This commit also fixes the `Into<HostConfig>` implementation for `Resources` so that it sets the proper fields in the request.

* fix: fixes to the Docker backend.

  Fixed the following:

  * non-zero exit codes were showing up as signals due to not properly formatting a wait exit status.
  * the default entry point should be specified when creating the container to override any entry point specified in the image, as the full command is being provided for a task.
  * `bollard` turns non-zero exit codes from an exited container into a wait error; we need to handle that error and treat it as a successful wait with a non-zero exit code.

* fix: correct the use of `memory_reservation` in container HostConfig.

  The Docker `memory_reservation` setting acts as a memory soft limit used in OOM conditions. Crankshaft was treating it like a minimum requirement for memory, which only makes sense when Docker is operating in a swarm. The fix is to remove setting the option.

* chore: code review feedback.

* chore: update CHANGELOGs.

peterhuene added 2 commits February 19, 2025 12:58

fix: use a oneshot channel instead of a callback.

44ae405

This changes the `Backend` trait to take a oneshot `Sender` rather than a callback as the caller only cares about when the first task execution has started. This simplifies some integration with `wdl-engine`.

peterhuene requested a review from claymcleod February 19, 2025 17:59

peterhuene self-assigned this Feb 19, 2025

peterhuene commented Feb 19, 2025

peterhuene added 2 commits February 20, 2025 01:02