Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker improvements #16

Merged
merged 6 commits into from
Feb 20, 2025

Conversation

peterhuene
Copy link
Collaborator

@peterhuene peterhuene commented Feb 19, 2025

A few fixes required for wdl-engine's Crankshaft backend:

  • Use a oneshot channel for notifying of the task starting rather than a callback; originally I thought a callback would be sufficient, but wdl-engine would need to synchronize the underlying oneshot channel it was going to send on. It's easier to just remove the indirection and pass the channel's sender directly into Crankshaft.

  • Implement resource limits for the Docker backend.

  • Fix the HostConfig derivation in the Docker backend so that it uses the correct CPU field.

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have added yourself or the appropriate individual as the assignee.
  • You have added at least one relevant code reviewer to the PR.
  • Your code builds clean without any errors or warnings.
  • You have added tests (when appropriate).
  • You have updated the README or other documentation to account for these changes (when appropriate).
  • You have added an entry to the relevant CHANGELOG.md (see "keep a changelog" for more information).
  • Your commit messages follow the conventional commit style.

This changes the `Backend` trait to take a oneshot `Sender` rather than a
callback as the caller only cares about when the first task execution has
started.

This simplifies some integration with `wdl-engine`.
Adds CPU and memory limits representation to `Resources` and `Defaults`. In
the Docker backend, these are now passed through to service creation for
Swarm support.

This commit also fixes the `Into<HostConfig>` implementation for `Resources`
so that it sets the proper fields in the request.
@peterhuene peterhuene self-assigned this Feb 19, 2025
Fixed the following:

* non-zero exit codes were showing up as signals due to not properly formatting
  a wait exit status.
* the default entry point should be specified when creating the container to
  override any entry point specified in the image, as the full command is
  being provided for a task.
* `bollard` turns non-zero exit codes from an exited container into a wait
  error; we need to handle that error and treat it as a successful wait with
  a non-zero exit code.
The Docker `memory_reservation` setting acts as a memory soft limit
used in OOM conditions.

Crankshaft was treating it like a minimum requirement for memory,
which only makes sense when Docker is operating in a swarm.

The fix is to remove setting the option.
Copy link
Member

@claymcleod claymcleod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good after we address these comments.

@peterhuene peterhuene merged commit aa32a2e into stjude-rust-labs:main Feb 20, 2025
11 checks passed
@peterhuene peterhuene deleted the docker-improvements branch February 20, 2025 21:23
peterhuene added a commit to peterhuene/crankshaft that referenced this pull request Feb 20, 2025
* fix: use a oneshot channel instead of a callback.

This changes the `Backend` trait to take a oneshot `Sender` rather than a
callback as the caller only cares about when the first task execution has
started.

This simplifies some integration with `wdl-engine`.

* feat: implement CPU and memory resource limits for the Docker backend.

Adds CPU and memory limits representation to `Resources` and `Defaults`. In
the Docker backend, these are now passed through to service creation for
Swarm support.

This commit also fixes the `Into<HostConfig>` implementation for `Resources`
so that it sets the proper fields in the request.

* fix: fixes to the Docker backend.

Fixed the following:

* non-zero exit codes were showing up as signals due to not properly formatting
  a wait exit status.
* the default entry point should be specified when creating the container to
  override any entry point specified in the image, as the full command is
  being provided for a task.
* `bollard` turns non-zero exit codes from an exited container into a wait
  error; we need to handle that error and treat it as a successful wait with
  a non-zero exit code.

* fix: correct the use of `memory_reservation` in container HostConfig.

The Docker `memory_reservation` setting acts as a memory soft limit
used in OOM conditions.

Crankshaft was treating it like a minimum requirement for memory,
which only makes sense when Docker is operating in a swarm.

The fix is to remove setting the option.

* chore: code review feedback.

* chore: update CHANGELOGs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants