Skip to content

Commit

Permalink
feat: improvements to the Docker backend (stjude-rust-labs#16)
Browse files Browse the repository at this point in the history
* fix: use a oneshot channel instead of a callback.

This changes the `Backend` trait to take a oneshot `Sender` rather than a
callback as the caller only cares about when the first task execution has
started.

This simplifies some integration with `wdl-engine`.

* feat: implement CPU and memory resource limits for the Docker backend.

Adds CPU and memory limits representation to `Resources` and `Defaults`. In
the Docker backend, these are now passed through to service creation for
Swarm support.

This commit also fixes the `Into<HostConfig>` implementation for `Resources`
so that it sets the proper fields in the request.

* fix: fixes to the Docker backend.

Fixed the following:

* non-zero exit codes were showing up as signals due to not properly formatting
  a wait exit status.
* the default entry point should be specified when creating the container to
  override any entry point specified in the image, as the full command is
  being provided for a task.
* `bollard` turns non-zero exit codes from an exited container into a wait
  error; we need to handle that error and treat it as a successful wait with
  a non-zero exit code.

* fix: correct the use of `memory_reservation` in container HostConfig.

The Docker `memory_reservation` setting acts as a memory soft limit
used in OOM conditions.

Crankshaft was treating it like a minimum requirement for memory,
which only makes sense when Docker is operating in a swarm.

The fix is to remove setting the option.

* chore: code review feedback.

* chore: update CHANGELOGs.
  • Loading branch information
peterhuene committed Feb 20, 2025
1 parent 5c32b29 commit 3f71486
Show file tree
Hide file tree
Showing 19 changed files with 266 additions and 115 deletions.
65 changes: 29 additions & 36 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions crankshaft-config/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

* Added support for specifying CPU and memory limits to configuration defaults ([#16](https://github.com/stjude-rust-labs/crankshaft/pull/16)).
* Adds the initial version of the crate.

### Changed
Expand Down
2 changes: 1 addition & 1 deletion crankshaft-config/src/backend.rs
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ mod tests {
.build();

assert_eq!(config.name(), "generic");
assert!(matches!(config.kind().as_generic(), Some(_)));
assert!(config.kind().as_generic().is_some());
assert_eq!(config.max_tasks(), 10);

let defaults = config.defaults.unwrap();
Expand Down
25 changes: 23 additions & 2 deletions crankshaft-config/src/backend/defaults.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,24 @@ pub struct Defaults {
/// the backend.
cpu: Option<f64>,

/// The default limit of CPU cores that a container can use.
///
/// Not all backends support limits on CPU usage.
cpu_limit: Option<f64>,

/// The amount of RAM (in GiB) to use during execution.
///
/// This is a float because RAM can be allocated more granularly than in
/// gigabytes. These may be rounded to any level of precision that is
/// required for a particular environment.
ram: Option<f64>,

/// The default limit of random access memory that a container can use (in
/// GiB).
///
/// Not all backends support limits on memory usage.
ram_limit: Option<f64>,

/// The amount of disk (in GiB) to use during execution.
///
/// This is a float because disks can be allocated more granularly than in
Expand All @@ -37,12 +48,22 @@ impl Defaults {
self.cpu
}

/// Gets the amount of RAM (in GB).
/// Gets the CPU limit.
pub fn cpu_limit(&self) -> Option<f64> {
self.cpu_limit
}

/// Gets the amount of RAM (in GiB).
pub fn ram(&self) -> Option<f64> {
self.ram
}

/// Gets the amount of disk space (in GB).
/// Gets the RAM limit (in GiB).
pub fn ram_limit(&self) -> Option<f64> {
self.ram_limit
}

/// Gets the amount of disk space (in GiB).
pub fn disk(&self) -> Option<f64> {
self.disk
}
Expand Down
5 changes: 5 additions & 0 deletions crankshaft-docker/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

* Added support for respecting CPU and memory limits ([#16](https://github.com/stjude-rust-labs/crankshaft/pull/16)).
* Added support for submitting tasks via the service API for Docker Swarm (#[11](https://github.com/stjude-rust-labs/crankshaft/pull/11)).
* Adds the initial version of the crate.

Expand All @@ -20,3 +21,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
([#8](https://github.com/stjude-rust-labs/crankshaft/pull/8)).
* Replaced `attached` with separate stdout and stderr attach flags
([#8](https://github.com/stjude-rust-labs/crankshaft/pull/8)).

### Fixed

* Fixed a non-zero exit code from a container being treated as a wait error ([#16](https://github.com/stjude-rust-labs/crankshaft/pull/16)).
27 changes: 16 additions & 11 deletions crankshaft-docker/src/container.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ use bollard::container::RemoveContainerOptions;
use bollard::container::StartContainerOptions;
use bollard::container::UploadToContainerOptions;
use bollard::container::WaitContainerOptions;
use bollard::secret::ContainerWaitResponse;
use futures::TryStreamExt as _;
use tokio_stream::StreamExt as _;
use tracing::debug;
Expand Down Expand Up @@ -112,6 +113,8 @@ impl Container {
.map_err(Error::Docker)?
.output;

debug!("starting container `{name}`", name = self.name);

// Start the container.
self.client
.start_container(&self.name, None::<StartContainerOptions<String>>)
Expand Down Expand Up @@ -148,23 +151,24 @@ impl Container {
.map_err(Error::Docker)?;

// Wait for the container to be completed.

debug!("waiting for container `{name}` to exit", name = self.name);
let mut wait_stream = self
.client
.wait_container(&self.name, None::<WaitContainerOptions<String>>);

let mut exit_code = None;
if let Some(result) = wait_stream.next().await {
let response = result.map_err(Error::Docker)?;
if let Some(error) = response.error {
return Err(Error::Message(format!(
"failed to wait for Docker task to complete: {error}",
error = error
.message
.expect("Docker reported an error without a message")
)));
match result {
// Bollard turns non-zero exit codes into wait errors, so check for both
Ok(ContainerWaitResponse {
status_code: code, ..
})
| Err(bollard::errors::Error::DockerContainerWaitError { code, .. }) => {
exit_code = Some(code);
}
Err(e) => return Err(e.into()),
}

exit_code = Some(response.status_code);
}

if exit_code.is_none() {
Expand All @@ -186,7 +190,8 @@ impl Container {

#[cfg(unix)]
let output = Output {
status: ExitStatus::from_raw(exit_code.unwrap() as i32),
// See WEXITSTATUS from wait(2) to explain the shift
status: ExitStatus::from_raw((exit_code.unwrap() as i32) << 8),
stdout,
stderr,
};
Expand Down
3 changes: 3 additions & 0 deletions crankshaft-docker/src/container/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,9 @@ impl Builder {
// one way or the other and not rely on the default.
cmd: Some(cmd),
image: Some(image),
// Override the entrypoint to the default Docker entrypoint as we're providing
// the full command
entrypoint: Some(vec![String::new()]),
attach_stdout: Some(self.attach_stdout),
attach_stderr: Some(self.attach_stderr),
// END NOTE
Expand Down
5 changes: 2 additions & 3 deletions crankshaft-docker/src/images.rs
Original file line number Diff line number Diff line change
Expand Up @@ -64,12 +64,11 @@ pub(crate) async fn list_images(docker: &Docker) -> Result<Vec<ImageSummary>> {
/// * Pulling the image from the remote repository.
pub(crate) async fn ensure_image(docker: &Docker, image: impl AsRef<str>) -> Result<()> {
let image = image.as_ref();
debug!("ensuring image: `{image}`");

debug!("ensuring image `{image}` exists locally");

let mut filters = HashMap::new();
filters.insert("reference", vec![image]);

debug!("checking if image exists locally: `{image}`");
let results = docker
.inner()
.list_images(Some(ListImagesOptions {
Expand Down
2 changes: 1 addition & 1 deletion crankshaft-docker/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ use crate::images::*;
#[derive(Error, Debug)]
pub enum Error {
/// An error from [`bollard`].
#[error("docker error: {0}")]
#[error(transparent)]
Docker(#[from] bollard::errors::Error),
/// A required value was missing for a builder field.
#[error("missing required builder field `{0}`")]
Expand Down
Loading

0 comments on commit 3f71486

Please sign in to comment.