
Nvidia Runtime Extension for Flatcar #3886

Open
njuettner opened this issue Feb 20, 2025 · 2 comments

njuettner commented Feb 20, 2025

Towards: https://github.com/giantswarm/adidas/issues/1351

In order to offer Nvidia GPU-based instances in Kubernetes, we need to install the Nvidia runtime by default in Flatcar.

Flatcar offers an Nvidia runtime system extension (sysext) that installs the container runtime and the toolkit needed to access GPUs from containers.

IMPORTANT
The raw sysext file must be present in /etc/extensions, either directly or as a symlink.
It must be named nvidia_runtime.raw.

We need to ensure that the NVIDIA runtime sysext is present when we build the images.
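
For illustration, here is a minimal sketch (not a confirmed procedure) of how the sysext could be fetched and activated on a Flatcar node. The download URL assumes the bakery's "latest" release and an asset literally named nvidia_runtime.raw; the actual asset name should be verified first.

```sh
# Sketch only: fetch the NVIDIA runtime sysext and activate it with systemd-sysext.
# The URL assumes the bakery's "latest" release and an nvidia_runtime.raw asset;
# verify the real asset name before relying on this.
mkdir -p /opt/extensions /etc/extensions
curl -fL -o /opt/extensions/nvidia_runtime.raw \
  "https://github.com/flatcar/sysext-bakery/releases/download/latest/nvidia_runtime.raw"

# The file must be reachable from /etc/extensions (directly or via symlink)
# and must be named exactly nvidia_runtime.raw.
ln -sf /opt/extensions/nvidia_runtime.raw /etc/extensions/nvidia_runtime.raw

# Merge the extension into /usr and confirm it is active.
systemd-sysext refresh
systemd-sysext status
```

Using a symlink keeps the payload itself out of /etc while still satisfying the naming requirement.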

@njuettner njuettner moved this to Up Next ➡️ in Roadmap Feb 20, 2025
@njuettner njuettner added team/tenet Team Tenet area/kaas Mission: Cloud Native Platform - Self-driving Kubernetes as a Service labels Feb 20, 2025
@AverageMarcus AverageMarcus self-assigned this Feb 20, 2025
@AverageMarcus AverageMarcus moved this from Up Next ➡️ to In Progress ⛏️ in Roadmap Feb 20, 2025
@AverageMarcus

I'm not sure the sysext-bakery is the appropriate way to fetch / install this. It seems very focused on mutable / rolling upgrades, and there's no way for us to leverage Renovate to automate updates to the Nvidia runtime because the repo uses a single "latest" release that is updated over time (WTF?? 🤨 )

I've asked for some advice in the Flatcar slack channel (https://cloud-native.slack.com/archives/C07LW8GQ4F9/p1740144500629549)

@AverageMarcus

Ok, after talking with upstream I have a reasonable idea of how best we could proceed:

  • It's unlikely, in the short term, that the upstream bakery will be updated to support the way of working we want, so:
  • We would create our own "bakery" containing the sysexts we care about (nvidia_runtime for now), which we can then publish as actual releases that can be handled by Renovate (we'd need a GitHub Action or Tekton pipeline to build the raw file and produce a new Release with the artefacts attached)
  • We should use the upcoming changes from this PR to handle the sysext creation: Sysext Build refactoring flatcar/sysext-bakery#115
  • We then add a custom ansible role to our capi-image-builder that fetches the sysext raw file from the GitHub release (see the sketch after this list). This could be made in such a way that it handles multiple sysexts in the future.
  • If this approach works well, we might also choose to change the way we install Teleport to follow this approach.
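
To make the image-build side concrete, here is a rough sketch of the steps such an ansible role might perform. The repository, release tag, asset name and version below are hypothetical placeholders for whatever our own bakery would publish; the version pin is the value Renovate would bump.

```sh
# Rough sketch of what a capi-image-builder ansible role could run at image build time.
# Repo, release tag and asset name are hypothetical placeholders.
NVIDIA_RUNTIME_SYSEXT_VERSION="1.2.3"   # pinned version that Renovate would bump
ASSET_URL="https://github.com/giantswarm/flatcar-sysexts/releases/download/v${NVIDIA_RUNTIME_SYSEXT_VERSION}/nvidia_runtime.raw"

install -d /opt/extensions /etc/extensions
curl -fL -o /opt/extensions/nvidia_runtime.raw "${ASSET_URL}"

# systemd-sysext only picks the image up if it is reachable from /etc/extensions
# (a symlink is fine) and named nvidia_runtime.raw.
ln -sf /opt/extensions/nvidia_runtime.raw /etc/extensions/nvidia_runtime.raw
```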

Some thoughts...

This is more work than I'd have liked, but if we can prove the use case we might actually be able to push this to upstream image-builder and eventually have it as part of that rather than our own custom thing. I'd need to discuss it with the other maintainers once we have a working example.

Alternatively...

We could possibly contribute to the sysext-bakery to support generating GitHub releases for each tooling version it supports, which would make it easier for us to consume in much the same way as planned above, but without having to maintain the repo ourselves.

@AverageMarcus AverageMarcus moved this from In Progress ⛏️ to Backlog 📦 in Roadmap Feb 25, 2025