
Nvidia Runtime Extension for Flatcar #3886

Open
njuettner opened this issue Feb 20, 2025 · 2 comments

njuettner commented Feb 20, 2025

Towards: https://github.com/giantswarm/adidas/issues/1351

In order to offer Nvidia GPU-based instances in Kubernetes, we need to install the Nvidia runtime by default in Flatcar.

Flatcar offers an Nvidia runtime system extension (sysext) that installs the container runtime and the toolkit needed to access GPUs from containers.

IMPORTANT
The raw sysext file must be present in /etc/extensions, either directly or as a symlink.
It must be named nvidia_runtime.raw.

We need to ensure that the NVIDIA runtime sysext is present when we build the images.
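
For illustration, here is a minimal sketch (not a confirmed procedure) of how the sysext could be fetched and activated on a Flatcar node. The download URL assumes the bakery's "latest" release and an asset literally named nvidia_runtime.raw; the actual asset name should be verified first.

```sh
# Sketch only: fetch the NVIDIA runtime sysext and activate it with systemd-sysext.
# The URL assumes the bakery's "latest" release and an nvidia_runtime.raw asset;
# verify the real asset name before relying on this.
mkdir -p /opt/extensions /etc/extensions
curl -fL -o /opt/extensions/nvidia_runtime.raw \
  "https://github.com/flatcar/sysext-bakery/releases/download/latest/nvidia_runtime.raw"

# The file must be reachable from /etc/extensions (directly or via symlink)
# and must be named exactly nvidia_runtime.raw.
ln -sf /opt/extensions/nvidia_runtime.raw /etc/extensions/nvidia_runtime.raw

# Merge the extension into /usr and confirm it is active.
systemd-sysext refresh
systemd-sysext status
```

Using a symlink keeps the payload itself out of /etc while still satisfying the naming requirement.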

@njuettner njuettner moved this to Up Next ➡️ in Roadmap Feb 20, 2025
@njuettner njuettner added team/tenet Team Tenet area/kaas Mission: Cloud Native Platform - Self-driving Kubernetes as a Service labels Feb 20, 2025
@AverageMarcus AverageMarcus self-assigned this Feb 20, 2025
@AverageMarcus AverageMarcus moved this from Up Next ➡️ to In Progress ⛏️ in Roadmap Feb 20, 2025
@AverageMarcus

I'm not sure the sysext-bakery is the appropriate way to fetch / install this. It seems very focused on mutable / rolling upgrades, and there's no way for us to leverage Renovate to automate updates to the Nvidia runtime because the repo uses a single "latest" release that is updated over time (WTF?? 🤨 )

I've asked for some advice in the Flatcar slack channel (https://cloud-native.slack.com/archives/C07LW8GQ4F9/p1740144500629549)

@AverageMarcus

Ok, after talking with upstream I have a reasonable idea of how best we could proceed:

  • It's unlikely, in the short term, that the upstream bakery will be updated to support the way of working we want, so:
  • We would create our own "bakery" containing the sysexts we care about (nvidia_runtime for now), which we can then publish as actual releases that can be handled by Renovate (we'd need a GitHub Action or Tekton pipeline to build the raw file and produce a new Release with the artefacts attached)
  • We should use the upcoming changes from this PR to handle the sysext creation: Sysext Build refactoring flatcar/sysext-bakery#115
  • We then add a custom ansible role to our capi-image-builder that fetches the sysext raw file from the GitHub release (see the sketch after this list). This could be made in such a way that it handles multiple sysexts in the future.
  • If this approach works well, we might also choose to change the way we install Teleport to follow this approach.
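
To make the image-build side concrete, here is a rough sketch of the steps such an ansible role might perform. The repository, release tag, asset name and version below are hypothetical placeholders for whatever our own bakery would publish; the version pin is the value Renovate would bump.

```sh
# Rough sketch of what a capi-image-builder ansible role could run at image build time.
# Repo, release tag and asset name are hypothetical placeholders.
NVIDIA_RUNTIME_SYSEXT_VERSION="1.2.3"   # pinned version that Renovate would bump
ASSET_URL="https://github.com/giantswarm/flatcar-sysexts/releases/download/v${NVIDIA_RUNTIME_SYSEXT_VERSION}/nvidia_runtime.raw"

install -d /opt/extensions /etc/extensions
curl -fL -o /opt/extensions/nvidia_runtime.raw "${ASSET_URL}"

# systemd-sysext only picks the image up if it is reachable from /etc/extensions
# (a symlink is fine) and named nvidia_runtime.raw.
ln -sf /opt/extensions/nvidia_runtime.raw /etc/extensions/nvidia_runtime.raw
```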

Some thoughts...

This is more work than I'd have liked, but if we can prove the use case we might actually be able to push this to upstream image-builder and eventually have it as part of that rather than our own custom thing. I'd need to discuss it with the other maintainers once we have a working example.

Alternatively...

We could possibly contribute to the sysext-bakery to support generating GitHub releases for each tooling version it supports, which would make it easier for us to consume in much the same way as planned above, but without having to maintain the repo ourselves.

@AverageMarcus AverageMarcus moved this from In Progress ⛏️ to Backlog 📦 in Roadmap Feb 25, 2025