
[FR] Container GPU passthrough #7

Closed
fakezeta opened this issue Oct 7, 2022 · 45 comments
Assignees
Labels
enhancement New feature or request

Comments

@fakezeta commented Oct 7, 2022

As per our conversation on the Prusa3D Forum: enable 3D acceleration with GPU passthrough.

@helfrichmichael helfrichmichael added the enhancement New feature or request label Mar 31, 2023
@helfrichmichael helfrichmichael changed the title Container GPU passthrough [FR] Container GPU passthrough Mar 31, 2023
@vajonam (Contributor) commented Apr 2, 2024

Maybe if we rebuild the container using https://hub.docker.com/r/nvidia/opengl/ as the base, it might work with minimal change?

I will try these out and see how far I get.

@vajonam (Contributor) commented Apr 2, 2024

This is most promising. https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc

@helfrichmichael (Owner)

Sorry for the mega delay. Meant to look at this over the weekend, but was swamped with other stuff.

I think https://github.com/linuxserver/docker-kasm/blob/master/Dockerfile has a lot of really useful NVIDIA Docker bits that I plan to learn from and adapt into the current Dockerfiles for my containers.

Basically I think I missed including the NVidia Toolkit package https://github.com/NVIDIA/nvidia-container-toolkit and its dependencies.

Hoping I can poke this later after work.

@helfrichmichael (Owner)

> This is most promising. https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc

Though on this topic, I am beginning to investigate alternative noVNC solutions since the one I use has been deprecated. I might also just fork it and maintain it, but we'll see.

@vajonam (Contributor) commented Apr 2, 2024

Okay, got it running using the default image and just doing an apt install of prusa-slicer (v2.4):

https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc

@vajonam (Contributor) commented Apr 2, 2024

But the caveat is that to use VirtualGL, you need to run a minimal X server on your headless machine and set up virtualgl_server. But now it looks awesome. I think the next bit of work will be to trim this down, similar to what you have done in the Dockerfile. (Figured this out; we no longer need this.)

@vajonam (Contributor) commented Apr 2, 2024

Peek.2024-04-02.15-02.mp4

@vajonam (Contributor) commented Apr 2, 2024

> This is most promising. https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc
>
> Though on this topic, I am beginning to investigate alternative noVNC solutions since the one I use has been deprecated. I might also just fork it and maintain it, but we'll see.

TurboVNC / TigerVNC seems to fit the bill.

@helfrichmichael (Owner)

> Peek.2024-04-02.15-02.mp4

Thanks a ton for the work on this so far! It's looking super smooth for the slicing view now. Feel free to send a pull request if you'd like and I'm happy to review and merge 😄.

> This is most promising. https://hub.docker.com/r/damanikjosh/virtualgl-turbovnc
>
> Though on this topic, I am beginning to investigate alternative noVNC solutions since the one I use has been deprecated. I might also just fork it and maintain it, but we'll see.
>
> TurboVNC / TigerVNC seems to fit the bill.

Neither of those provides a web browser package though, correct? Ideally that's something we'd probably like to retain for the repos.

Thanks again for the work on this!

@vajonam (Contributor) commented Apr 2, 2024

Not sure if I can do a PR against your repo; it will be all new, I think.

https://github.com/damanikjosh/virtualgl-turbovnc-docker/blob/main/Dockerfile uses as its base:

ARG UBUNTU_VERSION=22.04

FROM nvidia/opengl:1.2-glvnd-runtime-ubuntu${UBUNTU_VERSION}

so it's very bloated, being based on Ubuntu. However, it has all the same bits; instead of Openbox it uses another DM (Lubuntu's), but it has VNC and noVNC (in my video you can see it's all in a browser). I will fork this and see what I can do, but this will only work with NVIDIA GPUs, obviously.

@vajonam (Contributor) commented Apr 2, 2024

https://gist.github.com/vajonam/d1e713bcfd47e03f27549258ef53690e <- WIP, but works for the most part; I have added some of your code. I think I should be able to submit a PR. Standby: not too different after all, though Ubuntu/Debian is still a bit bloated.

@vajonam (Contributor) commented Apr 2, 2024

Still need to add back supervisord; will work on that next.

@vajonam (Contributor) commented Apr 3, 2024

Okay, I have a working version with supervisord etc. Some fine-tuning is needed for passing environment variables; look for a PR shortly. This should work regardless of NVIDIA, but worst case you might have two Dockerfiles: one for NVIDIA GPU and one for CPU.

@vajonam (Contributor) commented Apr 3, 2024

Added #15 to address this.

@helfrichmichael (Owner)

> Added #15 to address this.

Thanks for the work so far. I just pulled the latest commit(s) and I am unable to run this via CLI (for unraid and similar, I am making sure the templates match up and trying to figure out the migration path for this set of changes).

My guess is this is due to the supervisord.conf changes:

2024-04-03 17:25:13 Error: Format string '/opt/TurboVNC/bin/vncserver %(ENV_DISPLAY)s -fg  %(ENV_VNC_SEC)s -depth 24 -geometry %(ENV_VNC_RESOLUTION)s' for 'program:vnc.command' contains names ('ENV_VNC_RESOLUTION') which cannot be expanded. Available names: ENV_DEBIAN_FRONTEND, ENV_DISPLAY, ENV_HOME, ENV_HOSTNAME, ENV_LC_CTYPE, ENV_LD_LIBRARY_PATH, ENV_LOCALFBPORT, ENV_NOVNC_PORT, ENV_NVIDIA_DRIVER_CAPABILITIES, ENV_NVIDIA_VISIBLE_DEVICES, ENV_PATH, ENV_PWD, ENV_SHLVL, ENV_SSL_CERT_FILE, ENV_SUPD_LOGLEVEL, ENV_VGLRUN, ENV_VGL_DISPLAY, ENV_VNC_PORT, ENV_VNC_SEC, group_name, here, host_node_name, numprocs, process_num, program_name in section 'program:vnc' (file: '/etc/supervisord.conf')
2024-04-03 17:25:13 For help, use /usr/bin/supervisord -h

Command I am running FWIW:

docker run --detach --volume=prusaslicer-novnc-data:/configs/ --volume=prusaslicer-novnc-prints:/prints/ -p 8080:8080 -e SSL_CERT_FILE="/etc/ssl/certs/ca-certificates.crt" --gpus all --name=prusaslicer-novnc prusaslicer-novnc

Playing with this a bit more on my end, but once it's ready for review, let me know and I can take a pass :).

@vajonam (Contributor) commented Apr 4, 2024

These are the environment variables I am passing:

    prusaslicer:
      # image: mikeah/prusaslicer-novnc
      image: cr.localdomain.com/prusa-new
      container_name: prusaslicer
      environment:
        - SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
        - NVIDIA_VISIBLE_DEVICES=1
        - NVIDIA_DRIVER_CAPABILITIES=all
        - VGL_DISPLAY=egl
        - SUPD_LOGLEVEL=INFO # TRACE
        - VNC_RESOLUTION=1900x1200
      volumes:
        - /opt/docker/configs/prusaslicer/config:/configs
        - /opt/docker/configs/prusaslicer/prints:/prints
      restart: unless-stopped

I think you were missing VNC_RESOLUTION. It should default if not set; not sure why that is not happening, will have a look. Be sure to add all the environment variables; you should be able to pass them as -e FOO=BAR:

      environment:
        - SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
        - NVIDIA_VISIBLE_DEVICES=1
        - NVIDIA_DRIVER_CAPABILITIES=all
        - VGL_DISPLAY=egl
        - SUPD_LOGLEVEL=INFO # TRACE
        - VNC_RESOLUTION=1900x1200

Just added an export to make sure it's defaulted if not set.
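For anyone hitting the same supervisord error: %(ENV_VNC_RESOLUTION)s can only expand if the variable exists in supervisord's environment, so the entrypoint has to export a default first. A minimal sketch of that defaulting step (the fallback value 1280x800 is illustrative, not necessarily the repo's actual default):

```shell
# Default VNC_RESOLUTION before supervisord starts so that
# %(ENV_VNC_RESOLUTION)s in supervisord.conf always expands.
# 1280x800 is an illustrative fallback, not the repo's real default.
: "${VNC_RESOLUTION:=1280x800}"   # keeps the caller's value if already set
export VNC_RESOLUTION
echo "VNC_RESOLUTION=${VNC_RESOLUTION}"
# exec /usr/bin/supervisord -c /etc/supervisord.conf   # hand off here
```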

@vajonam (Contributor) commented Apr 4, 2024

I am assuming that you have an NVIDIA GPU on your installation; I haven't tested this without one. But the image is an NVIDIA image and needs nvidia-docker2, from what I understand.

@vajonam (Contributor) commented Apr 4, 2024

@helfrichmichael did you get it running after the export of the param? Not sure why I forgot that. Anyhow, I had a couple of questions/suggestions:

  1. Move to GTK3? Performance seems quite good with EGL/VirtualGL acceleration; any reason you chose to stick with GTK2?
  2. We can include SuperSlicer in here too; I know you have a branch. I was thinking which slicer to launch could be selected by a runtime env variable.

@helfrichmichael (Owner)

> @helfrichmichael did you get it running after the export of the param? Not sure why I forgot that. Anyhow, I had a couple of questions/suggestions:
>
>   1. Move to GTK3? Performance seems quite good with EGL/VirtualGL acceleration; any reason you chose to stick with GTK2?
>   2. We can include SuperSlicer in here too; I know you have a branch. I was thinking which slicer to launch could be selected by a runtime env variable.

Yep, once the param was exported, it worked just fine (I also had tried passing it as a command line env prior to this FWIW).

For SuperSlicer and the other slicers, I am happy to replicate this once I've reviewed and merged the code, unless you have capacity to update those. No pressure either way, but this work should be a great base for GPU passthrough on these apps.

For now I think keeping them separate would be ideal, just to avoid having to provide migration paths for those on existing unraid templates etc. (I find template updates a bit nuanced, TBH).

The only other thing I am curious about is figuring out a way to allow automatic VNC resizing, as this has been immensely useful for me when I go from device to device (I have a Mimo Vue touchscreen on my desktop in the garage for the printers that is fairly low-res for easy presses). I haven't looked into how noVNC accomplishes this, but if we can solve for either autoresizing or a static size, that would be amazing.

Thanks again @vajonam , really appreciate your help and dedication on this effort.

@vajonam (Contributor) commented Apr 4, 2024

> For SuperSlicer and the other slicers, I am happy to replicate this once I've reviewed and merged the code, unless you have capacity to update those. No pressure either way, but this work should be a great base for GPU passthrough on these apps.

Excellent.

> For now I think keeping them separate would be ideal, just to avoid having to provide migration paths for those on existing unraid templates etc. (I find template updates a bit nuanced, TBH).

Agreed.

> The only other thing I am curious about is figuring out a way to allow automatic VNC resizing, as this has been immensely useful for me when I go from device to device (I have a Mimo Vue touchscreen on my desktop in the garage for the printers that is fairly low-res for easy presses). I haven't looked into how noVNC accomplishes this, but if we can solve for either autoresizing or a static size, that would be amazing.

I am not sure I understand, but it looks like it auto-resized the window. Sadly, the right panel in PrusaSlicer isn't resizable; we might have to move to a modern view.

> Thanks again @vajonam, really appreciate your help and dedication on this effort.

Yeah, no problem, you're welcome. For the most part this was driven by need: I had some complex files, a few MB each, that the software renderer just couldn't handle in 3D. This makes it awesome! The previous solution was good for the simple stuff.

To disable VirtualGL, run with VGLRUN= (empty) and you should see it switch back to the Mesa software renderer and the old performance. I will change the name of the param to ENABLEHWGPU=true or something like that to make it more user-friendly. I have been using this for the past few days to do some slicing and printing; it works really well!
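The VGLRUN toggle described above can be sketched roughly like this, assuming a launcher script along these lines (variable names follow the thread; this is a guess at the logic, not the repo's actual script):

```shell
# Wrap the slicer in vglrun only when hardware rendering is requested.
# ENABLEHWGPU and VGLRUN are the variable names discussed in this thread.
if [ "${ENABLEHWGPU:-false}" = "true" ]; then
    VGLRUN="vglrun"   # VirtualGL: redirect OpenGL to the GPU via EGL
else
    VGLRUN=""         # empty -> Mesa software renderer (old behaviour)
fi
echo "launching with: ${VGLRUN:-software renderer}"
# exec $VGLRUN /slic3r/slic3r-dist/bin/prusa-slicer
```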

@helfrichmichael (Owner)

Oh wait, I'm just opening the wrong VNC file, I think (we should probably adjust the default file for the HTTP server if we can).

http://localhost:8080/vnc_lite.html?resize=true seemed to render it flawlessly! I am having an issue opening the vnc.html file so I need to look at that.

I am going to try to review this after work so I can give this a stamp of approval.

This is awesome to see so far along!

@helfrichmichael (Owner)

> Oh wait. I'm just opening the wrong VNC file I think (we should adjust the default file for the HTTP server probably if we can).
>
> http://localhost:8080/vnc_lite.html?resize=true seemed to render it flawlessly! I am having an issue opening the vnc.html file so I need to look at that.
>
> I am going to try to review this after work so I can give this a stamp of approval.
>
> This is awesome to see so far along!

To account for this, I will likely make the following PR:

Dockerfile:

# Add a default file to resize, etc for noVNC.
ADD vncresize.html /usr/share/novnc/index.html

vncresize.html:

<html>
    <head>
        <script>
            window.location.replace("./vnc.html?autoconnect=true&resize=remote&reconnect=true&show_dot=true");
        </script>
    </head>
</html>

@helfrichmichael (Owner)

@vajonam just pushed to Docker. Successfully set it up on my unraid server with an RTX 3070. It's not picking up the GPU, though, it seems, so I need to dive into this a bit more.

Variables I set:

NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=all
ENABLEHWGPU=true

I'll keep poking at this when I have some time.

@vajonam (Contributor) commented May 17, 2024

You might be missing some device permissions on the host system. There is a tool called vglserver_config that can help you set that up; it's part of the VirtualGL package.

@vajonam (Contributor) commented May 17, 2024

Does nvidia-smi -l show this line on the host?

|    1   N/A  N/A   2955483      G   /slic3r/slic3r-dist/bin/prusa-slicer         92MiB |

@vajonam (Contributor) commented May 17, 2024

Just pulled your latest image, and it works nicely in my environment.

@helfrichmichael (Owner)

> Does nvidia-smi -l show this line on the host?
>
> |    1   N/A  N/A   2955483      G   /slic3r/slic3r-dist/bin/prusa-slicer         92MiB |

Sadly, no. I see "No running processes found" for all of the entities. In binhex-plexpass, for example, I see the GPU passthrough just fine. I can probably poke at this more after work.

@vajonam (Contributor) commented May 17, 2024

This is VirtualGL passthrough, not regular GPU passthrough, which is a bit different. Let me know what you find.

@helfrichmichael (Owner)

For full context, here are my unraid variables surrounding GPU acceleration:

Additionally I tried running the container as privileged to no avail.

@vajonam (Contributor) commented May 17, 2024

This is what I am using in my docker compose; maybe you need to pass the EGL display:

        - SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
        - NVIDIA_VISIBLE_DEVICES=1
        - NVIDIA_DRIVER_CAPABILITIES=all
        - VGL_DISPLAY=egl
        - ENABLEHWGPU=true
        - SUPD_LOGLEVEL=INFO
        - VNC_RESOLUTION=1900x1200

@vajonam (Contributor) commented May 17, 2024

These are important.

        - VGL_DISPLAY=egl
        - ENABLEHWGPU=true

@helfrichmichael (Owner)

> These are important.
>
>         - VGL_DISPLAY=egl
>         - ENABLEHWGPU=true

Confirmed VGL_DISPLAY=egl doesn't change the behavior on my end for the nvidia-smi output or the docker container.

Regarding vglserver_config, are you saying I need to set this up on the host (not in the docker container)?

@vajonam (Contributor) commented May 17, 2024

Yes, you need it on the host to ensure the devices have the right permissions to access the card. All it does in this case is set up permissions on the cards and make sure the user the docker daemon runs as can access them.
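What vglserver_config ultimately ensures can be spot-checked by hand. A small sketch (the device paths are the usual NVIDIA defaults and may differ per host):

```shell
# Print mode and owner of the device nodes VirtualGL's EGL back end needs;
# if the user running the docker daemon can't read/write them, EGL fails.
check_dev() {
    if [ -e "$1" ]; then
        stat -c '%n mode=%a owner=%U:%G' "$1"
    else
        echo "$1: not present on this host"
    fi
}
check_dev /dev/nvidiactl
check_dev /dev/nvidia0
```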

@vajonam (Contributor) commented May 17, 2024

> > These are important.
> >
> >         - VGL_DISPLAY=egl
> >         - ENABLEHWGPU=true
>
> Confirmed VGL_DISPLAY=egl doesn't change the behavior on my end for the nvidia-smi output or the docker container.
>
> Regarding vglserver_config, are you saying I need to set this up on the host (not in the docker container)?

Assuming you set ENABLEHWGPU to true as well?

@helfrichmichael (Owner)

> Yes, you need it on the host to ensure the devices have the right permissions to access the card. All it does in this case is set up permissions on the cards and make sure the user the docker daemon runs as can access them.

Hmmm that might add complexity for unraid since I can't find that as a supported approach and I believe it spins up an X server if I'm not mistaken? I'll have to look into this after work.

> > These are important.
> >
> >         - VGL_DISPLAY=egl
> >         - ENABLEHWGPU=true
>
> Confirmed VGL_DISPLAY=egl doesn't change the behavior on my end for the nvidia-smi output or the docker container.
> Regarding vglserver_config, are you saying I need to set this up on the host (not in the docker container)?

> Assuming you set ENABLEHWGPU to true as well?

Correct, I have set both of those on my template.

@vajonam (Contributor) commented May 17, 2024

There is no need for an X server on the host; it just uses EGL (VirtualGL) to render with the card and display on the VNC-based X server.

@helfrichmichael (Owner)

I got some time just now to play with this a bit more and the solution to my problems wasn't enabling anything further with VirtualGL/vglserver.

In fact, it was just adding --runtime=nvidia as an "Extra Parameters" entry, and it's working flawlessly now.

Amazing work @vajonam!


@helfrichmichael (Owner)

Feel free to re-open this if anyone is experiencing issues, but I believe this is good to go 🥳 .

@vajonam (Contributor) commented Jul 9, 2024 via email

@t3chguy commented Jul 9, 2024

@vajonam PrusaSlicer no longer offers a tarball build, only AppImage, so it'd require a significant rework of projects such as this one.

@vajonam (Contributor) commented Jul 9, 2024

Well, that is a pain! Just when we got this figured out. Okay, I will do some research on how to use the AppImage inside the container.

@helfrichmichael (Owner)

> Well, that is a pain! Just when we got this figured out. Okay, I will do some research on how to use the AppImage inside the container.

I can take a look tonight, I've done this with other containers before using the extracted app. Should have time this evening after work.
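For reference, the common FUSE-free way to run an AppImage inside a container is to extract it at build time with the AppImage runtime's --appimage-extract flag and run the unpacked tree. A hypothetical Dockerfile sketch, with the download URL as a placeholder build argument and the install path borrowed from the nvidia-smi output earlier in this thread:

```dockerfile
# Hypothetical sketch: extract the AppImage at build time so no FUSE is
# needed at runtime. PRUSASLICER_URL is a placeholder, not the real link.
ARG PRUSASLICER_URL
RUN wget -qO /tmp/PrusaSlicer.AppImage "$PRUSASLICER_URL" \
 && chmod +x /tmp/PrusaSlicer.AppImage \
 && cd /tmp \
 && ./PrusaSlicer.AppImage --appimage-extract \
 && mkdir -p /slic3r \
 && mv /tmp/squashfs-root /slic3r/slic3r-dist \
 && rm /tmp/PrusaSlicer.AppImage
# The launcher can then exec /slic3r/slic3r-dist/AppRun
```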

@helfrichmichael (Owner)

> @vajonam PrusaSlicer no longer offers a tarball build, only AppImage, so it'd require a significant rework of projects such as this one.

Fix is in on 2ca3059. Pushing this to Docker now :).

@vajonam (Contributor) commented Jul 15, 2024

Tried it out; at first it didn't launch anything, just showed the blue desktop background in VNC. Never mind, I killed and re-brought up the container and it seems to work. Will do some testing.

@helfrichmichael (Owner)

> Tried it out; at first it didn't launch anything, just showed the blue desktop background in VNC. Never mind, I killed and re-brought up the container and it seems to work. Will do some testing.

Glad to hear it's working now! There are a few nuances I'm trying to sort out with some windows being blank, but I'm looking into that in #17.
