add support for nvidia gpu access #5132
Hi @MondoGao thanks for your PR! Can you please fix DCO? We need that in order to be able to merge it later on. See https://github.com/nextcloud/all-in-one/pull/5132/checks?check_run_id=28844781358 |
Sign added :) |
Hi, I had a quick look at this and I think it would add the capability to all containers that are controlled by AIO. It would probably be better to add this as a capability to containers-schema.json and enable it only for certain containers via containers.json. Also, this is missing some places, e.g. adding documentation on it in the readme. See #1659 for inspiration. Additionally, do you know if this is also going to work with AMD and Intel GPUs? In the best case we create only one setting that works for all of them. |
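The per-container opt-in suggested in this comment could look roughly like the sketch below. The key name `enable_nvidia_gpu`, the container names, and the helpers `gpu_enabled_for` and `containers_with_gpu` are hypothetical and only illustrate the idea; they are not taken from the actual schema or from AIO's PHP code.

```python
import json

# Hypothetical excerpt in the spirit of containers.json; the key
# "enable_nvidia_gpu" is an assumed name, not the real schema.
CONTAINERS = json.loads("""
[
  {"container_name": "example-app", "enable_nvidia_gpu": true},
  {"container_name": "example-database"}
]
""")


def gpu_enabled_for(container: dict, global_setting: bool) -> bool:
    """Grant GPU access only if the user enabled it globally AND the
    container definition explicitly opts in."""
    return global_setting and bool(container.get("enable_nvidia_gpu", False))


def containers_with_gpu(containers: list, global_setting: bool) -> list:
    """Return the names of the containers that would receive GPU access."""
    return [
        c["container_name"]
        for c in containers
        if gpu_enabled_for(c, global_setting)
    ]
```

With this shape, a container that does not declare the flag never receives GPU access, even when the global setting is on.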
I believe the Docker runtime only supports Nvidia GPU passthrough. Hardware acceleration for Intel and AMD CPUs/GPUs is exposed through /dev/dri. It looks like I have to install a PHP dev environment to polish this PR, so please expect a late response. |
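To make the distinction in this comment concrete, here is a minimal sketch of how the two mechanisms differ in the `HostConfig` part of a Docker Engine API container-create request. The field names `DeviceRequests` and `Devices` come from the Docker Engine API; the function `build_host_config` and its parameters are assumptions for illustration, not AIO's actual implementation.

```python
def build_host_config(enable_nvidia_gpu: bool, enable_dri_device: bool) -> dict:
    """Build the GPU-related part of a container-create HostConfig.

    Hypothetical helper: Nvidia GPUs are requested from the runtime via
    DeviceRequests, while Intel/AMD acceleration is a plain device mapping
    of /dev/dri.
    """
    host_config: dict = {}

    if enable_nvidia_gpu:
        # Roughly what `docker run --gpus all` asks for: all GPUs
        # (Count = -1) with the "gpu" capability.
        host_config["DeviceRequests"] = [
            {"Driver": "nvidia", "Count": -1, "Capabilities": [["gpu"]]}
        ]

    if enable_dri_device:
        # Equivalent of `docker run --device /dev/dri`.
        host_config["Devices"] = [
            {
                "PathOnHost": "/dev/dri",
                "PathInContainer": "/dev/dri",
                "CgroupPermissions": "rwm",
            }
        ]

    return host_config
```

Because the two settings end up in different fields, they can be enabled independently of each other.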
@MondoGao Thank you for starting this effort. Besides supporting Nvidia GPU passthrough, there are cases where the container may require the same specific version of the NVIDIA driver installed on the host system. It might be a good idea to create an environment variable for the Nvidia driver version and let the container handle the Nvidia driver setup and configuration logic inside the container. That is, Nextcloud AIO could either provide some way for the user to manually specify the version or, if it is not provided by the user, then have an automatic way to obtain this information and fill an I have exactly this with docker-steam-headless, and it works great. This is the driver download and installation script: https://github.com/Steam-Headless/docker-steam-headless/blob/860451da74b397385f1b1658545d2bb891aa8e46/overlay/etc/cont-init.d/60-configure_gpu_driver.sh This script creates the X Server configuration files and other related configurations. It probably does not apply to Nextcloud and plugins since they do not use X Server, but I am including it here just for reference. |
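The "use the user's value, otherwise detect it automatically" logic described in this comment might be sketched as follows. The environment variable name `NVIDIA_DRIVER_VERSION` and both function names are hypothetical; the detection step assumes `nvidia-smi` is available on the host and uses its `--query-gpu=driver_version` option.

```python
import subprocess
from typing import Callable, Mapping, Optional


def query_host_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if unavailable."""
    try:
        output = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi is missing, failed, or timed out.
        return None
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # With several GPUs the same driver version is printed once per GPU.
    return lines[0] if lines else None


def resolve_driver_version(
    env: Mapping[str, str],
    detect: Callable[[], Optional[str]] = query_host_driver_version,
) -> Optional[str]:
    """Prefer the user-specified version, fall back to auto-detection."""
    explicit = env.get("NVIDIA_DRIVER_VERSION", "").strip()
    if explicit:
        return explicit
    return detect()
```

A caller would typically pass `os.environ` as `env`; the `detect` parameter exists so the fallback can be replaced on hosts without `nvidia-smi`.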
It would be nice to have this feature! We can have Nextcloud Assistant servers with high performance easily!!!!!! I agree with @szaimen the configuration should be done in the Otherwise it looks good to me. AMD GPU : https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html |
Just to be sure I have understood this topic correctly: there is currently no way to use a dedicated GPU in Nextcloud AIO, since development work is still needed before it can see (and then use) the graphics card. Is that correct? Is there any workaround to use the GPU in the meantime? We are happy to help with testing if necessary. Thanks |
Yes
Currently no as this PR is unfortunately not even close to being finished. You could try to find someone that takes over finishing this PR to speed up the development. |
Any news?
What's blocking the merge?
See #5132 (comment) |
Also, twig CI seems to fail... |
I'll try to look into this |
Fix |
Done |
Signed-off-by: Simon L. <[email protected]>
… local-ai Signed-off-by: Simon L. <[email protected]>
LGTM 🎉
Thank you @MondoGao |
This is now released with v10.2.0 Beta. Testing and feedback is welcome! See https://github.com/nextcloud/all-in-one#how-to-switch-the-channel |
@szaimen the These are the logs that I have in the AIO container:
Initial startup of Nextcloud All-in-One complete!
You should be able to open the Nextcloud AIO Interface now on port 8080 of this server!
E.g. https://internal.ip.of.this.server:8080
⚠️ Important: do always use an ip-address if you access this port and not a domain as HSTS might block access to it later!
If your server has port 80 and 8443 open and you point a domain to your server, you can get a valid certificate automatically by opening the Nextcloud AIO Interface via:
https://your-domain-that-points-to-this-server.tld:8443
NOTICE: PHP message: Slim Application Error
Type: Exception
Code: 0
Message: Could not start container nextcloud-aio-local-ai: Server error: `POST http://127.0.0.1/v1.41/containers/nextcloud-aio-local-ai/start` resulted in a `500 Internal Server Error` response:
{"message":"failed to create task for container: failed to create shim task: OCI runtime create failed: runc create fail (truncated...)
File: /var/www/docker-aio/php/src/Docker/DockerActionManager.php
Line: 170
Trace: #0 /var/www/docker-aio/php/src/Controller/DockerController.php(59): AIO\Docker\DockerActionManager->StartContainer(Object(AIO\Container\Container))
#1 /var/www/docker-aio/php/src/Controller/DockerController.php(26): AIO\Controller\DockerController->PerformRecursiveContainerStart('nextcloud-aio-l...', true)
#2 /var/www/docker-aio/php/src/Controller/DockerController.php(209): AIO\Controller\DockerController->PerformRecursiveContainerStart('nextcloud-aio-a...', true)
#3 /var/www/docker-aio/php/src/Controller/DockerController.php(189): AIO\Controller\DockerController->startTopContainer(true)
#4 /var/www/docker-aio/php/vendor/slim/slim/Slim/Handlers/Strategies/RequestResponse.php(38): AIO\Controller\DockerController->StartContainer(Object(GuzzleHttp\Psr7\ServerRequest), Object(GuzzleHttp\Psr7\Response), Array)
#5 /var/www/docker-aio/php/vendor/slim/slim/Slim/Routing/Route.php(363): Slim\Handlers\Strategies\RequestResponse->__invoke(Array, Object(GuzzleHttp\Psr7\ServerRequest), Object(GuzzleHttp\Psr7\Response), Array)
#6 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(73): Slim\Routing\Route->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#7 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(73): Slim\MiddlewareDispatcher->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#8 /var/www/docker-aio/php/vendor/slim/slim/Slim/Routing/Route.php(321): Slim\MiddlewareDispatcher->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#9 /var/www/docker-aio/php/vendor/slim/slim/Slim/Routing/RouteRunner.php(74): Slim\Routing\Route->run(Object(GuzzleHttp\Psr7\ServerRequest))
#10 /var/www/docker-aio/php/vendor/slim/csrf/src/Guard.php(482): Slim\Routing\RouteRunner->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#11 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(177): Slim\Csrf\Guard->process(Object(GuzzleHttp\Psr7\ServerRequest), Object(Slim\Routing\RouteRunner))
#12 /var/www/docker-aio/php/vendor/slim/twig-view/src/TwigMiddleware.php(117): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#13 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(129): Slim\Views\TwigMiddleware->process(Object(GuzzleHttp\Psr7\ServerRequest), Object(Psr\Http\Server\RequestHandlerInterface@anonymous))
#14 /var/www/docker-aio/php/src/Middleware/AuthMiddleware.php(36): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#15 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(280): AIO\Middleware\AuthMiddleware->__invoke(Object(GuzzleHttp\Psr7\ServerRequest), Object(Psr\Http\Server\RequestHandlerInterface@anonymous))
#16 /var/www/docker-aio/php/vendor/slim/slim/Slim/Middleware/ErrorMiddleware.php(77): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#17 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(129): Slim\Middleware\ErrorMiddleware->process(Object(GuzzleHttp\Psr7\ServerRequest), Object(Psr\Http\Server\RequestHandlerInterface@anonymous))
#18 /var/www/docker-aio/php/vendor/slim/slim/Slim/MiddlewareDispatcher.php(73): Psr\Http\Server\RequestHandlerInterface@anonymous->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#19 /var/www/docker-aio/php/vendor/slim/slim/Slim/App.php(209): Slim\MiddlewareDispatcher->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#20 /var/www/docker-aio/php/vendor/slim/slim/Slim/App.php(193): Slim\App->handle(Object(GuzzleHttp\Psr7\ServerRequest))
#21 /var/www/docker-aio/php/public/index.php(189): Slim\App->run()
#22 {main}
Tips: To display error details in HTTP response set "displayErrorDetails" to true in the ErrorHandler constructor.
I think the reason is probably that the Docker image used as a base for aio-local-ai is quay.io/go-skynet/local-ai:v2.24.2-aio-cpu, but when
Additionally, I saw that there are options for Intel GPU or something like this
Anyway, the main problem with this error is that it prevents all the other containers from starting. |
Hm... This specific error should not happen, it should still be able to start the container... Did you install the Nvidia drivers correctly as mentioned in the readme? |
Yes, since I can successfully run the Sample Workload with Docker
Note: I'm using Unraid OS 6.12.3 with the Nvidia Driver plugin by ich777 and the Nvidia proprietary driver v565.57.01, and I am able to play games on Steam via the Docker container https://github.com/Steam-Headless/docker-steam-headless. Anyway, the main problem with the error in the local-ai community container is that it somehow blocks all the other Nextcloud containers from being started by Nextcloud AIO. |
So if you remove local-ai from the community containers, nextcloud starts correctly? |
Yes, and if I keep local-ai in the community containers and set ENABLE_NVIDIA_GPU=false, Nextcloud also starts correctly |
Okay this is weird. I guess we need to remove the setting for local-ai then. Do the other containers that we added start correctly? |
Until all-in-one/community-containers/local-ai/local-ai.json Lines 7 to 8 in 109b9dc
ENABLE_NVIDIA_GPU=true , probably yes
Since I don't use I'm also not sure if I'll try to figure that out over the weekend. |
A quick and simple solution is probably to create a new community container named That is, if I want to use |
facerecognition and memories don't seem to see the GPU immediately. I tried facerecognition is running through 70k photos, and I don't see any GPU activity in nvtop. I tried resetting facerecognition and restarting for all configurations. I currently use the external go-vod container for memories. Disabling this but leaving transcoding on didn't work. My Nvidia runtime is configured correctly. I use separate containers for AI and Jellyfin, and they both use the Nvidia runtime with no issues. I'm not sure where to look for logs, but I am happy to help if given some direction. |
This is now fixed with v10.3.0 Beta. Testing and feedback is welcome! See https://github.com/nextcloud/all-in-one#how-to-switch-the-channel |
Close #4277
Add a new environment variable to enable Nvidia GPU access when creating containers. It's similar to the /dev/dri support. See #1525.
Note: I didn't fully inspect and test this change since I don't have a PHP development environment set up on my machine. Feel free to take over this PR to improve its maintainability.