Currently, running with llama.cpp works well, but keeping the GGUF model resident in limited VRAM is a challenge. With only 16GB of VRAM, I can only run SDXL with a batch size of 1.
I would like to use llama-cpp-python so that ComfyUI can manage VRAM allocation between the LLM and the SD model. The envisioned workflow is as follows:
1. Load GGUF Model with llama-cpp-python: Load the model using the Python bindings.
2. Omost Chat: Perform inference and text generation using the loaded model.
3. GGUF Model CPU Offload: Unload the GGUF model from VRAM to CPU memory.
4. Load SD Model: Load the Stable Diffusion model into the now-available VRAM.
This approach should enable running larger models and higher batch sizes in VRAM-constrained environments by leveraging the dynamic loading and unloading capabilities of llama-cpp-python and ComfyUI, as sketched below.
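A minimal sketch of the load/generate/free cycle, assuming llama-cpp-python is installed and a hypothetical GGUF path. Note that llama-cpp-python does not expose a "move to CPU" call, so step 3 is approximated here by dropping the model object and freeing the CUDA cache; ComfyUI's own model management would then load the SD checkpoint into the reclaimed VRAM.

```python
import gc
import torch  # assumed available in a ComfyUI environment
from llama_cpp import Llama

# 1. Load the GGUF model with llama-cpp-python (path is a placeholder).
llm = Llama(
    model_path="models/llm/omost-llama-3-8b-Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)

# 2. Omost chat: run inference / text generation with the loaded model.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "generate a canvas for a sunset scene"}],
)
print(result["choices"][0]["message"]["content"])

# 3. "CPU offload": approximated by releasing the model and clearing VRAM,
#    since llama-cpp-python has no in-place move-to-CPU API.
del llm
gc.collect()
torch.cuda.empty_cache()

# 4. Load the SD model into the freed VRAM via ComfyUI's model management
#    (e.g. a checkpoint loader node); this happens outside this snippet.
```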