Takes an excessive amount of time #13
It appears that the model is loaded in bf16 instead of the fp32 it was created in. Usually this means the model is loaded into system RAM and then quantized to the lower precision before moving into VRAM. Obviously, with the model being this large, that requires a LOT of system RAM. I looked it up, and t5_xxl needs a little over 20.5 GB of VRAM when run in bf16 (down from the 44 GB if you run it in fp32). The system RAM is freed once the weights are quantized into VRAM. This isn't my repo and I can't test right now since my GPU is busy captioning music for training, but it may be possible to either swap the t5 model for xl or another smaller model, load xxl as int8 or int4, OR permanently save a bf16-quantized version of xxl and use that. Easiest thing to start with:
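A minimal sketch of that last option (saving a bf16 copy), assuming the encoder loads through transformers; the model id and output path here are guesses, not necessarily what this repo uses:

```python
import torch
from transformers import T5EncoderModel

# Cast the weights to bf16 while loading (low_cpu_mem_usage avoids
# materializing the full fp32 checkpoint in system RAM), then save the
# bf16 copy so future loads skip the fp32 weights entirely.
enc = T5EncoderModel.from_pretrained(
    "google/t5-v1_1-xxl",        # assumed model id
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
enc.save_pretrained("t5_xxl_bf16")  # roughly half the on-disk size of fp32
```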
I just tested it.
This is on the most current pull of the repo:
OK. So on a 24 GB card (3090) a single inference currently takes roughly 20 seconds. The easiest way to test is to copy the config\example.txt file, remove all but one line, then sample with this new file supplied as the prompt. It stays just under the VRAM limit for 24 GB. Interestingly, it takes roughly 2 minutes to generate two samples if you leave two prompts in the new file.
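If anyone wants to reproduce that test, here is a trivial sketch of the one-prompt setup (the output filename is made up):

```python
# Keep only the first prompt from the example config and write it to a
# new file, which is then supplied as the prompt file when sampling.
from pathlib import Path

first_prompt = Path("config/example.txt").read_text().splitlines()[0]
Path("config/one_prompt.txt").write_text(first_prompt + "\n")
```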
This is taking about 2 hours with the smallest model.
I presume the issue is that my GPU cannot load a t5_XXL model into memory. According to the Hugging Face page the model weights are 44.5 GB.
Is there any possibility of switching it out for a GGUF of t5_XXL, or at least quantizing it with bitsandbytes (just the encoder)?
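For reference, a minimal sketch of the bitsandbytes route, assuming the encoder is loaded through transformers (the model id is a guess and may differ from what this repo uses):

```python
import torch
from transformers import T5EncoderModel, BitsAndBytesConfig

# Load only the encoder, quantized to int8 on the fly by bitsandbytes.
# device_map="auto" streams weights onto the GPU as they load, so the
# full fp32 checkpoint never has to sit in system RAM all at once.
enc = T5EncoderModel.from_pretrained(
    "google/t5-v1_1-xxl",  # assumed model id
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    torch_dtype=torch.bfloat16,  # compute dtype for the non-quantized parts
)
```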