Allow loading unquantized models and BNB dtype selection #21
base: master
Conversation
Add CheckpointLoaderBNB node that allows selecting the BNB dtype
It works with SDXL models, but when I add a LoRA I get the following error:
Can you fix this? And sometimes I also get this with a LoRA:
even if you got past that point it still wouldn't work. i actually spent some time trying to figure out how to get LoRAs working but wasn't successful. The Comfy LoRA loader logic is pretty complicated; it ends up trying to add the non-quantized LoRA weights to the quantized model, or something like that. however, you can use the CLIP part of LoRAs without an issue if you also load the model normally for the prompt encoding part. i also don't recommend using the
@blepping I'm encountering an error when running ComfyUI with fp8 parameters: "Blockwise quantization only supports 16/32-bit floats, but got torch.float8_e4m3fn"
Traceback (most recent call last):
File "V:\comfyu_py311\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\nodes.py", line 1382, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\nodes.py", line 1352, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\comfy\sample.py", line 43, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\comfy\samplers.py", line 829, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\comfy\samplers.py", line 729, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\comfy\samplers.py", line 706, in sample
self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\comfy\sampler_helpers.py", line 66, in prepare_sampling
comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required, minimum_memory_required=minimum_memory_required)
File "V:\comfyu_py311\ComfyUI\comfy\model_management.py", line 527, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\comfy\model_management.py", line 325, in model_load
raise e
File "V:\comfyu_py311\ComfyUI\comfy\model_management.py", line 321, in model_load
self.real_model = self.model.patch_model(device_to=patch_model_to, patch_weights=load_weights)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\comfy\model_patcher.py", line 352, in patch_model
self.model.to(device_to)
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1174, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
module._apply(fn)
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
module._apply(fn)
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
module._apply(fn)
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
param_applied = fn(param)
^^^^^^^^^
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
return t.to(
^^^^^
File "V:\comfyu_py311\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4-fork\__init__.py", line 77, in to
return self._quantize(device)
^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\python_embeded\Lib\site-packages\bitsandbytes\nn\modules.py", line 297, in _quantize
w_4bit, quant_state = bnb.functional.quantize_4bit(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\python_embeded\Lib\site-packages\bitsandbytes\functional.py", line 1238, in quantize_4bit
raise ValueError(f"Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}")
ValueError: Blockwise quantization only supports 16/32-bit floats, but got torch.float8_e4m3fn

I fixed it with https://gist.github.com/bananasss00/bdfc67da5e1642e4e94796574d5826a0#file-__init__-py-L77, but the model loads slowly the first time. Is there a better fix for fp8 mode?
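For context, a minimal sketch of the kind of workaround the gist describes, assuming the quantization happens when the custom layer's weights are moved to the GPU: upcast any float8 weight to a 16-bit float before handing it to bitsandbytes, since quantize_4bit only accepts 16/32-bit inputs. The function name and structure here are illustrative, not the gist's actual code.

```python
import torch
import bitsandbytes as bnb

def quantize_weight_nf4(weight: torch.Tensor, device: str = "cuda"):
    """Quantize a weight tensor to NF4, upcasting float8 inputs first (illustrative helper)."""
    # bitsandbytes blockwise quantization only accepts 16/32-bit floats, so a
    # torch.float8_e4m3fn weight must be upcast before quantize_4bit is called.
    # The extra cast is why the first load is slower, and it cannot bring back
    # the precision that was already discarded when the model was saved as fp8.
    if weight.dtype not in (torch.float16, torch.bfloat16, torch.float32):
        weight = weight.to(torch.bfloat16)
    w_4bit, quant_state = bnb.functional.quantize_4bit(
        weight.to(device), quant_type="nf4"
    )
    return w_4bit, quant_state
```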
i don't think doing that makes any sense. you need to pick one type of quantization. likely what you're doing is loading the model, then casting it down to 8-bit (which is a lossy process), then converting it back up to 16-bit, then quantizing it down to 4-bit. converting the model multiple times is naturally going to be slow, and the information that got thrown away in the 8-bit conversion cannot be recovered even if you cast it back to 16-bit later on. it's basically like taking a PNG file, JPEG compressing it with medium compression, converting it back to PNG, and then JPEG compressing it with high compression - it's going to be both slower and worse quality than if you had just taken the original high-quality source and applied high JPEG compression to it.
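A quick way to see that the 8-bit step is irreversible (a minimal sketch, assuming a PyTorch build with float8 support; the tensor here is synthetic, not actual model weights):

```python
import torch

torch.manual_seed(0)
w = torch.randn(1024, dtype=torch.float32)   # stand-in for original 16/32-bit weights

w_fp8 = w.to(torch.float8_e4m3fn)            # the lossy cast down to 8-bit
w_back = w_fp8.to(torch.float32)             # casting back up does not restore precision

# Non-zero: the discarded information is gone for good, and a later NF4
# quantization starts from the already-degraded w_back, not from w.
print((w - w_back).abs().mean())
```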
@blepping I have already compressed all my SD15 and SDXL models to FP8 for space efficiency, so the quality will not degrade further in my case. ComfyUI is running continuously in FP8 mode. Therefore, when there is a need to load Flux or another model in NF4/FP4, there should be an option to simply add a node without having to restart ComfyUI.
you haven't avoided the issue i mentioned, you just did the first step of it in the past. by converting an FP8 model to NF4 or whatever, you are getting hit with quality loss multiple times instead of just once.
like i said, i don't recommend doing that if you're going to be using the BNB quantization too.
that would be ideal, but sadly it's beyond my capabilities. i don't even know how this stuff interacts with Comfy's internal casting/quantization if you have FP8 mode enabled. you could try creating an issue here or maybe in the main Comfy repo asking for that feature; not sure how good the chances are of it actually being implemented.
looks useful, though i'd probably take a slightly different approach to implementing it. i don't think there's much chance of this pull getting merged though, so probably the only way to make it available would be to publish it as a separate node.
@bananasss00 any idea how to solve this? I get it when I try to load a LoRA:
you can't use LoRAs with NF4. i recommend switching to GGUF if possible (which does allow using LoRAs): https://github.com/city96/ComfyUI-GGUF
@blepping Thanks for the response. I thought Forge had solved this, and that the code @bananasss00 posted did that too.
i think bananasss00's changes just added some improvements to the node (like letting you choose which parts to load); they didn't add new core functionality. there's nothing that makes NF4 and LoRAs inherently impossible, so Forge may have implemented it. that won't help you here though unless someone wants to try to port the changes, and there isn't much enthusiasm for that since this has basically been abandoned/deprecated. BTW, i contributed changes to ComfyUI-GGUF to support SD1x and SDXL models as well as stable-diffusion.cpp quantized models. so even though the documentation says "don't use with SD1/SDXL", that's out of date. just mentioning it in case that's what was holding you back from switching.
@blepping Huge! thanks for your contributions, I will check sd1/sdxl quants, didn't know it was possible. |
could you post some sd1/sdxl gguf model download links if you have any?
sorry, i don't have any. it's a pretty recent thing. you'd need to convert them yourself. the easiest way is probably with https://github.com/leejet/stable-diffusion.cpp |
mostly just putting this here for example/discussion purposes.
these changes seem to work but i really don't know if the implementation is correct. this pull adds a CheckpointLoaderBNB node that lets you select BNB fp4 or nf4 quantization (the default option works like before, so it will fail for models that aren't pre-quantized). i also looked at implementing BNB's int8, but it seems like the matmul for int8 doesn't support the operations Comfy wants to use.
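To make that concrete, here is a rough sketch of the shape such a loader node could take in ComfyUI; the class body is a simplified assumption rather than this pull's actual code, which would additionally swap in bitsandbytes-backed linear operations for the selected dtype.

```python
# Sketch of a dtype-selectable BNB checkpoint loader node (illustrative only).
import folder_paths
import comfy.sd

class CheckpointLoaderBNB:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "ckpt_name": (folder_paths.get_filename_list("checkpoints"),),
                # "default" keeps the old behavior: only checkpoints that are
                # already BNB-quantized will load; fp4/nf4 quantize on load.
                "bnb_dtype": (("default", "fp4", "nf4"),),
            }
        }

    RETURN_TYPES = ("MODEL", "CLIP", "VAE")
    FUNCTION = "load_checkpoint"
    CATEGORY = "loaders"

    def load_checkpoint(self, ckpt_name, bnb_dtype="default"):
        ckpt_path = folder_paths.get_full_path("checkpoints", ckpt_name)
        # A real implementation would pass custom bitsandbytes operations for the
        # chosen dtype (omitted here) so Linear weights get stored as 4-bit params.
        out = comfy.sd.load_checkpoint_guess_config(
            ckpt_path,
            output_vae=True,
            output_clip=True,
            embedding_directory=folder_paths.get_folder_paths("embeddings"),
        )
        return out[:3]
```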