Allow loading unquantized models and BNB dtype selection #21

Open · wants to merge 3 commits into master

Conversation

blepping (Author)

mostly just putting this here for example/discussion purposes.

these changes seem to work but i really don't know if the implementation is correct. this pull:

  • makes loading unquantized checkpoints work (tested with SD15)
  • adds a CheckpointLoaderBNB node that lets you select BNB fp4 or nf4 quantization (the default option works like before, so it will fail for models that aren't already quantized) - rough sketch at the end of this comment
  • fixes an else case that referred to unknown symbols - now you'll get a runtime error if execution ever reaches that point (not sure it's actually reachable)

i also looked at implementing BNB's int8 but it seems like the matmul for int8 doesn't support the operations Comfy wants to use.
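for reference, here's a rough sketch of the general idea behind the fp4/nf4 selection - this isn't the actual code in this pull (the helper name is made up), just an illustration of how bitsandbytes' Linear4bit/Params4bit can be pointed at either quant type:

import torch
import bitsandbytes as bnb

def swap_linear_to_4bit(model: torch.nn.Module, quant_type: str = "nf4") -> torch.nn.Module:
    # Recursively replace nn.Linear layers with bitsandbytes Linear4bit layers
    # using the selected quantization type ("nf4" or "fp4").
    for name, module in model.named_children():
        if isinstance(module, torch.nn.Linear):
            replacement = bnb.nn.Linear4bit(
                module.in_features,
                module.out_features,
                bias=module.bias is not None,
                compute_dtype=torch.float16,
                quant_type=quant_type,
            )
            # Wrap the existing 16/32-bit weight; the actual 4-bit quantization
            # only happens when the layer is later moved to a CUDA device.
            replacement.weight = bnb.nn.Params4bit(
                module.weight.data, requires_grad=False, quant_type=quant_type
            )
            if module.bias is not None:
                replacement.bias = torch.nn.Parameter(module.bias.data, requires_grad=False)
            setattr(model, name, replacement)
        else:
            swap_linear_to_4bit(module, quant_type)
    return model

moving the model to CUDA afterwards is what actually triggers the quantization (that's the Params4bit._quantize call you can see in the tracebacks below).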

@bananasss00

It works with SDXL models, but when I add LoRA, I get the following error:

TypeError: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU:
[(torch.Size([1, 204800]), device(type='cuda', index=0)), (torch.Size([6400]), device(type='cpu')), (torch.Size([320, 1280]), device(type='cuda', index=0))]

Can you fix this?

and sometimes, with LoRA too:

File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\container.py", line 217, in forward
input = module(input)
^^^^^^^^^^^^^
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self.call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1541, in call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4-fork_init
.py", line 171, in forward
return functional_linear_4bits(x, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4-fork_init
.py", line 12, in functional_linear_4bits
out = bnb.matmul_4bit(x, weight.t(), bias=bias, quant_state=weight.quant_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "V:\comfyu_py311\python_embeded\Lib\site-packages\bitsandbytes\autograd_functions.py", line 566, in matmul_4bit
assert quant_state is not None
^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

@blepping (Author)

@bananasss00

It works with SDXL models, but when I add LoRA, I get the following error:

even if you got past that point it still wouldn't work. i actually spent some time trying to figure out how to get LoRAs working but wasn't successful. The Comfy LoRA loader logic is pretty complicated; it ends up trying to add the non-quantized LoRA weights to the quantized model, or something like that.

however you can use the CLIP part of LoRAs without an issue if you also load the model normally for the prompt encoding part. i also don't recommend using the CLIP output from the BNB nodes: there's really no reason to use quantized CLIP (at least with SDXL and SD1x). in other words: use a normal model loader and its CLIP output to encode the prompt to CONDITIONING - you can apply whatever LoRAs you want, but only the CLIP part will have an effect. then use the MODEL output from the BNB loader node for sampling. hope that makes sense.

@bananasss00

@blepping I'm encountering an error when running ComfyUI with fp8 parameters.

Traceback
Blockwise quantization only supports 16/32-bit floats, but got torch.float8_e4m3fn
Traceback (most recent call last):
  File "V:\comfyu_py311\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\nodes.py", line 1382, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\nodes.py", line 1352, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\comfy\sample.py", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\comfy\samplers.py", line 829, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\comfy\samplers.py", line 729, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\comfy\samplers.py", line 706, in sample
    self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds)
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\comfy\sampler_helpers.py", line 66, in prepare_sampling
    comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required, minimum_memory_required=minimum_memory_required)
  File "V:\comfyu_py311\ComfyUI\comfy\model_management.py", line 527, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\comfy\model_management.py", line 325, in model_load
    raise e
  File "V:\comfyu_py311\ComfyUI\comfy\model_management.py", line 321, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to, patch_weights=load_weights)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\ComfyUI\comfy\model_patcher.py", line 352, in patch_model
    self.model.to(device_to)
  File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1174, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 780, in _apply
    module._apply(fn)
  File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 805, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "V:\comfyu_py311\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1160, in convert
    return t.to(
           ^^^^^
  File "V:\comfyu_py311\ComfyUI\custom_nodes\ComfyUI_bitsandbytes_NF4-fork\__init__.py", line 77, in to
    return self._quantize(device)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\python_embeded\Lib\site-packages\bitsandbytes\nn\modules.py", line 297, in _quantize
    w_4bit, quant_state = bnb.functional.quantize_4bit(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "V:\comfyu_py311\python_embeded\Lib\site-packages\bitsandbytes\functional.py", line 1238, in quantize_4bit
    raise ValueError(f"Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}")
ValueError: Blockwise quantization only supports 16/32-bit floats, but got torch.float8_e4m3fn

I fixed it with this change: https://gist.github.com/bananasss00/bdfc67da5e1642e4e94796574d5826a0#file-__init__-py-L77

but the model loads slowly the first time. Is there a better fix for fp8 mode?
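(roughly, the workaround amounts to upcasting fp8 tensors before bitsandbytes quantizes them - this is a hedged sketch of that idea, not the gist's exact code, and the helper name is made up:)

import torch

def upcast_if_fp8(data: torch.Tensor) -> torch.Tensor:
    # bnb.functional.quantize_4bit only accepts 16/32-bit floats, so fp8 weights
    # have to be upcast first; this extra conversion is also where the slow first
    # load comes from.
    if data.dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
        return data.to(torch.float16)
    return data

# inside the custom Params4bit-style class, before self._quantize(device) is called:
#     self.data = upcast_if_fp8(self.data)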

@blepping (Author)

I'm encountering an error when running the comfyui with fp8 parameters.

i don't think doing that makes any sense. you need to pick one type of quantization.

likely what you're doing is loading the model, casting it down to 8 bit (which is a lossy process), converting it back up to 16 bit, and then quantizing it down to 4 bit. converting the model multiple times like that is naturally going to be slow, and the information that got thrown away in the 8 bit conversion cannot be recovered, even if you cast it back up to 16 bit later on.

it's basically like taking a PNG file, JPEG compressing it with medium compression, converting it back to PNG, and then JPEG compressing it with high compression - it's going to be both slower and worse quality than if you just took the original high quality source and used JPEG high compression on it.
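here's a tiny illustration of that point (assuming a PyTorch build new enough to have the float8 dtypes - this isn't from the pull, just a demo):

import torch

w = torch.randn(1 << 16, dtype=torch.float32)
# cast down to fp8 and back up again - the discarded precision never comes back
w_roundtrip = w.to(torch.float8_e4m3fn).to(torch.float32)
print("mean abs error after fp8 round trip:", (w - w_roundtrip).abs().mean().item())
# any 4-bit quantization applied afterwards starts from this already-degraded copy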

@bananasss00

@blepping I have already compressed all my SD15 and SDXL models to FP8 for space efficiency, so the quality will not degrade further in my case.

ComfyUI is running continuously in FP8 mode. Therefore, when there is a need to load Flux or another model in NF4/FP4, there should be an option to simply add a node without having to restart ComfyUI.

@blepping (Author)

blepping commented Aug 13, 2024

I have already compressed all my SD15 and SDXL models to FP8 for space efficiency, so the quality will not degrade further in my case.

you haven't avoided the issue i mentioned, you just did the first step of it in the past. by converting an FP8 model to NF4 or whatever, you are getting hit with quality loss multiple times instead of just once.

ComfyUI is running continuously in FP8 mode.

like i said, i don't recommend doing that if you're going to be using the BNB quantization too.

Therefore, when there is a need to load Flux or another model in NF4/FP4, there should be an option to simply add a node without having to restart ComfyUI.

that would be ideal, but sadly it's beyond my capabilities. i don't even know how this stuff interacts with Comfy's internal casting/quantization if you have FP8 mode enabled. you could try creating an issue here or maybe in the main Comfy repo asking for that feature; not sure how good the chances of it actually being implemented are.

@bananasss00

https://gist.github.com/bananasss00/301d666f20370412a571570ec304527c

changes, maybe you want to add them too:
[screenshot of the changes]

@blepping (Author)

changes, maybe you want to add them too

looks useful, though i'd probably take a slightly different approach to implementing it. i don't think there's much chance of this pull getting merged though, so probably the only way to make it available would be to publish it as a different node.

@nux1111

nux1111 commented Sep 8, 2024

@bananasss00 any idea how to solve this? I get it when I try to load a LoRA:
File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\bitsandbytes\functional.py", line 432, in is_on_gpu
raise TypeError(
TypeError: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU:
[(torch.Size([28311552, 1]), device(type='cuda', index=0)), (torch.Size([1, 3072]), device(type='cuda', index=0)), (torch.Size([1, 18432]), device(type='cuda', index=0)), (torch.Size([884736]), device(type='cpu')), (torch.Size([16]), device(type='cpu'))]

@blepping (Author)

blepping commented Sep 8, 2024

any idea how to solve this? I get it when i try to load a lora

you can't use LoRAs with NF4. i recommend switching to GGUF if possible (which does allow using LoRAs): https://github.com/city96/ComfyUI-GGUF

@nux1111

nux1111 commented Sep 8, 2024

@blepping Thanks for the response. I thought Forge had solved this and that the code @bananasss00 posted does that too.

@blepping (Author)

blepping commented Sep 8, 2024

Thanks for the response. I thought Forge had solved this and that the code bananasss00 posted does that too.

i think bananasss00's changes just added some improvements to the node (like letting you choose which parts to load); they didn't add core functionality.

there's nothing that makes combining NF4 and LoRAs inherently impossible, so Forge may have implemented it. that won't help you here though unless someone wants to try to port the changes, and there's not much enthusiasm for that since this project is basically abandoned/deprecated.

BTW, i contributed changes to ComfyUI-GGUF to support SD1x and SDXL models as well as stable-diffusion.cpp quantized models. so even though the documentation says "don't use with SD1/SDXL", that's out of date. just mentioning it in case that's what was holding you back from switching.

@nux1111

nux1111 commented Sep 8, 2024

@blepping Huge! Thanks for your contributions. I will check SD1/SDXL quants, didn't know it was possible.

@nux1111

nux1111 commented Sep 8, 2024

Can you post some SD1/SDXL GGUF model download links if you have any?

@blepping (Author)

blepping commented Sep 8, 2024

Can you post some SD1/SDXL GGUF model download links if you have any?

sorry, i don't have any. it's a pretty recent thing. you'd need to convert them yourself. the easiest way is probably with https://github.com/leejet/stable-diffusion.cpp
