feat: refine LoRA diffusers to flux conversion logic #7708
base: main
Conversation
Thanks for digging into this!
To get it merged, we'll need to:
- Fix the shift/scale transformation.
- Add a unit test in test_flux_diffusers_lora_conversion_utils.py for this new LoRA format. See the other tests in that file for reference.
for _key in values.keys():
    # In the original SD3 implementation of AdaLayerNormContinuous, the linear projection output is split into (shift, scale),
    # while in diffusers it is split into (scale, shift). Swap the linear projection weights here so the diffusers
    # implementation can be used.
    scale, shift = values[_key].chunk(2, dim=0)
    values[_key] = torch.cat([shift, scale], dim=0)
This doesn't look right to me. If I'm understanding correctly, in the case of a vanilla LoRA layer, we should only be flipping one of the LoRA components.
The required transformation would be a bit more involved for other LoRA variants (LoHA, LoKR, etc.), so I'm fine with only supporting vanilla LoRAs. But we should assert that the result of any_lora_layer_from_state_dict() is a LoRALayer.
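For illustration, here is a standalone check (plain PyTorch with made-up shapes, not code from this PR) of why flipping only the up/"B" matrix is enough for a vanilla LoRA: the shift/scale swap is a row permutation of delta_W = up @ down, so it can be folded entirely into the up matrix.

```python
import torch

# Illustrative sizes only; real AdaLN projections are much larger.
out_features, rank, in_features = 6, 2, 4
up = torch.randn(out_features, rank)    # lora_up / "B"
down = torch.randn(rank, in_features)   # lora_down / "A"


def swap_shift_scale(t: torch.Tensor) -> torch.Tensor:
    scale, shift = t.chunk(2, dim=0)
    return torch.cat([shift, scale], dim=0)


delta_w = up @ down

# Swapping the halves of the full delta is equivalent to swapping the halves
# of the `up` matrix alone, because the swap only permutes output rows.
assert torch.allclose(swap_shift_scale(delta_w), swap_shift_scale(up) @ down)
```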
Hi @RyanJDick, thanks for spending time 👯
I have to confess it is more complex than I expected; sorry for not asking the team beforehand.
As I understand it:
# for normal LoRA layer
delta_W = up @ down
W = W + delta_W

# for AdaLN in diffusers
W_prime = swap_shift_scale(W)
delta_W_prime = swap_shift_scale(delta_W)

# => We may need to add a custom LoRA layer to swap them in `get_weight`
class AdaLN_LoRALayer(LoRALayer):
    def get_weight(self, orig_weight: torch.Tensor) -> torch.Tensor:
        '''swap shift and scale before returning real weight'''
        weight = super().get_weight(orig_weight)
        scale, shift = weight.chunk(2, dim=0)
        return torch.cat([shift, scale], dim=0)

# we need to build and return this layer in our function
What do you think?
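A small self-contained sanity check of that reasoning (plain PyTorch, illustrative shapes; not code from this PR): the swap is a fixed row permutation, so it distributes over addition. Adding the swapped delta to the flux-ordered base weight therefore gives the same result as patching the diffusers-ordered weight and swapping once at the end.

```python
import torch


def swap_shift_scale(t: torch.Tensor) -> torch.Tensor:
    scale, shift = t.chunk(2, dim=0)
    return torch.cat([shift, scale], dim=0)


W = torch.randn(6, 4)        # base AdaLN projection weight (illustrative size)
delta_W = torch.randn(6, 4)  # merged LoRA delta, i.e. up @ down

# swap(W) + swap(delta_W) == swap(W + delta_W): swapping the delta inside
# get_weight() composes correctly with a base weight that is already swapped.
assert torch.allclose(
    swap_shift_scale(W) + swap_shift_scale(delta_W),
    swap_shift_scale(W + delta_W),
)
```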
Summary
This PR updates the LoRA Diffusers -> Flux conversion logic based on its original source (a rough sketch of the key handling follows the list):
- guidance_in layer keys, as in: https://github.com/huggingface/diffusers/blob/55ac421f7bb12fd00ccbef727be4dc2f3f920abb/scripts/convert_flux_to_diffusers.py#L103-L115
- norm_out with shift/scale swapping, as in: https://github.com/huggingface/diffusers/blob/55ac421f7bb12fd00ccbef727be4dc2f3f920abb/scripts/convert_flux_to_diffusers.py#L263-L268
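For reference, a rough sketch of that key handling (this is not the PR's actual code; the layer names below are read off the linked convert_flux_to_diffusers.py script, and the exact prefixes in real LoRA state dicts may differ):

```python
import torch

# Assumed diffusers -> flux layer-name mapping for the newly handled keys
# (inverse of the linked flux -> diffusers conversion script).
DIFFUSERS_TO_FLUX_KEYS = {
    "time_text_embed.guidance_embedder.linear_1": "guidance_in.in_layer",
    "time_text_embed.guidance_embedder.linear_2": "guidance_in.out_layer",
    # norm_out additionally needs the shift/scale swap below
    "norm_out.linear": "final_layer.adaLN_modulation.1",
}


def swap_shift_scale(weight: torch.Tensor) -> torch.Tensor:
    # diffusers orders the AdaLN projection output as (scale, shift);
    # the original flux implementation expects (shift, scale)
    scale, shift = weight.chunk(2, dim=0)
    return torch.cat([shift, scale], dim=0)
```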
Related Issues / Discussions
I couldn't load Hyper-FLUX.1-dev-Nsteps-lora.safetensors from https://huggingface.co/ByteDance/Hyper-SD via InvokeUI.
QA Instructions
Search for ByteDance/Hyper-SD in the search box.
Merge Plan
Just apply the change; it's a small one.
Checklist
- What's New copy (if doing a release after this PR)