
Handle weight aliases #490

Closed

@jrruethe commented Jan 21, 2025

This is a quick modification to the extract_lora.py script to use the aliases from the weight info loaded from the architecture JSON.

It is my attempt at a solution to #484 and provides an alternative to the hardcoding found in #483.
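
The gist of the change, as a rough sketch (illustrative only; the helper and field names below are not the actual extract_lora.py code): when the canonical tensor name from the architecture definition is missing from the checkpoint, fall back to the aliases listed in the weight info and use whichever name the checkpoint actually contains.

```python
# Illustrative sketch, not the actual extract_lora.py code: try the canonical
# tensor name first, then each alias, and use whichever the checkpoint has.
def resolve_weight_name(weight_info, checkpoint_keys):
    """Return the first candidate name present in the checkpoint, else None."""
    candidates = [weight_info.name, *(weight_info.aliases or ())]
    for name in candidates:
        if name in checkpoint_keys:
            return name
    return None

# Example: an architecture entry named "lm_head.weight" with an alias of
# "model.embed_tokens.weight" resolves to the alias when the checkpoint only
# stores the embedding matrix (the Skywork case below).
```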

I have tested this on the following:

  • Qwen_Qwen2.5-0.5B
  • Qwen_Qwen2.5-1.5B
  • Qwen_Qwen2.5-3B
  • meta-llama_Llama-3.2-1B
  • meta-llama_Llama-3.2-3B
  • Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 (This 8B model used "model.embed_tokens.weight" instead of "lm_head.weight", and this change accounts for the difference via the alias)

Tests:

  • Extraction works, without any "lm_head" errors.
  • Reapplying the extracted lora works.

Once the above tests are complete, I will take this PR out of "draft".

Notes:
I still get messages stating the following:

Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at cognitivecomputations_Dolphin3.0-Qwen2.5-0.5B and are newly initialized: ['embed_tokens.weight', 'layers.0.input_layernorm.weight', 'layers.0.mlp.down_proj.weight' ...

However, after looking into it some more, I believe this message is not an issue. I think it is coming from this line, since it essentially creates an empty model, but I'm not 100% sure.
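
For reference, transformers prints that notice whenever from_pretrained cannot match some of the class's parameters against the checkpoint it reads; those parameters are left randomly initialized, which is harmless as long as the real tensors are copied in afterwards. A minimal sketch of what "creating an empty model" means (not mergekit code; the model id is only a placeholder):

```python
# Minimal sketch (not mergekit code) of "creating an empty model": building a
# model from its config alone allocates parameters with random initial values
# and reads no checkpoint at all. The model id below is only a placeholder.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")
empty = AutoModelForCausalLM.from_config(config)  # random weights, no tensors loaded
```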

Interestingly, I still receive an error when extracting the lora from this one, with the specific message "AttributeError: 'LlamaModel' object has no attribute 'lm_head'", which I find a bit bizarre. I got this message before my change too, so it is unrelated, but I am a bit surprised this change didn't fix it.
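
That particular error means the loaded object is the bare LlamaModel (the decoder stack without a language-model head), which has no lm_head attribute at all, whereas LlamaForCausalLM does; aliases cannot help when the attribute simply does not exist on the class. A small, self-contained illustration (the tiny random config here is assumed only for demonstration):

```python
# Small illustration: LlamaModel is the bare decoder and has no lm_head, while
# LlamaForCausalLM adds one on top. The tiny config is only for demonstration.
from transformers import LlamaConfig, LlamaModel, LlamaForCausalLM

cfg = LlamaConfig(hidden_size=64, intermediate_size=128,
                  num_hidden_layers=1, num_attention_heads=4)
bare = LlamaModel(cfg)
full = LlamaForCausalLM(cfg)

print(hasattr(bare, "lm_head"))  # False -> accessing bare.lm_head raises AttributeError
print(hasattr(full, "lm_head"))  # True
```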

@cg123 (Collaborator) commented Jan 25, 2025

Thanks for the PR! I ended up implementing a more general solution that should also handle tied_names properly in #496 - hopefully this hits everything you need. If not let me know and I'll revisit this.

Thanks again!
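
For context, "tied" names refer to weights that share a single underlying tensor, most commonly the input embedding matrix and the output projection; that is why a checkpoint can store only model.embed_tokens.weight even though code also refers to lm_head.weight. A quick way to check whether a given model ties them (the model id below is used only as an example):

```python
# Quick check for tied embeddings (example model id; any causal LM works).
from transformers import AutoModelForCausalLM

m = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
emb = m.get_input_embeddings().weight
out = m.get_output_embeddings().weight
print(m.config.tie_word_embeddings)      # whether the config declares tying
print(emb.data_ptr() == out.data_ptr())  # True when both names share one tensor
```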

@cg123 closed this Jan 25, 2025
@jrruethe (Author) commented

Oh! Awesome, thank you so much @cg123!

One quick question: now that there is --save-module lm_head --save-module embed_tokens, do I need to re-extract, with this update, any loras that I extracted using the code from my PR? Am I missing anything? I have primarily been extracting loras as a way to "archive" finetunes and reduce disk space.

I appreciate the help!
