Actual example of a merge that produced this issue:
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      weight: 0.3
      density: 0.4
merge_method: della
base_model: <base model path>
parameters:
  epsilon: 0.05
  lambda: 1
dtype: bfloat16
tokenizer_source: base
Additional relevant information: if I get the tokenizer vocab size with tokenizer_vocab_size = len(tokenizer) from ... any Qwen 2.5 14B model, I get 151665 rather than the 152064 that's in config.json.
I don't fully understand why this merge method, and none of the others, trims the vocabulary size and embedding layer down, but it's annoying for compatibility, and specifying tokenizer_source doesn't seem to address the issue (presumably because the tokenizer doesn't actually have 152064 tokens' worth of vocabulary).
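For reference, this is roughly how the mismatch shows up (a minimal sketch; the printed values are the ones reported above, not something the snippet guarantees):

```python
from transformers import AutoConfig, AutoTokenizer

# Any Qwen 2.5 14B checkpoint shows the same discrepancy between the
# tokenizer length and the padded vocab_size declared in config.json.
model_id = "Qwen/Qwen2.5-14B-Instruct"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(len(tokenizer))     # 151665 -- actual tokenizer vocabulary
print(config.vocab_size)  # 152064 -- padded embedding size from config.json
```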
This would be a helpful option -- the trimming causes downstream effects in other tooling (like triggering unsloth patching that isn't fully calibrated to the model type, for some reason) and prevents merges with other Qwen 2.5 models.
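As a stopgap (not a mergekit feature, just a post-hoc sketch assuming the merge output was written to a hypothetical ./della-merged directory), the merged model's embeddings can be padded back out to the 152064 size that other Qwen 2.5 checkpoints use:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "./della-merged"  # hypothetical output directory of the della merge

model = AutoModelForCausalLM.from_pretrained(merged_path)
tokenizer = AutoTokenizer.from_pretrained(merged_path)

# Pad the input and output embeddings back to the size used by the other
# Qwen 2.5 checkpoints; the extra rows are newly initialized padding tokens.
model.resize_token_embeddings(152064)

model.save_pretrained(merged_path)
tokenizer.save_pretrained(merged_path)
```

This only restores shape compatibility for follow-up merges; it doesn't recover whatever was in the trimmed rows of the original embedding matrix.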