
Qwen2.5 14B models are ... sometimes? ... having their token vocabulary truncated down to 'actual'? #425

Open
ann-brown opened this issue Sep 27, 2024 · 2 comments

Comments

ann-brown commented Sep 27, 2024

Actual example of a merge that produced this issue:

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      weight: 0.3
      density: 0.4
merge_method: della
base_model: <base model path>
parameters:
  epsilon: 0.05
  lambda: 1
dtype: bfloat16
tokenizer_source: base
```
Additional relevant information: if I get the tokenizer vocab size with tokenizer_vocab_size = len(tokenizer) from ... any Qwen 2.5 14B model, I get 151665 rather than the 152064 that's in the config.json.
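
For reference, a minimal sketch of that check (nothing beyond stock transformers; any of the Qwen 2.5 14B checkpoints should show the same numbers):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "Qwen/Qwen2.5-14B-Instruct"  # same result with the other Qwen 2.5 14B checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

print(len(tokenizer))     # 151665 -- tokens the tokenizer actually defines
print(config.vocab_size)  # 152064 -- padded embedding size declared in config.json
```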

I don't fully understand why the vocabulary size and embedding layer get trimmed down in this merge method but not in any of the others, but it's annoying for compatibility, and specifying tokenizer_source doesn't seem to address the issue (presumably because the tokenizer doesn't actually have 152064 tokens' worth of vocabulary).

cg123 (Collaborator) commented Oct 26, 2024

When using tokenizer_source/tokenizer, new tensors are created for the embeddings and LM heads that exactly match the output vocabulary size.

I can look at adding an option for padding the size up to the nearest multiple of 32 if that's causing an issue.
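
In the meantime, a manual resize after the merge should work around it (rough sketch, assuming the merged model loads with transformers and the installed version supports the pad_to_multiple_of argument on resize_token_embeddings):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_path = "<merged model path>"  # placeholder for the merge output directory
model = AutoModelForCausalLM.from_pretrained(merged_path)
tokenizer = AutoTokenizer.from_pretrained(merged_path)

# Pad the embedding / LM head rows back up. pad_to_multiple_of=32 rounds up to the
# nearest multiple of 32; to match stock Qwen 2.5 checkpoints exactly, pass 152064
# as new_num_tokens instead.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=32)
model.save_pretrained(merged_path)
tokenizer.save_pretrained(merged_path)
```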

ann-brown (Author) commented

That would be a helpful option -- it's causing some downstream effects in other tooling (like getting into unsloth patching that isn't fully calibrated to the model type, for some reason) and preventing merges with other Qwen 2.5 models.
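
To make the incompatibility concrete, this is the kind of shape check that shows the mismatch (sketch; the merged path is a placeholder):

```python
from transformers import AutoModelForCausalLM

merged = AutoModelForCausalLM.from_pretrained("<merged model path>")       # della merge output
stock = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct")  # unmodified checkpoint

# The merged model ends up with 151665 embedding rows while stock Qwen 2.5
# checkpoints carry 152064, so tools expecting matching shapes refuse to combine them.
print(merged.get_input_embeddings().weight.shape)
print(stock.get_input_embeddings().weight.shape)
```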
