Bug: error loading model DeepSeek-R1-Q4_K_M.gguf : error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-v3' #699

Open
Tridu33 opened this issue Feb 25, 2025 · 4 comments

Tridu33 commented Feb 25, 2025

Contact Details

[email protected]

What happened?

I downloaded the split GGUF files for DeepSeek-R1-Q4_K_M from https://www.modelscope.cn/models/unsloth/DeepSeek-R1-GGUF/files and merged them into a single file:

/home/tridu33/workspace/llama.cpp4Ascend/build/bin/llama-gguf-split --merge /home/tridu33/workspace/soft/llamafile/models/DeepSeek-R1-Q4_K_M-split/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf  /home/tridu33/workspace/soft/llamafile/models/DeepSeek-R1-Q4_K_M_one/DeepSeek-R1-Q4_K_M.gguf
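To sanity-check the merged file before loading it, the gguf_dump.py script that ships in llama.cpp's gguf-py can print the key/value metadata (a minimal sketch, assuming the gguf-py scripts are present in the same llama.cpp checkout used above; the script location may differ between versions):

python /home/tridu33/workspace/llama.cpp4Ascend/gguf-py/scripts/gguf_dump.py /home/tridu33/workspace/soft/llamafile/models/DeepSeek-R1-Q4_K_M_one/DeepSeek-R1-Q4_K_M.gguf

Among the dumped keys this should show tokenizer.ggml.pre = 'deepseek-v3', which is the value the loader rejects below.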

Version

llamafile, latest main branch

Relevant log output

I compiled llamafile from source, then ran:

/home/tridu33/workspace/soft/llamafile/o/llama.cpp/main/main  -ngl 9999 -m /home/tridu33/workspace/soft/llamafile/models/DeepSeek-R1-Q4_K_M_one/DeepSeek-R1-Q4_K_M.gguf -p '[INST]Write a story about llamas[/INST]'
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: hipcc not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/hipcc does not exist
get_rocm_bin_path: note: /opt/rocm/bin/hipcc does not exist
extract_cuda_dso: note: prebuilt binary /zip/ggml-rocm.so not found
import_cuda_impl: won't compile AMD GPU support because $HIP_PATH/bin/clang++ is missing
extract_cuda_dso: note: prebuilt binary /zip/ggml-rocm.so not found
get_nvcc_path: note: nvcc not found on $PATH
get_nvcc_path: note: $CUDA_PATH/bin/nvcc does not exist
get_nvcc_path: note: /opt/cuda/bin/nvcc does not exist
get_nvcc_path: note: /usr/local/cuda/bin/nvcc does not exist
extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
warning: --n-gpu-layers 9999 was passed but no GPUs were found; falling back to CPU inference
Log start
main: build = 1500 (a30b324)
main: built with cosmocc (GCC) 11.2.0 for x86_64-linux-cosmo
main: seed  = 1740455235
llama_model_loader: loaded meta data with 48 key-value pairs and 1025 tensors from /home/tridu33/workspace/soft/llamafile/models/DeepSeek-R1-Q4_K_M_one/DeepSeek-R1-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = deepseek2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 BF16
llama_model_loader: - kv   3:                       general.quantized_by str              = Unsloth
llama_model_loader: - kv   4:                         general.size_label str              = 256x20B
llama_model_loader: - kv   5:                           general.repo_url str              = https://huggingface.co/unsloth
llama_model_loader: - kv   6:                      deepseek2.block_count u32              = 61
llama_model_loader: - kv   7:                   deepseek2.context_length u32              = 163840
llama_model_loader: - kv   8:                 deepseek2.embedding_length u32              = 7168
llama_model_loader: - kv   9:              deepseek2.feed_forward_length u32              = 18432
llama_model_loader: - kv  10:             deepseek2.attention.head_count u32              = 128
llama_model_loader: - kv  11:          deepseek2.attention.head_count_kv u32              = 128
llama_model_loader: - kv  12:                   deepseek2.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  13: deepseek2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                deepseek2.expert_used_count u32              = 8
llama_model_loader: - kv  15:        deepseek2.leading_dense_block_count u32              = 3
llama_model_loader: - kv  16:                       deepseek2.vocab_size u32              = 129280
llama_model_loader: - kv  17:            deepseek2.attention.q_lora_rank u32              = 1536
llama_model_loader: - kv  18:           deepseek2.attention.kv_lora_rank u32              = 512
llama_model_loader: - kv  19:             deepseek2.attention.key_length u32              = 192
llama_model_loader: - kv  20:           deepseek2.attention.value_length u32              = 128
llama_model_loader: - kv  21:       deepseek2.expert_feed_forward_length u32              = 2048
llama_model_loader: - kv  22:                     deepseek2.expert_count u32              = 256
llama_model_loader: - kv  23:              deepseek2.expert_shared_count u32              = 1
llama_model_loader: - kv  24:             deepseek2.expert_weights_scale f32              = 2.500000
llama_model_loader: - kv  25:              deepseek2.expert_weights_norm bool             = true
llama_model_loader: - kv  26:               deepseek2.expert_gating_func u32              = 2
llama_model_loader: - kv  27:             deepseek2.rope.dimension_count u32              = 64
llama_model_loader: - kv  28:                deepseek2.rope.scaling.type str              = yarn
llama_model_loader: - kv  29:              deepseek2.rope.scaling.factor f32              = 40.000000
llama_model_loader: - kv  30: deepseek2.rope.scaling.original_context_length u32              = 4096
llama_model_loader: - kv  31: deepseek2.rope.scaling.yarn_log_multiplier f32              = 0.100000
llama_model_loader: - kv  32:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  33:                         tokenizer.ggml.pre str              = deepseek-v3
llama_model_loader: - kv  34:                      tokenizer.ggml.tokens arr[str,129280]  = ["<|begin▁of▁sentence|>", "<�...
llama_model_loader: - kv  35:                  tokenizer.ggml.token_type arr[i32,129280]  = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  36:                      tokenizer.ggml.merges arr[str,127741]  = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
llama_model_loader: - kv  37:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  38:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  39:            tokenizer.ggml.padding_token_id u32              = 128815
llama_model_loader: - kv  40:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  41:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  42:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
llama_model_loader: - kv  43:               general.quantization_version u32              = 2
llama_model_loader: - kv  44:                          general.file_type u32              = 15
llama_model_loader: - kv  45:                                   split.no u16              = 0
llama_model_loader: - kv  46:                        split.tensors.count i32              = 1025
llama_model_loader: - kv  47:                                split.count u16              = 0
llama_model_loader: - type  f32:  361 tensors
llama_model_loader: - type q4_K:  606 tensors
llama_model_loader: - type q6_K:   58 tensors
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-v3'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/home/tridu33/workspace/soft/llamafile/models/DeepSeek-R1-Q4_K_M_one/DeepSeek-R1-Q4_K_M.gguf'
main: error: unable to load model
Tridu33 changed the title from "Bug: error loading model DeepSeek-R1-Q4_K_M.gguf" to "Bug: error loading model DeepSeek-R1-Q4_K_M.gguf : error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-v3'" on Feb 25, 2025

Tridu33 commented Feb 25, 2025

I found that the chat template may be the one called deepseek3 in llama.cpp:

} else if (tmpl == "deepseek3" || tmpl_contains(LU8("<|Assistant|>")) && tmpl_contains(LU8("<|User|>")) && tmpl_contains(LU8("<|end▁of▁sentence|>"))) {

But how can I change the pre-tokenizer type from 'deepseek-v3' to the right one?
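One possible workaround, if overriding the metadata is acceptable (a rough sketch, not tested on this model): llama.cpp's gguf-py ships a gguf_new_metadata.py script that writes a copy of the model with changed metadata, and recent versions have a --pre-tokenizer option. Here 'deepseek-llm' is only an illustrative guess at a value an older loader accepts; it may not tokenize identically to 'deepseek-v3', so outputs could be subtly wrong:

python /home/tridu33/workspace/llama.cpp4Ascend/gguf-py/scripts/gguf_new_metadata.py \
    /home/tridu33/workspace/soft/llamafile/models/DeepSeek-R1-Q4_K_M_one/DeepSeek-R1-Q4_K_M.gguf \
    /home/tridu33/workspace/soft/llamafile/models/DeepSeek-R1-Q4_K_M_one/DeepSeek-R1-Q4_K_M-patched.gguf \
    --pre-tokenizer 'deepseek-llm'

Note this rewrites the entire (very large) file, and it only silences the error; the proper fix would be for llamafile's bundled llama.cpp to recognize the 'deepseek-v3' pre-tokenizer, as upstream llama.cpp now does.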

Tridu33 commented Feb 25, 2025

Does llamafile support DeepSeek-R1-Q4_K_M.gguf or not?

Tridu33 commented Feb 25, 2025

ggml-org/llama.cpp#12021 reports a similar problem. What I am using is the latest main branch of llamafile.

