Bug: New DeepSeek-R1-Distill-Qwen models do not load #684

Open
Eudox67 opened this issue Jan 21, 2025 · 15 comments

Comments

@Eudox67

Eudox67 commented Jan 21, 2025

Contact Details

[email protected]

What happened?

When attempting to load a DeepSeek-R1-Distill-Qwen GGUF model, llamafile fails to load it -- at any size (1.5B, 7B, 14B, or 32B). This occurs using llamafiler, llamafile, or a .llamafile conversion, under both the traditional and --v2 modes.

Version

llamafile v0.9.0

What operating system are you seeing the problem on?

Linux

Relevant log output

llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-r1-qwen'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf'
{"function":"load_model","level":"ERR","line":463,"model":"DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf","msg":"unable to load model","tid":"12265824","timestamp":1737498365}

--v2
DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf: failed to load model
@quantumalchemy

Yep, not loading any DeepSeek distill -- vocabulary: unknown pre-tokenizer type: 'deepseek-r1

@sholtomaud

Serg Gini — Yesterday at 3:19 AM
Most of these models require explicit support, actually.
ggml-org/llama.cpp#11310
ggml-org/llama.cpp#11324

Most probably llamafile just needs an update to master from llama.cpp.
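
That matches the failure above: llama.cpp's vocabulary loader compares the GGUF metadata string tokenizer.ggml.pre against a hard-coded list of known pre-tokenizers and throws on anything it does not recognize, and the R1 distills ship the new value 'deepseek-r1-qwen'. A condensed sketch of the dispatch (simplified from llama.cpp's llm_load_vocab; exact names vary by version and this is not a verbatim copy):

// Condensed sketch of llama.cpp's pre-tokenizer dispatch (simplified).
// tokenizer_pre holds the GGUF metadata value "tokenizer.ggml.pre".
if (tokenizer_pre == "default") {
    vocab.type_pre = LLAMA_VOCAB_PRE_TYPE_DEFAULT;
} else if (tokenizer_pre == "qwen2") {
    vocab.type_pre = LLAMA_VOCAB_PRE_TYPE_QWEN2;
// ... more known strings ...
} else {
    // An unrecognized value is fatal -> "unknown pre-tokenizer type: '...'".
    throw std::runtime_error(format("unknown pre-tokenizer type: '%s'", tokenizer_pre.c_str()));
}

So a build whose vendored llama.cpp predates the new string cannot load these GGUFs, regardless of quantization or size.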

@BradHutchings

I was able to build my own GGUF and get llamafile to load it. It will be in my Brad's LLMs repo "soon". Watch for it in the models folder.

https://huggingface.co/bradhutchings/Brads-LLMs

@quantumalchemy

Yeah, thanks, but I want to use my own GGUF. I'll wait for the update; hopefully soon.
So how did you fix your LLM so it works?

@Eudox67
Author

Eudox67 commented Jan 24, 2025

Also:

Architecture    Model Size
openelm         1.08

I need 14B or 32B, and I want ROCm GPU offload optimization on a Linux server. I may have to get complicated and start using SGLang.

@evangineer

This has been fixed upstream:
ggml-org/llama.cpp@ec7f3ac
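
For reference, the change is small; a rough sketch of what the commit adds, based on how other pre-tokenizers are registered in llama.cpp (not a verbatim copy):

// 1. In llm_load_vocab(): recognize the new "tokenizer.ggml.pre" value.
} else if (tokenizer_pre == "deepseek-r1-qwen") {
    vocab.type_pre = LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_R1_QWEN;
}

// 2. In the tokenizer: give the new pre-tokenizer its word-splitting rules.
case LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_R1_QWEN:
    regex_exprs = { /* model-specific split regexes; see the commit */ };
    break;

Until llamafile pulls this in, its bundled loader keeps hitting the unknown-pre-tokenizer error.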

@Eudox67
Author

Eudox67 commented Jan 27, 2025

This has been fixed upstream: ggerganov/llama.cpp@ec7f3ac

Have you been able to compile it with llamafile?

@quantumalchemy
Copy link

Someone needs to fork this thing. I just read on Discord that the lead dev took a job at Google; of course, don't blame her.
Go get paid! You deserve it! Thanks for creating this thing, but it needs a fork to get updated; it's beyond my ken.

@Xydane
Contributor

Xydane commented Jan 29, 2025

I've fixed this in pull request #687

@cjpais
Collaborator

cjpais commented Jan 30, 2025

#687 just got merged! Please give it a try.

@xor2003

xor2003 commented Feb 6, 2025

Works with DeepSeek-R1-Distill-Qwen-1.5B-Q8_0.gguf for me.

@stonez56

stonez56 commented Feb 9, 2025

I don't know how to compile from source.
Is there a method, such as a JSON or YAML file, to allow adding custom model files in the future?

@DonaldCMLIN

Might I know whether there is any solution to fix llamafile.exe (v0.9) failing with:

loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-r1-qwen'

Although many professionals have mentioned upgrading to a new version of llama.cpp,
it remains unclear how to integrate and build it into a new llamafile.

@cjpais
Collaborator

cjpais commented Feb 14, 2025

It's not in the main release yet; you will have to compile from source for now.

@DarkTyger

git clone https://github.com/Mozilla-Ocho/llamafile 
cd llamafile
make -j$(nproc)
make -j$(nproc) install PREFIX=$HOME/bin/llamafile

Error:

install: cannot stat 'o//stable-diffusion.cpp/main': No such file or directory
make: *** [Makefile:62: install] Error 1

Also:

git clone https://github.com/Mozilla-Ocho/llamafile 
cd llamafile
rm -rf llama.cpp
git clone https://github.com/ggerganov/llama.cpp
make -j$(nproc)

Error:

In file included from whisper.cpp/whisper.h:4:
whisper.cpp/ggml_extend.hpp:4:10: fatal error: 'llama.cpp/ggml-alloc.h' file not found
    4 | #include "llama.cpp/ggml-alloc.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~
.cosmocc/4.0.2/bin/cosmoc++ -O2 -g -fexceptions -ffunction-sections -fdata-sections -mclang -DGGML_MULTIPLATFORM -frtti -std=gnu++23 -frtti -Wno-deprecated-declarations -iquote. -mcosmo -DGGML_MULTIPLATFORM -Wno-attributes -DLLAMAFILE_DEBUG  -Xx86_64-mtune=znver4 -c -o o//whisper.cpp/server.o whisper.cpp/server.cpp
In file included from whisper.cpp/mic2raw.cpp:19:
In file included from whisper.cpp/whisper.h:4:
whisper.cpp/ggml_extend.hpp:4:10: fatal error: 'llama.cpp/ggml-alloc.h' file not found
    4 | #include "llama.cpp/ggml-allollamafile/tokenize.cpp:27:10c.: h"fatal error: 'llama.cpp/llama.h' file not found

      |    27 | #include "         ^~~~~~~~~~~~~~~~~~~~~~~~l
lama.cpp/llama.h"
      |          ^~~~~~~~~~~~~~~~~~~
whisper.cpp/main.cpp:6:10: fatal error: 'llama.cpp/cores.h' file not found
    6 | #include "llama.cpp/cores.h"
      |          ^~~~~~~~~~~~~~~~~~~
In file included from whisper.cpp/mic2txt.cpp:19:
In file included from whisper.cpp/whisper.h:4:
whisper.cpp/ggml_extend.hpp:4:10: fatal error: 'llama.cpp/ggml-alloc.h' file not found
    4 | #include "llama.cpp/ggml-alloc.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [build/rules.mk:26: o//llamafile/tokenize.o] Error 1
make: *** Waiting for unfinished jobs....
1 error generated.
make: *** [build/rules.mk:26: o//whisper.cpp/mic2txt.o] Error 1
1 error generated.
make: *** [build/rules.mk:25: o//whisper.cpp/mic2raw.o] Error 1
1 error generated.
make: *** [build/rules.mk:25: o//whisper.cpp/grammar-parser.o] Error 1
1 error generated.
make: *** [build/rules.mk:25: o//whisper.cpp/main.o] Error 1
In file included from whisper.cpp/server.cpp:7:
In file included from whisper.cpp/whisper.h:4:
whisper.cpp/ggml_extend.hpp:4:10: fatal error: 'llama.cpp/ggml-alloc.h' file not found
    4 | #include "llama.cpp/ggml-alloc.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [build/rules.mk:25: o//whisper.cpp/server.o] Error 1

The include files appear to have been restructured in upstream llama.cpp, which makes the build fail. I didn't see an easy way to update the include path, so I revised the includes manually:

includes.patch.txt

With these changes in place, the build still fails.
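
For illustration, the kind of rewrite involved (a hypothetical before/after; the actual changes are in includes.patch.txt above). llamafile's vendored tree includes llama.cpp headers by repo-relative path, and upstream has since moved its headers, so each include needs retargeting:

// Old include style in llamafile's vendored tree:
#include "llama.cpp/ggml-alloc.h"

// With current upstream llama.cpp dropped in, the header lives elsewhere
// (e.g. under ggml/include/), so the include -- plus matching -I flags --
// must be retargeted, roughly:
#include "ggml-alloc.h"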
