
[REQUEST] DeepSeek-R1-Distill-Qwen-32B-GGUF 1.58-bit version possible ? #1594

Open

Greatz08 opened this issue Jan 30, 2025 · 2 comments

@Greatz08

I read about your awesome work on shrinking the original DeepSeek R1 weights and making it possible to run the model on fewer high-end GPUs at decent speed, from this blog:
https://unsloth.ai/blog/deepseekr1-dynamic

I have read some good reviews of the 32B distill model, so I wanted to test it out personally :-) BUT, like many others, I only have 8GB of VRAM (laptop user here :)), so it won't be possible for me to run that distill model even with a Q4_K_M quant (see the rough size estimate below). So I request that, if possible, you please create a 1.58-bit version for this variant, and maybe later for the 14B one too, so that many more people can test and see how much performance we can get out of these new thinking-based models.
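For context, here is a rough back-of-envelope size estimate for a 32B-parameter model at different quantization levels. The bits-per-weight figures are approximate effective rates I am assuming for each GGUF quant type, and the numbers ignore the KV cache and runtime overhead:

```python
# Rough estimate of the weight footprint of a 32B-parameter model
# at several approximate effective bits-per-weight (bpw) rates.
# The bpw values are assumptions; the KV cache and runtime overhead
# would add a few more GiB on top of these figures.

PARAMS = 32e9  # DeepSeek-R1-Distill-Qwen-32B

for name, bpw in [("Q4_K_M", 4.8), ("Q2_K", 2.6), ("IQ2_XXS", 2.1), ("1.58-bit", 1.58)]:
    gib = PARAMS * bpw / 8 / 1024**3
    print(f"{name:>8}: ~{gib:4.1f} GiB")
```

Even at roughly 2 bits per weight the weights alone come close to 8 GiB, so on an 8GB GPU some CPU offload would still be needed.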

Thank you very much for your work on quantization, Unsloth devs :-)

@Greatz08 Greatz08 changed the title DeepSeek-R1-Distill-Qwen-32B-GGUF 1.58-bit version possible ? [REQUEST] DeepSeek-R1-Distill-Qwen-32B-GGUF 1.58-bit version possible ? Jan 30, 2025
@danielhanchen
Contributor

Unfortunately my hunch is that doing 1.58-bit for dense-only models (not MoE) will severely break the model; the dynamic 1.58-bit R1 quants work because most of the weights sit in MoE expert layers that tolerate aggressive quantization while the shared layers stay at higher precision, and a dense model has no such split. So I'm not sure if it's a good idea :(

@Greatz08
Author

Greatz08 commented Feb 1, 2025

@danielhanchen I understand your concern, and I feel the same, that it "CAN" break the model. BUT I will still request that you at least give it a shot for us poor laptop guys who can't run that big a model any time soon. That model performs very well on benchmarks, which is why we are so eager to test it, and as you know, without extreme quantization that isn't possible. I personally tested a 22B IQ2 Codestral from Mistral (the lowest quantization available for that coding-focused model at the time), and even at that level it performed great for me, so I believe that with thinking abilities we can still get decent performance under much heavier quantization. So again, I can only request on behalf of others that you at least give it a shot, or maybe find a better way to quantize it so that we still get decent performance.

:-)

The rest I will leave to you, as we are not capable of finding a solution or converting it to 1.58-bit ourselves. But we can certainly test it, or provide any information you think we are capable of giving back :-)
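For reference, here is a minimal sketch of how such a heavily quantized GGUF could be run on a VRAM-limited laptop with llama-cpp-python and partial GPU offload. The filename and the number of offloaded layers are placeholders I am assuming, not a published artifact or a recommended setting:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a (hypothetical) low-bit GGUF, offloading only part of the model
# to the GPU so the remaining layers stay in system RAM.
llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-32B-IQ2_XXS.gguf",  # placeholder filename
    n_ctx=4096,       # modest context window to keep the KV cache small
    n_gpu_layers=20,  # tune until it fits in 8GB of VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly explain chain-of-thought prompting."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

With only partial offload, generation would likely be slow, but it is enough to check whether the model survives the quantization.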

