
DeepSeek-R1 support #174

Open
loretoparisi opened this issue Feb 7, 2025 · 7 comments
Assignees: YangWang92
Labels: new models

@loretoparisi

Add support or a recipe to quantize DeepSeek-R1 and related distilled versions DeepSeek-R1-Distill-Llama-70B, DeepSeek-R1-Distill-Qwen-32B, eventually including 7B, 14B Qwen, and Llama 8B

YangWang92 self-assigned this on Feb 8, 2025
YangWang92 added the new models label on Feb 8, 2025
@YangWang92
Contributor

We are almost there; I have prepared the Hessian collection code for DeepSeek R1/V3. Please give us about one week.
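
For readers wondering what "Hessian collection" means here, a minimal sketch of the kind of per-layer proxy Hessian (H ≈ 2·XᵀX accumulated over calibration activations) that second-order quantizers in the GPTQ/VPTQ family build; this is an illustration under those assumptions, not the repo's actual code, and all names are made up:

```python
# Illustrative sketch only (assumption, not the repo's actual code):
# second-order quantizers build, for each linear layer, a proxy Hessian
# H = 2 * sum_i x_i x_i^T from the layer's calibration-time inputs.
import torch

hessians = {}  # layer name -> running Hessian accumulator

def make_hook(name):
    def hook(module, inputs, output):
        # inputs[0] has shape (batch, seq, in_features); flatten to tokens.
        x = inputs[0].detach().reshape(-1, inputs[0].shape[-1]).float()
        h = hessians.setdefault(
            name, torch.zeros(x.shape[-1], x.shape[-1], device=x.device)
        )
        h += 2.0 * (x.T @ x)  # accumulate 2 * X^T X over calibration batches
    return hook

def collect_hessians(model, calibration_batches):
    # Hook every linear layer, run the calibration data through, detach hooks.
    handles = [
        m.register_forward_hook(make_hook(n))
        for n, m in model.named_modules()
        if isinstance(m, torch.nn.Linear)
    ]
    with torch.no_grad():
        for batch in calibration_batches:
            model(batch)
    for h in handles:
        h.remove()
    return hessians
```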

@wejoncy
Contributor

wejoncy commented Feb 8, 2025

Here is a way to quantize the related distilled versions:

1. Install `vptq-algorithm` by following https://github.com/microsoft/VPTQ/blob/algorithm/algorithm.md#environment-setting
2. Install `qllm`: `pip install qllm`
3. Generate an example quant config: `python -m qllm --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --quant_method=vptq --quant_config=help`
4. Copy and save the printed config, then edit the values as you see fit.
5. Run the quantization (a loading sketch follows the list): `python -m qllm --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --quant_method=vptq --quant_config=xx.json --save DeepSeek-R1-Distill-Qwen-32B-vptq`
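
Once step 5 finishes, a quick way to smoke-test the result, assuming the saved directory is a standard transformers checkpoint with custom (remote) code, which is how VPTQ-quantized checkpoints are typically shipped; the path is just the `--save` directory from above:

```python
# Sketch for loading the checkpoint produced in step 5 (assumptions noted above).
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "DeepSeek-R1-Distill-Qwen-32B-vptq"  # --save output dir from step 5
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("Explain VPTQ in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```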

@YangWang92
Contributor

Some updates: we have collected all the Hessian matrices and are now adapting the quantization algorithm.

@YangWang92
Contributor

Some updates: I have adapted the algorithm to the DeepSeek models; stay tuned.

@YangWang92
Contributor

(screenshot) Now we have an early ~2-bit version of DeepSeek R1, and it works well on 4x A100 80G.
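
(Rough sanity check, not from the thread: R1 has about 671B parameters, so at ~2 bits per weight the weights alone are roughly 671e9 × 2 / 8 ≈ 168 GB, which fits in 4 × 80 GB = 320 GB with headroom for activations and KV cache.)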

@loretoparisi
Author

> Now we have an early ~2-bit version of DeepSeek R1, and it works well on 4x A100 80G.

💯 This is truly amazing.
Running a test on our 4 x H100 today!

@YangWang92
Contributor

> > Now we have an early ~2-bit version of DeepSeek R1, and it works well on 4x A100 80G.
>
> 💯 This is truly amazing. Running a test on our 4 x H100 today!

The current inference speed is still a bit slow (it is based on the DeepSeek repo's bare-torch design). Please wait a moment while I prepare it; you can give it a try soon.
