DeepSeek V3 Support #760
@tianyu-l Given the performance of this specific model and the recent boom in activity, can we reasonably expect TorchTitan to support this model? I understand this model is not created by Meta, but I (along with others) would value a contribution on efficient training of this model in TorchTitan.
I agree we probably should prioritize supporting this model. However, I feel supporting all training optimizations mentioned in the technical report could be heavy and/or may not be totally aligned with the purpose of torchtitan. Would it still be interesting if we support the model and train it "in our own way", e.g. using parallelisms / optimizations similar to what we do for Llama?
@tianyu-l I am mainly interested in a model architecture implementation. The remaining details, like FP8 training and various forms of parallelism, are already implemented in TorchTitan and should be reused. So it's mainly the following components that I am asking for:
A good starting point would be the ability to convert the weights provided on Hugging Face to TorchTitan and continue training from them.
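The conversion step mentioned above is essentially a checkpoint key-renaming pass. Below is a minimal, hedged sketch of what such a converter could look like; the key names in the mapping are illustrative placeholders modeled on common Hugging Face and torchtitan conventions, not the actual DeepSeek-V3 or torchtitan checkpoint schema.

```python
# Hypothetical sketch: remap Hugging Face checkpoint keys to a
# torchtitan-style state dict. The key names below are ASSUMED
# placeholders, not verified DeepSeek-V3 / torchtitan names.

HF_TO_TITAN = {
    "model.embed_tokens.weight": "tok_embeddings.weight",  # assumed mapping
    "model.norm.weight": "norm.weight",                    # assumed mapping
    "lm_head.weight": "output.weight",                     # assumed mapping
}

def convert_keys(hf_state_dict, mapping=HF_TO_TITAN):
    """Return a new state dict with keys renamed per `mapping`.

    Keys without an entry in `mapping` are passed through unchanged,
    so a partial mapping still yields a complete state dict.
    """
    return {mapping.get(k, k): v for k, v in hf_state_dict.items()}
```

A real converter would additionally need to handle per-layer key patterns (e.g. via regex over layer indices) and any tensor layout differences between the two implementations, but the overall shape of the tool is a pure key/value transform like this.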
Is the MoE architecture already supported in that branch? #730
Hi @lxww302 - that branch is working for tp2ep. I hit some issues re: dp2ep, so I would not use that yet, but tp2ep was working perfectly in my brief spin with it. |
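For readers unfamiliar with what tp2ep/dp2ep are parallelizing: the core of an MoE layer is a gating network that routes each token to its top-k experts and combines their outputs. Here is a framework-free sketch of that routing step, under the assumption of a simple softmax gate with renormalization over the selected experts; all names are illustrative and none of this is TorchTitan API.

```python
# Minimal, framework-free sketch of top-k MoE routing.
# Experts are stand-ins for expert FFNs; the token is a single float
# for clarity. All names here are illustrative, not TorchTitan APIs.
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, gate_weights, top_k=2):
    """Route one token through its top_k highest-scoring experts.

    experts:      list of callables, each a stand-in for an expert FFN
    gate_weights: one gating parameter per expert (a toy linear gate)
    """
    logits = [w * token for w in gate_weights]          # gate logits
    probs = softmax(logits)
    # indices of the top_k gating probabilities
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)                   # renormalize over chosen experts
    return sum(probs[i] / norm * experts[i](token) for i in top)
```

Expert parallelism (the "ep" in tp2ep/dp2ep) shards the `experts` list across devices and all-to-alls tokens to wherever their chosen experts live; the routing math itself stays as above.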
@tianyu-l Support for DeepSeek-V3 would be excellent given their top-tier performance.
Main parallelism components:
Other main modeling components:
Model weights: https://huggingface.co/deepseek-ai/DeepSeek-V3
Paper link: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
Performance: