Support for saving sharded checkpoints? #3209
I think I was able to work around that issue by renaming
Ok, I was able to make it work with the below patch. Sharing in case anyone else runs into the same problem. Hope this is properly integrated in the future
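(The patch itself did not survive the page capture. Purely as an illustration, not the commenter's actual diff, the change described further down the thread amounts to adding "sharded_state" to the --load-format choices in python/sglang/srt/server_args.py. The surrounding choice list here is an assumption based on the file linked below:)

```python
# Illustration only -- not the original patch from this comment.
# Inside ServerArgs.add_cli_args() in python/sglang/srt/server_args.py,
# let the launcher accept "sharded_state" as a --load-format value.
# The other choices listed here are assumed from the linked file.
parser.add_argument(
    "--load-format",
    type=str,
    default=ServerArgs.load_format,
    choices=["auto", "pt", "safetensors", "npcache", "dummy", "sharded_state"],
    help="The format of the model weights to load.",
)
```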
Yeah. We do have a plan to support different shared-memory loading in SGLang. Stay tuned at:
Does SGLang support sharded checkpoints? I see here (https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/model_loader/loader.py#L492) that there is a loader, and it recommends using
examples/save_sharded_state.py
to save the sharded state, but this file doesn't exist. Does it refer to this one from vLLM: https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/save_sharded_state.py?
Also, the load-format argument doesn't have a sharded_state choice (https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/server_args.py#L315). Is that a typo, or is it not supposed to be used?
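(For reference, the linked vLLM example boils down to roughly the following. This is a simplified sketch, not a verbatim copy of the script: the save_sharded_state() call on the engine's model executor and its path/pattern/max_size parameters are taken from vLLM's sharded-state loader, and all paths are placeholders.)

```python
# Simplified sketch of vLLM's examples/offline_inference/save_sharded_state.py,
# not a verbatim copy. Assumes vLLM is installed; paths are placeholders.
import os
import shutil

from vllm import LLM

MODEL_PATH = "/models/DeepSeek-R1"   # local HF-format checkpoint (placeholder)
OUTPUT_PATH = "/models/r1-sharded"   # where per-rank shards go (placeholder)

# Load the model once with the target tensor-parallel degree; the saved
# shards are tied to this TP size.
llm = LLM(model=MODEL_PATH, tensor_parallel_size=8, enforce_eager=True)

os.makedirs(OUTPUT_PATH, exist_ok=True)
# Each TP rank dumps its already-sharded weights, so a later load with
# a sharded-state format skips the expensive re-shard step.
llm.llm_engine.model_executor.save_sharded_state(
    path=OUTPUT_PATH,
    pattern=None,            # use the loader's default file-name pattern
    max_size=5 * 1024**3,    # split output files above ~5 GB
)

# Copy over config/tokenizer files (everything that isn't a weight file),
# mirroring what the vLLM example does.
for name in os.listdir(MODEL_PATH):
    if os.path.splitext(name)[1] not in (".bin", ".pt", ".safetensors"):
        src = os.path.join(MODEL_PATH, name)
        if os.path.isfile(src):
            shutil.copy(src, OUTPUT_PATH)
```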
My real problem is that I'm trying to load DeepSeek-R1 and it takes a very long time. I have a sharded checkpoint that vLLM can load almost instantly, but SGLang raises the following error (after I add "sharded_state" to the choices in the launcher to avoid an immediate error):
Does that mean SGLang shards the model differently and I need to redo the sharding in some way? Or is it not supported at all? Or is there another recommended way to load the R1/V3 model quickly?
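(For what it's worth, once a sharded_state load format is wired into SGLang, the flow would presumably mirror vLLM's: save per-rank shards once, then start the engine against that directory with the same TP size. Below is a hypothetical sketch using SGLang's offline Engine API; the load_format="sharded_state" keyword is exactly the part this issue says is missing, so treat it as an assumption.)

```python
# Hypothetical sketch: load a pre-sharded checkpoint directly.
# Assumes load_format accepts "sharded_state" (the gap this issue is about)
# and that tp_size matches the TP degree used when the shards were saved.
import sglang as sgl

llm = sgl.Engine(
    model_path="/models/r1-sharded",  # directory written by a save script
    tp_size=8,
    load_format="sharded_state",
)
print(llm.generate("Hello, my name is", {"max_new_tokens": 16}))
llm.shutdown()
```

Note that per-rank shards generally bake in the tensor-parallel layout, which is why a checkpoint sharded for one TP size cannot simply be loaded at another, and why a mismatch between savers and loaders could explain the error above.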