-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Liger-kernel] Add an option to use _apply_liger_kernel_to_instance()
to load model
#133
[Liger-kernel] Add an option to use _apply_liger_kernel_to_instance()
to load model
#133
Conversation
Signed-off-by: Hongpeng Guo <[email protected]>
Signed-off-by: Hongpeng Guo <[email protected]>
Signed-off-by: Hongpeng Guo <[email protected]>
Signed-off-by: Hongpeng Guo <[email protected]>
Signed-off-by: Hongpeng Guo <[email protected]>
Signed-off-by: Hongpeng Guo <[email protected]>
AutoLigerKernelForCausalLM
to load model_apply_liger_kernel_to_instance()
to load model
verl/workers/fsdp_workers.py
Outdated
_apply_liger_kernel_to_instance(model=actor_module) | ||
except ImportError: | ||
# Fallback to use AutoModelForCausalLM and print warning message | ||
logger.warning("Liger kernel was requested but not installed - falling back to AutoModelForCausalLM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would prefer for the job to fail outright if liger is requested but not installed. Just printing a warning is too easy to miss in all the job outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your proposal. I think we can remove the try ... catch ...
logic and add liger_kernel
a required dependency of verl. There are still a few extra PRs in Liger kernel targeting for the full integration. Once those are done, I think it makes sense to add liger-kernel
to the requirements.txt
cc @vermouth1992 @eric-haibin-lin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's a good idea to make liger-kernel a required dependency and remove try except.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove try except and run the CI again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done.
@@ -0,0 +1,52 @@ | |||
set -x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add this test to CI and make sure liger-kernel is toggled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added this test to .github/workflows/e2e_gsm8k.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess liger-kernel has to be added to setup.py and project.toml
actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B \ | ||
actor_rollout_ref.actor.optim.lr=1e-6 \ | ||
actor_rollout_ref.model.use_remove_padding=True \ | ||
+actor_rollout_ref.model.use_liger=True \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag for use_liger
is here.
Signed-off-by: Hongpeng Guo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A relevant PR for the fused kernel of ce_loss
and entropy
as described in TODO can be referenced here: linkedin/Liger-Kernel#551
Signed-off-by: Hongpeng Guo <[email protected]>
Please run the formatter |
Got it! Will handle the comments soon. One suggestion is to have a |
Added a code formatting instruction in the README. Will add more later |
Signed-off-by: Hongpeng Guo <[email protected]>
Summary
This PR enables to use Liger Kernel's
_apply_liger_kernel_to_instance
to init a fsdp worker model.Main Changes
liger_kernel.transformers.AutoLigerKernelForCausalLM
to load a model from pretained, instead of the defaulttransformers.AutoModelForCausalLM
tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh
Related Issue
#96
TODO
#97 optimize the memory usage when computing entropy & log_probs
verl/verl/workers/actor/dp_actor.py
Lines 94 to 106 in 6d96fda