[Liger-kernel] Add an option to use `_apply_liger_kernel_to_instance()` to load model #133

hongpeng-guo · 2025-01-26T02:13:49Z

Summary

This PR makes it possible to use Liger Kernel's _apply_liger_kernel_to_instance to initialize an FSDP worker model.

Main Changes

Added an option to use liger_kernel.transformers.AutoLigerKernelForCausalLM to load a model from pretrained, instead of the default transformers.AutoModelForCausalLM
Added a test case using configuration file tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh
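A minimal sketch of the flag-gated flow described above: load the model with the usual loader, then patch that instance in place when use_liger is set. The helper name build_actor_module and its load_fn / patch_fn parameters are hypothetical and exist only so the sketch runs without heavy dependencies; the import path liger_kernel.transformers.monkey_patch is an assumption, and only the call _apply_liger_kernel_to_instance(model=actor_module) is taken from this PR's diff.

```python
from typing import Any, Callable, Optional


def build_actor_module(
    load_fn: Callable[[str], Any],
    model_path: str,
    use_liger: bool = False,
    patch_fn: Optional[Callable[..., None]] = None,
) -> Any:
    """Load a model, then optionally patch that same instance with Liger kernels.

    load_fn and patch_fn are hypothetical injection points for illustration.
    """
    actor_module = load_fn(model_path)
    if use_liger:
        if patch_fn is None:
            # Imported lazily so liger-kernel is only needed when requested.
            # Assumed import path; a missing package raises ImportError here.
            from liger_kernel.transformers.monkey_patch import (
                _apply_liger_kernel_to_instance as patch_fn,
            )
        # Same call shape as in the diff: the model is patched in place.
        patch_fn(model=actor_module)
    return actor_module
```

In a real worker, load_fn would presumably wrap transformers.AutoModelForCausalLM.from_pretrained and patch_fn would be left as None so the Liger function is used.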

Related Issue

#96

TODO

#97 optimize the memory usage when computing entropy & log_probs

verl/verl/workers/actor/dp_actor.py

Lines 94 to 106 in 6d96fda

    
    output = self.actor_module(input_ids=input_ids_rmpad,
                               attention_mask=None,
                               position_ids=position_ids_rmpad,
                               use_cache=False)  # prevent model thinks we are generating
    logits_rmpad = output.logits.squeeze(0)  # (total_nnz, vocab_size)

    logits_rmpad.div_(temperature)

    # compute entropy
    entropy_rmpad = self.compute_entropy_from_logits(logits_rmpad)  # ((total_nnz / sp) + pad)

    # if use_sp: ((total_nnz / sp) + pad) ; if not use_sp: (batch, seqlen)
    log_probs = logprobs_from_logits(logits=logits_rmpad, labels=input_ids_rmpad_rolled)
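For readers unfamiliar with the two quantities in the TODO, here is a plain-Python sketch of entropy and the label's log-probability for a single token position. It is not verl's compute_entropy_from_logits or logprobs_from_logits, which operate on torch tensors; the function name entropy_and_log_prob is hypothetical. Both values depend on the full softmax over the vocabulary, which is likely why the (total_nnz, vocab_size) logits tensor dominates memory.

```python
import math
from typing import List, Tuple


def entropy_and_log_prob(
    logits: List[float], label: int, temperature: float = 1.0
) -> Tuple[float, float]:
    """Return (entropy, log-prob of label) for one position's logits."""
    # Temperature scaling, mirroring logits_rmpad.div_(temperature).
    scaled = [x / temperature for x in logits]
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    # Log-softmax of every vocabulary entry.
    log_probs = [x - lse for x in scaled]
    # Entropy H = -sum_i p_i * log p_i, with p_i = exp(log p_i).
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    return entropy, log_probs[label]
```

For uniform logits over four entries, both the entropy and the negative log-probability of any label equal log(4).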

Signed-off-by: Hongpeng Guo <[email protected]>

corbt · 2025-01-30T02:06:22Z

verl/workers/fsdp_workers.py

+                    _apply_liger_kernel_to_instance(model=actor_module)
+                except ImportError:
+                    # Fallback to use AutoModelForCausalLM and print warning message
+                    logger.warning("Liger kernel was requested but not installed - falling back to AutoModelForCausalLM")


Personally I would prefer for the job to fail outright if liger is requested but not installed. Just printing a warning is too easy to miss in all the job outputs.

I like your proposal. I think we can remove the try ... except ... logic and make liger_kernel a required dependency of verl. There are still a few more PRs in Liger Kernel working toward full integration. Once those are done, I think it makes sense to add liger-kernel to requirements.txt. cc @vermouth1992 @eric-haibin-lin

I guess it's a good idea to make liger-kernel a required dependency and remove try except.

Could you remove try except and run the CI again?

Sure, done.
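The "fail outright" behaviour discussed in this thread can be sketched as follows. The helper require_optional_dependency is hypothetical and is not the code merged in this PR (the thread settled on removing the try/except and making liger-kernel a required dependency); it only illustrates raising instead of logging a warning when a requested package is missing.

```python
import importlib
from types import ModuleType


def require_optional_dependency(module_name: str, feature: str) -> ModuleType:
    """Import module_name, or abort with an error naming the feature that needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise instead of warning, so the job stops immediately.
        raise ImportError(
            f"{feature} was requested but '{module_name}' is not installed"
        ) from exc
```

A call such as require_optional_dependency("liger_kernel", "use_liger") would then stop the job at startup rather than silently falling back.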

vermouth1992 · 2025-01-30T05:38:00Z

tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh

@@ -0,0 +1,52 @@
+set -x


Could you add this test to CI and make sure liger-kernel is toggled?

Sure, added this test to .github/workflows/e2e_gsm8k.yml

I guess liger-kernel has to be added to setup.py and pyproject.toml

hongpeng-guo · 2025-01-30T06:49:49Z

tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh

+    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B \
+    actor_rollout_ref.actor.optim.lr=1e-6 \
+    actor_rollout_ref.model.use_remove_padding=True \
+    +actor_rollout_ref.model.use_liger=True \


The flag for use_liger is here.

Signed-off-by: Hongpeng Guo <[email protected]>

hongpeng-guo

A relevant PR for the fused kernel of ce_loss and entropy as described in TODO can be referenced here: linkedin/Liger-Kernel#551

Signed-off-by: Hongpeng Guo <[email protected]>

vermouth1992 · 2025-01-30T08:09:00Z

Please run the formatter bash script/format.sh

hongpeng-guo · 2025-01-30T08:38:50Z

Got it! Will handle the comments soon. One suggestion is to have a Contributor's Guide section in the README or in the docs, e.g., https://docs.sglang.ai/references/contribution_guide.html. Contributors would then know which checks to run before pushing a PR. Setting up some pre-commit hooks would be even better :)

vermouth1992 · 2025-01-30T08:47:18Z

Added a code formatting instruction in the README. Will add more later

Signed-off-by: Hongpeng Guo <[email protected]>

Add option to use AutoLigerKernelForCausalLM to load model

516a922

Signed-off-by: Hongpeng Guo <[email protected]>

hongpeng-guo marked this pull request as draft January 26, 2025 02:14

hongpeng-guo marked this pull request as ready for review January 26, 2025 02:22

hongpeng-guo added 5 commits January 26, 2025 02:36

remove typo

a039564

Signed-off-by: Hongpeng Guo <[email protected]>

add a test case with liger

562b62a

Signed-off-by: Hongpeng Guo <[email protected]>

use the _apply_fn to apply liger kernel

e5a0852

Signed-off-by: Hongpeng Guo <[email protected]>

format

cb6374a

Signed-off-by: Hongpeng Guo <[email protected]>

add + to the use_liger flag

8a7a0ac

Signed-off-by: Hongpeng Guo <[email protected]>

hongpeng-guo changed the title ~~[Liger-kernel] Add an option to use AutoLigerKernelForCausalLM to load model~~ [Liger-kernel] Add an option to use _apply_liger_kernel_to_instance() to load model Jan 27, 2025

corbt reviewed Jan 30, 2025

vermouth1992 reviewed Jan 30, 2025

hongpeng-guo commented Jan 30, 2025

add liger test to ci

5ee6824

Signed-off-by: Hongpeng Guo <[email protected]>

hongpeng-guo commented Jan 30, 2025

add a TODO

2b1f51c

Signed-off-by: Hongpeng Guo <[email protected]>

update requirements and pyproject, remove try catch import liger-kernel

fa02313

Signed-off-by: Hongpeng Guo <[email protected]>

vermouth1992 approved these changes Jan 30, 2025

vermouth1992 merged commit dd41877 into volcengine:main Jan 30, 2025
10 checks passed

hongpeng-guo deleted the hpguo/add_liger_kernel branch January 30, 2025 09:54
