
Commit

Fixed typo with --only_optimize_lora (#331)
Co-authored-by: Hyeongmin Moon <[email protected]>
Co-authored-by: Zhewei Yao <[email protected]>
3 people authored Apr 24, 2023
1 parent dfb9491 commit ab4e2e5
Showing 5 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion applications/DeepSpeed-Chat/train.py
@@ -181,7 +181,7 @@ def launch_cmd(args, step_num, cmd):
f"If you are seeing an OOM error, try modifying {get_script(args, step_num)}:",
" - Reduce `--per_device_*_batch_size`",
" - Increase `--zero_stage {0,1,2,3}` on multi-gpu setups",
-    " - Enable `--gradient_checkpointing` or `--only_optimizer_lora`"
+    " - Enable `--gradient_checkpointing` or `--only_optimize_lora`"
)))


@@ -59,7 +59,7 @@ Most of the arguments used in the main.py file have clear explanations and are u
| --lora_dim | When it is larger than 0, LoRA will be enabled | Usually, LoRA needs a larger learning rate for better convergence |
| --lora_module_name | The scope to enable LoRA module. | |
| --only_optimize_lora | Freeze all othre paramters and only optimize LoRA-related prameters | |
-| --gradient_checkpoint, --lora_dim, only_optimizer_lora | When LoRA and Gradient Checkpointing are enabled. Only Optimize LoRA cannot be enabled | If all three are enabled, it will affect the gradient flow (aka the augo-grad system backend by PyTorch) |
+| --gradient_checkpoint, --lora_dim, only_optimize_lora | When LoRA and Gradient Checkpointing are enabled. Only Optimize LoRA cannot be enabled | If all three are enabled, it will affect the gradient flow (aka the augo-grad system backend by PyTorch) |

One important consideration for users is determining the maximum model size they can train using their current system. Here, we present a method for estimating this limit. Assuming that you do not use the offload feature and enable (i) zero stage 3 (if using multiple GPUs), (ii) gradient checkpoint, and (iii) LoRA, the approximate maximum model size (in billions of parameters) that you can train can be estimated as "Total GPU memory in GB divided by 3." For example, if you have a single A6000-48G GPU, you can probably train models up to 16 billion parameters. It is important to note that this is a rough estimation, and you should verify it by yourselves.
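The "GPU memory divided by 3" rule of thumb above can be sketched as a small helper. This function is illustrative only, not part of DeepSpeed-Chat, and it inherits the same assumptions as the text: ZeRO stage 3 on multi-GPU, gradient checkpointing, and LoRA enabled, offload disabled.

```python
def estimate_max_model_size_billion(total_gpu_memory_gb: float) -> float:
    """Rough upper bound on trainable model size, in billions of parameters.

    Assumes no offload, with (i) ZeRO stage 3 (if multi-GPU),
    (ii) gradient checkpointing, and (iii) LoRA enabled.
    This is a coarse estimate; always verify on your own hardware.
    """
    return total_gpu_memory_gb / 3


# Example from the text: a single A6000-48G GPU
print(estimate_max_model_size_billion(48))  # -> 16.0, i.e. roughly a 16B-parameter model
```

As the text notes, this is only a rough estimate; actual capacity also depends on sequence length, batch size, and optimizer state placement.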

@@ -169,7 +169,7 @@ def parse_args():
if args.gradient_checkpointing and args.lora_dim > 0:
assert (
not args.only_optimize_lora
-        ), "--gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

return args

@@ -169,7 +169,7 @@ def parse_args():
if args.gradient_checkpointing and args.lora_dim > 0:
assert (
not args.only_optimize_lora
-        ), "--gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

return args

@@ -296,7 +296,7 @@ def parse_args():
and args.critic_lora_dim > 0):
assert (
not args.only_optimize_lora
-        ), "--{actor,critic}_gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--{actor,critic}_gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

if args.inference_tp_size > 1:
assert (
