
Accelerate config file missing for LoRA #765

Open
psyhtest opened this issue Sep 5, 2024 · 4 comments

Comments

psyhtest commented Sep 5, 2024

The LoRA reference implementation has a broken link to an Accelerate config file, in the README sentence:

> where the Accelerate config file is this one.

("this one" is the broken link.)

psyhtest commented Sep 5, 2024

Or is it the same file stored under configs/default_config.yaml?

ShriyaPalsamudram (Contributor) commented:

@itayhubara could you please confirm whether configs/default_config.yaml is the Accelerate config file?

If yes, we should also update the link to avoid confusion in the future.

regisss commented Sep 13, 2024

The link is broken because it points to a private repo and you cannot access it (it's the repo I created at the beginning of the project).
The Accelerate config in my repo is:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

While https://github.com/mlcommons/training/blob/master/llama2_70b_lora/configs/default_config.yaml is:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_clipping: 0.3
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

https://github.com/mlcommons/training/blob/master/llama2_70b_lora/configs/default_config.yaml is the right config to use, as gradient clipping was added later and I never updated the copy in my repo.
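
For reference, the only difference between the two files is the gradient_clipping entry under deepspeed_config; everything else is identical:

```diff
 deepspeed_config:
+  gradient_clipping: 0.3
   gradient_accumulation_steps: 1
```

(For anyone verifying locally: this is the file passed to the launcher, e.g. `accelerate launch --config_file configs/default_config.yaml <training script>`; the script name is omitted here since it depends on the reference implementation's entry point.)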

ShriyaPalsamudram (Contributor) commented:

@regisss thank you very much for clarifying the difference. Could you please update the link to the Accelerate config file in the README by opening a PR to the repo, so we can avoid similar confusion in the future?

I can help merge that PR for you if you can tag me in it.
