
Accelerate config file missing for LoRA #765

Open
psyhtest opened this issue Sep 5, 2024 · 4 comments

Comments

psyhtest commented Sep 5, 2024

The LoRA reference implementation has a broken link to an Accelerate config file, in the README sentence:

> where the Accelerate config file is this one.

("this one" is the broken link.)

psyhtest commented Sep 5, 2024

Or is it the same file stored under configs/default_config.yaml?

ShriyaPalsamudram (Contributor) commented:

@itayhubara could you please confirm whether configs/default_config.yaml is the Accelerate config file?

If yes, we should also update the link to avoid confusion in the future.

regisss commented Sep 13, 2024

The link is broken because it points to a private repo and you cannot access it (it's the repo I created at the beginning of the project).
The Accelerate config in my repo is:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

While https://github.com/mlcommons/training/blob/master/llama2_70b_lora/configs/default_config.yaml is:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_clipping: 0.3
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

https://github.com/mlcommons/training/blob/master/llama2_70b_lora/configs/default_config.yaml is the right config to use, as gradient clipping was added later and I never updated the copy in my repo.
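
For reference, the only difference between the two files is the gradient_clipping entry under deepspeed_config; everything else is identical:

```diff
 deepspeed_config:
+  gradient_clipping: 0.3
   gradient_accumulation_steps: 1
```

(For anyone verifying locally: this is the file passed to the launcher, e.g. `accelerate launch --config_file configs/default_config.yaml <training script>`; the script name is omitted here since it depends on the reference implementation's entry point.)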

ShriyaPalsamudram (Contributor) commented:

@regisss thank you very much for clarifying the difference. Could you please update the link to the Accelerate config file in the README by opening a PR to the repo, so we can avoid similar confusion in the future?

I can help merge that PR for you if you can tag me in it.
