
deepspeed problem with train_lora.py #3648

Open
XqZeppelinhead0702 opened this issue Dec 30, 2024 · 0 comments
Hello, recently I've tried to fine-tune vicuna with train_lora.py and encountered an error I failed to resolve.
I ran the following script:

deepspeed $CODE_PATH \
    --model_name_or_path $MODEL_PATH \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --data_path $DATA_PATH \
    --bf16 True \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --q_lora True \
    --deepspeed $DS_PATH

and the error is as follows:
(screenshot: QQ20241230-151344)
I checked the previous issues, removed the deepspeed import, and modified the error line as suggested in issues/1458#issuecomment-1598308288, but I still hit a similar error at line 124 in train_lora.py:
(screenshot: QQ20241230-152031)
Obviously it cannot be fixed in the same way, because the trainer has not been defined before line 124 in the function train(), so I wonder if there is any other approach to solve this? Or maybe the versions of transformers and deepspeed in my environment are not matched with the current repo?
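For context, the change I was attempting would look roughly like the sketch below. This is only my guess at the fix, assuming line 124 is the old module-level ZeRO-3 check in the q_lora branch; `is_deepspeed_zero3_enabled` now lives in `transformers.integrations.deepspeed`, and since it reads the globally registered DeepSpeed config (set when TrainingArguments is parsed) it should not need a trainer instance:

```diff
-from transformers import deepspeed
+from transformers.integrations.deepspeed import is_deepspeed_zero3_enabled
 ...
-        if len(training_args.fsdp) > 0 or deepspeed.is_deepspeed_zero3_enabled():
+        if len(training_args.fsdp) > 0 or is_deepspeed_zero3_enabled():
```

I have not confirmed that the surrounding lines match the current repo, so treat this as a sketch rather than a tested patch.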

My environment: transformers==4.47.1 and deepspeed==0.16.2
