Because of resource limits, my training run gets interrupted partway through, and I want to resume training afterwards.
So I set save_strategy="epoch" in TrainingArguments() to save a checkpoint at the end of each epoch.
After an interruption, I have the checkpoint from the last completed epoch, for example checkpoint-1166, which contains: optimizer.pt, rng_state.pth, scheduler.pt, trainer_state.json, and intervenable_model, where intervenable_model has a .bin suffix.
But when I change trainer.train() to trainer.train(resume_from_checkpoint=True), I get this error:
AttributeError: 'ReftModel' object has no attribute '_keys_to_ignore_on_save'
How can I resume training from the saved checkpoint?
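For anyone trying to reproduce this, here is a minimal standard-library sketch for locating the most recent checkpoint-N directory and checking which of the files listed above are present. The helper names (find_latest_checkpoint, missing_files) and the assumption that checkpoints sit directly under the output directory are illustrative only; this inspects the checkpoint and does not fix the AttributeError.

```python
import re
from pathlib import Path

# File names taken from the checkpoint listing in this issue.
EXPECTED_FILES = ["optimizer.pt", "rng_state.pth", "scheduler.pt", "trainer_state.json"]

# Matches directory names such as "checkpoint-1166".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir):
    """Return the checkpoint-N subdirectory with the largest N, or None.

    Hypothetical helper: assumes checkpoints sit directly under output_dir.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, child
    return best_path


def missing_files(checkpoint_dir):
    """Return the expected file names that are absent from checkpoint_dir."""
    checkpoint_dir = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (checkpoint_dir / name).exists()]
```

The path returned by find_latest_checkpoint could also be passed explicitly, as in trainer.train(resume_from_checkpoint=str(path)), instead of True. Whether that changes the error above is not something this sketch establishes.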