Issue Summary: When fine-tuning Meta-Llama-3.1-8B with Unsloth, only one adapter is updated during training, even though the model has three adapters plus an additional component (a router) with requires_grad=True.
Steps to Reproduce:
Install Unsloth
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
Load the model and tokenizer using the FastLanguageModel.from_pretrained method.
from unsloth import FastLanguageModel

# Example values; the report uses constants defined elsewhere in the notebook.
MAX_SEQ_LENGTH = 2048
DTYPE = None          # None lets Unsloth auto-detect the dtype (e.g. bfloat16 on Ampere+)
LOAD_IN_4BIT = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = MAX_SEQ_LENGTH,
    dtype = DTYPE,
    load_in_4bit = LOAD_IN_4BIT,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN
Load Adapters and add a Router
Configure the model to have multiple adapters and an additional component that requires gradients.
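The adapter and router creation code is not included in the report. A minimal sketch of this step could look like the following, assuming Unsloth's FastLanguageModel.get_peft_model for the first ("default") adapter, PEFT's add_adapter for the two extra adapters, and a plain nn.Linear as a stand-in for the custom router; all hyperparameters, target modules, and the router shape are illustrative assumptions, not values from the original report.
# NOTE: sketch only -- the actual adapter/router setup is not shown in the report.
import torch.nn as nn
from peft import LoraConfig

# First adapter ("default") via Unsloth's PEFT wrapper.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["gate_proj", "up_proj", "down_proj",
                      "q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)

# Two additional adapters with the names used later in the sanity check.
extra_config = LoraConfig(
    r = 16,
    lora_alpha = 16,
    target_modules = ["gate_proj", "up_proj", "down_proj"],
    lora_dropout = 0,
    bias = "none",
)
model.add_adapter("adapter_1", extra_config)
model.add_adapter("adapter_2", extra_config)

# Hypothetical per-layer router producing one score per adapter.
hidden_size = model.config.hidden_size
device = next(model.parameters()).device
for layer in model.base_model.model.model.layers:
    layer.mlp.router = nn.Linear(hidden_size, 3, bias = False).to(device)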
model.print_trainable_parameters()

# Make the Router also trainable
for index in range(len(model.base_model.model.model.layers)):
    for param in model.base_model.model.model.layers[index].mlp.router.parameters():
        param.requires_grad = True

model.print_trainable_parameters()
Copy weights before fine-tuning.
import copy

# Sanity check: keep copies so weight updates can be verified after training
router_0 = copy.deepcopy(model.base_model.model.model.layers[0].mlp.router)
gate_lora_a_0_ada_0 = copy.deepcopy(model.base_model.model.model.layers[0].mlp.gate_proj.lora_A["default"])
gate_lora_a_0_ada_1 = copy.deepcopy(model.base_model.model.model.layers[0].mlp.gate_proj.lora_A["adapter_1"])
gate_lora_a_0_ada_2 = copy.deepcopy(model.base_model.model.model.layers[0].mlp.gate_proj.lora_A["adapter_2"])
Perform the fine-tuning process.
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_arguments = TrainingArguments(
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 4,
    warmup_ratio = 0.1,
    # num_train_epochs = 3, # Set this for 1 full training run.
    max_steps = 60,
    learning_rate = 2e-5,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 2,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "linear",
    seed = 3407,
    output_dir = "outputs",
    report_to = "none", # Use this for WandB etc
)
from trl import SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = hybrid_dataset,   # the reporter's dataset (not included in the report)
    dataset_text_field = "text",
    max_seq_length = MAX_SEQ_LENGTH,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    # gradient_checkpointing = True,
    args = training_arguments,
)
trainer_stats = trainer.train()
Compare the copied weights with the trained weights.
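The comparison itself is not shown in the report. A minimal sketch of this check, using the copies made before training and a simple torch.equal-based helper, could look like this:
import torch

layer0_mlp = model.base_model.model.model.layers[0].mlp

def changed(before, after):
    # True if any parameter differs between the pre- and post-training modules.
    return any(not torch.equal(p_before, p_after)
               for p_before, p_after in zip(before.parameters(), after.parameters()))

print("router updated:   ", changed(router_0, layer0_mlp.router))
print("default updated:  ", changed(gate_lora_a_0_ada_0, layer0_mlp.gate_proj.lora_A["default"]))
print("adapter_1 updated:", changed(gate_lora_a_0_ada_1, layer0_mlp.gate_proj.lora_A["adapter_1"]))
print("adapter_2 updated:", changed(gate_lora_a_0_ada_2, layer0_mlp.gate_proj.lora_A["adapter_2"]))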
Expected Behavior: All three adapters and the additional component should receive gradient updates during fine-tuning.
Observed Behavior: Only one adapter is being updated, while the other adapters and the additional component are not receiving gradient updates.
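As a diagnostic (not part of the original report), one way to narrow this down is to check which adapter PEFT treats as active and how many LoRA parameters per adapter actually have requires_grad set before training starts:
# Which adapter(s) does PEFT consider active?
print("active adapter(s):",
      model.active_adapters if hasattr(model, "active_adapters") else model.active_adapter)

# Count trainable LoRA parameters per adapter name.
from collections import Counter
trainable = Counter()
for name, param in model.named_parameters():
    if param.requires_grad and "lora_" in name:
        for adapter_name in ("default", "adapter_1", "adapter_2"):
            if f".{adapter_name}." in name:
                trainable[adapter_name] += param.numel()
print("trainable LoRA params per adapter:", dict(trainable))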
Environment:
Unsloth version: 2025.1.7
PyTorch version: 2.5.1+cu121
Python version: 3.10.12