
After using peft, the performance metrics decreased #2336

Open
2 of 4 tasks
KQDtianxiaK opened this issue Jan 20, 2025 · 1 comment

Comments

@KQDtianxiaK

System Info

Sorry, I just wrapped up my previous question and already have to ask a new one. I am using the DNABERT-2 model, whose original structure is as follows:

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(4096, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertUnpadAttention(
            (self): BertUnpadSelfAttention(
              (dropout): Dropout(p=0.0, inplace=False)
              (Wqkv): Linear(in_features=768, out_features=2304, bias=True)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (mlp): BertGatedLinearUnitMLP(
            (gated_layers): Linear(in_features=768, out_features=6144, bias=False)
            (act): GELU(approximate='none')
            (wo): Linear(in_features=3072, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
            (layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          )
        )
        ......
        (11): BertLayer(
          (attention): BertUnpadAttention(
            (self): BertUnpadSelfAttention(
              (dropout): Dropout(p=0.0, inplace=False)
              (Wqkv): Linear(in_features=768, out_features=2304, bias=True)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (mlp): BertGatedLinearUnitMLP(
            (gated_layers): Linear(in_features=768, out_features=6144, bias=False)
            (act): GELU(approximate='none')
            (wo): Linear(in_features=3072, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
            (layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=768, out_features=157, bias=True)
)

On three different classification tasks, I used OFT, LNTuning, and other methods to add fine-tuning modules to linear layers such as ['Wqkv', 'wo', 'gated_layers'], or to other parts such as ['LayerNorm'], and trained with supervision. During training, the logs printed at each logging_steps look normal: the loss keeps decreasing and the performance metrics keep rising. However, when I load the saved model weights and evaluate on the independent test set, the performance is very poor; the results are essentially the same as with no training at all. The following message is printed when loading the model:

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at model/DNABERT2-117M and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
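
(For reference, a minimal sketch for checking which tensors actually end up in a saved checkpoint; it assumes the checkpoint directory contains the adapter_model.safetensors file that peft writes, and the path below is hypothetical:)

import os

from safetensors.torch import load_file

# Hypothetical checkpoint path; replace with a real checkpoint directory.
checkpoint_dir = "results/checkpoint-500"
print(os.listdir(checkpoint_dir))

# List the saved tensors to see whether the classifier/pooler weights
# were stored alongside the adapter weights.
adapter_file = os.path.join(checkpoint_dir, "adapter_model.safetensors")
if os.path.exists(adapter_file):
    state_dict = load_file(adapter_file)
    print([key for key in state_dict if "classifier" in key or "pooler" in key])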

Who can help?

@BenjaminBossan

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

Here is the code I use to load the model from each path that holds the best model weights:

import os

import numpy as np
from peft import AutoPeftModelForSequenceClassification
from transformers import DataCollatorWithPadding, Trainer

# tokenizer, training_args, and eval_predict are defined earlier in the script.

def load_best_model_for_test(checkpoint_dir, fold_number):
    fold_dir = os.path.join(checkpoint_dir)
    checkpoint_folders = [d for d in os.scandir(fold_dir) if d.is_dir() and d.name.startswith('checkpoint')]
    # Pick the most recently written checkpoint as the "best" model.
    best_model_dir = max(checkpoint_folders, key=lambda d: os.path.getmtime(d.path), default=None)
    best_model_path = best_model_dir.path

    model = AutoPeftModelForSequenceClassification.from_pretrained(best_model_path, trust_remote_code=True, num_labels=2)
    return model

def evaluate_on_test_set(models, test_dataset):
    test_results = []
    for model in models:
        trainer = Trainer(
            model=model,
            args=training_args,
            eval_dataset=test_dataset,
            data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
            compute_metrics=eval_predict
        )
        metrics = trainer.evaluate()
        test_results.append(metrics)
    # Average each metric across all evaluated models.
    average_metrics = {key: np.mean([result[key] for result in test_results]) for key in test_results[0].keys()}
    return average_metrics
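
(A hypothetical usage sketch of the two helpers above, assuming one checkpoint directory per fold and an already tokenized test_dataset; the paths and number of folds are illustrative:)

# Hypothetical fold output directories; in the real script these come from the training loop.
fold_dirs = ["output/fold_1", "output/fold_2", "output/fold_3"]

models = [load_best_model_for_test(fold_dir, fold + 1) for fold, fold_dir in enumerate(fold_dirs)]
print(evaluate_on_test_set(models, test_dataset))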

However, when I did full-parameter supervised fine-tuning without peft, the final results on the independent test set were all normal. I have tried different tasks, different peft methods, different target modules, and the latest version of peft, and I still cannot solve the problem.
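
(Since the warning above says the pooler and classifier weights are newly initialized, one thing worth ruling out is whether the classification head is being saved and restored together with the adapter. Below is a minimal sketch, not my actual configuration, assuming OFTConfig from peft and illustrative hyperparameters; task_type=TaskType.SEQ_CLS and/or modules_to_save ask peft to store the head alongside the adapter weights:)

from peft import OFTConfig, TaskType, get_peft_model

# Illustrative configuration only; base_model is assumed to be the loaded
# BertForSequenceClassification shown above.
peft_config = OFTConfig(
    r=8,
    target_modules=["Wqkv", "wo", "gated_layers"],
    modules_to_save=["classifier"],  # save the newly initialized head with the adapter
    task_type=TaskType.SEQ_CLS,
)
peft_model = get_peft_model(base_model, peft_config)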

Expected behavior

Find out the cause and fix the problem

@githubnemo
Collaborator

githubnemo commented Jan 21, 2025

Hey, thanks for reporting the issue.

I could not reproduce this behavior with a simpler example. Can you try to modify my code to reproduce the behavior you're experiencing?

Here's the code I used to test saving/loading the model:

from peft import LoraConfig
from peft import get_peft_model, TaskType, PeftModel
from transformers import AutoModelForSequenceClassification, BertConfig
import sys
import torch

config = BertConfig.from_pretrained("zhihan1996/DNABERT-2-117M")
base_model = AutoModelForSequenceClassification.from_pretrained(
    'zhihan1996/DNABERT-2-117M', config=config, trust_remote_code=True)

if sys.argv[1] == 'train':
    lora_config = LoraConfig(r=3, target_modules='all-linear', task_type=TaskType.SEQ_CLS)
    peft_model = get_peft_model(base_model, lora_config)

    # simulate training by modifying classifier weights
    peft_model.classifier.weight.data = torch.ones_like(peft_model.classifier.weight)
    peft_model.save_pretrained('peft_model')
else:
    peft_model = PeftModel.from_pretrained(model=base_model, model_id='./peft_model', trust_remote_code=True)
    print(peft_model.classifier.weight.data)

You can run this script using python example.py train to create the adapter model and then python example.py test to test the loading behavior. Expected is that the modified classifier weights are shown to be all ones.
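
(As an optional check, one could extend the else branch of the script above with an assertion that the restored classifier weights are all ones, which is the expected outcome described above; this is only a sketch on top of that script:)

# Optional check for the 'test' branch: the classifier weights should be
# restored as all ones if saving and loading the head worked correctly.
assert torch.allclose(
    peft_model.classifier.weight.data,
    torch.ones_like(peft_model.classifier.weight.data),
), "classifier weights were not restored from the adapter checkpoint"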
