Training script for the Bert-based model on the NLI dataset #85

Open
Hominnn opened this issue Jun 28, 2024 · 2 comments

Hominnn commented Jun 28, 2024

Dear author, I want to train the bert-base-uncased model on the NLI dataset using your method for some research. Could you provide the relevant training scripts so that I can better reproduce your experimental results? Below is my training script, which uses the same data as your training; with it I cannot reproduce the evaluation results of your angle-bert-base-uncased-nli-en-v1 model.

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train_nli.py \
--task NLI-STS \
--output_dir ckpts/NLI-STS-bert-cls \
--model_name_or_path ../models/bert-base-uncased \
--learning_rate 5e-5 \
--maxlen 50 \
--epochs 1 \
--batch_size 10 \
--logging_steps 500 \
--warmup_steps 0 \
--save_steps 1000 \
--seed 42 \
--do_eval 0 \
--gradient_accumulation_steps 4 \
--fp16 1 \
--torch_dtype 'float32' \
--pooling_strategy 'cls'

This is my evaluation result on STS:
[image: screenshot of STS evaluation scores]

SeanLee97 (Owner) commented

Hello @Hominnn, the training code train_nli.py is quite old; it is recommended to use angle-trainer now.

I've updated the NLI document: https://github.com/SeanLee97/AnglE/blob/main/examples/NLI/README.md#41-bert
You can find the new training script in the document.
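
For reference, here is a minimal Python sketch of the same workflow using the angle_emb library directly. This is only a sketch: the argument names follow the angle-emb README and may differ across versions, and the dataset path and hyperparameters below are placeholders, not the exact recipe from the document.

from datasets import load_dataset
from angle_emb import AnglE, AngleDataTokenizer

# Backbone and pooling as in the BERT example: CLS pooling on bert-base-uncased.
angle = AnglE.from_pretrained('bert-base-uncased',
                              max_length=50,
                              pooling_strategy='cls').cuda()

# Placeholder dataset path -- prepare NLI data with text1/text2/label columns
# as described in the linked README.
ds = load_dataset('path/to/your-nli-dataset')
train_ds = ds['train'].shuffle().map(
    AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)

angle.fit(
    train_ds=train_ds,
    output_dir='ckpts/NLI-STS-bert-cls',
    batch_size=32,
    epochs=10,
    learning_rate=5e-5,
    warmup_steps=0,
    gradient_accumulation_steps=4,
    logging_steps=100,
    save_steps=1000,
    fp16=True,
)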

To run it successfully:

  1. please upgrade angle-emb to the latest version via python -m pip install -U angle-emb

  2. please use the latest evaluation code: https://github.com/SeanLee97/AnglE/blob/main/examples/NLI/eval_nli.py (a quick sanity-check sketch follows this list)

  3. if you want to push your model to Hugging Face, please set --push_to_hub 1 and specify a model id under your namespace via --hub_model_id xxx; otherwise, set --push_to_hub 0.
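
As a quick sanity check before running the full evaluation script, you can load the trained checkpoint back and encode a sentence pair. A minimal sketch, assuming the checkpoint path is the output_dir used during training:

from angle_emb import AnglE

# Load the trained checkpoint (hypothetical path = the training output_dir).
angle = AnglE.from_pretrained('ckpts/NLI-STS-bert-cls',
                              pooling_strategy='cls').cuda()

# Encode two paraphrases; a well-trained model should give them similar vectors.
vecs = angle.encode(['A man is playing a guitar.',
                     'A person plays the guitar.'])
print(vecs.shape)  # (2, 768) for bert-base-uncased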

Here are the intermediate results (after about 9 epochs) of my run:

+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness |  Avg. |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| 75.59 | 84.83 | 80.37 | 86.26 | 81.96 |    85.12     |      80.70      | 82.12 |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+

You can try increasing epochs, ibn_w, or gradient_accumulation_steps for better results.
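
With the Python API, for instance, these knobs map onto fit arguments and the loss weights. The loss_kwargs key names below follow recent angle-emb versions and should be verified against your installed version; treat this as a sketch rather than the exact recipe:

# Hypothetical tweak of the run above: more epochs, a larger effective batch,
# and a higher in-batch-negative (ibn) loss weight.
angle.fit(
    train_ds=train_ds,
    output_dir='ckpts/NLI-STS-bert-cls-v2',
    epochs=20,
    gradient_accumulation_steps=8,
    loss_kwargs={'cosine_w': 1.0, 'ibn_w': 2.0, 'angle_w': 1.0},
)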

I am still training several models with different hyperparameters; I will let you know the better hyperparameters when they are done.

Hominnn (Author) commented Jun 28, 2024

Thank you for your thorough reply. Looking forward to more of your meaningful work!
