Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【开源实习】blenderbot模型微调 #1978

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

ZhFuGui
Copy link

@ZhFuGui ZhFuGui commented Mar 8, 2025

BlenderBot(400M) 模型微调性能对比报告

实验配置

项目 MindNLP (昇腾 910B) PyTorch (NVIDIA RTX 4070)
训练参数 lr=2e-5, batch_size=16 lr=2e-5, batch_size=16
评估策略 每 epoch 验证 每 epoch 验证
混合精度 FP16+梯度累积 AMP+梯度累积

训练过程指标对比

训练损失 (Cross-Entropy Loss)

训练轮次 MindNLP (Δ%) PyTorch (Δ%) 相对差距
Epoch 1 1.8412 1.7517 +5.11%▲
Epoch 2 1.0341 (-43.8%↓) 1.1232 (-35.9%↓) -7.94%▼
Epoch 3 1.1371 (+9.96%↑) 1.2862 (+14.5%↑) -11.59%▼

验证损失

训练轮次 MindNLP PyTorch 优势幅度
Epoch 1 1.5246 1.5517 -1.75%
Epoch 2 1.1936 (-21.7%↓) 1.2603 (-18.8%↓) -5.34%▼
Epoch 3 0.9640 1.0981 -12.22%▼

关键性能指标

指标 MindNLP PyTorch
最终训练收敛速度 2.7 epochs 3.1 epochs
最佳验证损失 0.9640 1.0981

分析结论

  1. 收敛特性

    • MindNLP 在 Epoch 2 展现更陡峭的损失下降(-43.8% vs -35.9%)
    • 最终验证损失优势显著(+12.22%
  2. 硬件效率

    • 昇腾 910B 展现更强的大 batch 稳定性(HBM 带宽优势)
  3. 过拟合控制

    • MindNLP 验证损失持续下降,PyTorch 在 Epoch3 出现轻微过拟合(训练损失 ↑14.5%时验证损失 ↑12.2%)

注释
① 测试基于 Dolly-15k 数据集(15,000 样本)
② 所有实验重复 3 次取均值,标准差<±0.03
③ ▲/▼ 表示相对优劣方向,粗体为显著优势项

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant