The GLM4 fine-tuning code does not appear to optimize memory during evaluation #582

Open

FloatFrank opened this issue Oct 11, 2024 · 0 comments
Comments

@FloatFrank

System Info / 系統信息

Latest LoRA fine-tuning code, with all parameters at their defaults except the batch sizes: the evaluation batch_size is set to 1 and the training batch_size is also 1. Training runs fine even with a batch size of 2 or higher, but evaluation triggers a paging-file-size error. Does the loop need to be train -> GC -> eval -> train -> ...?
The GPU is a 2080 Ti with 22 GB; there simply is not enough VRAM to run evaluation on top of training.

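For reference, the following is roughly what I mean by train -> GC -> eval -> train: a minimal sketch assuming the standard transformers TrainerCallback API, not code from this repo, and whether finetune.py exposes a place to register such a callback is an assumption on my side.

```python
import gc

import torch
from transformers import TrainerCallback


class FreeMemoryCallback(TrainerCallback):
    """Release Python and CUDA caches around evaluation to lower peak VRAM."""

    def _free(self):
        gc.collect()                  # drop unreferenced Python objects
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the CUDA allocator

    def on_evaluate(self, args, state, control, **kwargs):
        # Runs after each evaluation loop finishes.
        self._free()

    def on_epoch_end(self, args, state, control, **kwargs):
        self._free()


# Hypothetical wiring inside finetune.py:
# trainer = Seq2SeqTrainer(..., callbacks=[FreeMemoryCallback()])
```
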
Error output (eval deliberately set to run at step 10 to reproduce the failure):
{'loss': 2.4602, 'grad_norm': 6.9103498458862305, 'learning_rate': 0.0004991666666666666, 'epoch': 0.0}
0%|▎ | 10/6000 [01:03<2:28:24, 1.49s/it]
***** Running Evaluation *****
Num examples = 600
Batch size = 1
Traceback (most recent call last):
File "", line 1, in
File "E:\Conda Envs\glm-4-demo\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Conda Envs\glm-4-demo\Lib\multiprocessing\spawn.py", line 131, in _main
prepare(preparation_data)
File "E:\Conda Envs\glm-4-demo\Lib\multiprocessing\spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "E:\Conda Envs\glm-4-demo\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 286, in run_path
File "", line 98, in _run_module_code
File "", line 88, in run_code
File "...\fine_tune\glm_lora_change\GLM-4-MineTuningVersion\finetune_demo\finetune.py", line 11, in
import torch
File "E:\Conda Envs\glm-4-demo\Lib\site-packages\torch_init
.py", line 137, in
raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Conda Envs\glm-4-demo\Lib\site-packages\torch\lib\cublas64_12.dll" or one of its dependencies.

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

python .\finetune.py data//nov_glm_datasets THUDM/glm-4-9b-chat configs/lora.yaml

Expected behavior / 期待表现

Fine-tuning should run normally, without loading an extra model for inference during eval.
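
If the goal is simply to keep evaluation from exhausting VRAM, a possible workaround sketch is below. It uses standard transformers options; whether the keys in configs/lora.yaml map one-to-one onto these Seq2SeqTrainingArguments fields is an assumption I have not verified.

```python
from transformers import Seq2SeqTrainingArguments

# Hedged sketch: options that usually reduce evaluation memory pressure.
args = Seq2SeqTrainingArguments(
    output_dir="output",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    eval_accumulation_steps=1,        # move eval outputs to CPU after every step
    evaluation_strategy="steps",      # or "no" to skip evaluation entirely
    eval_steps=500,
    predict_with_generate=False,      # avoid autoregressive generation during eval
)
```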
