Error (evaluation deliberately set to run at step 10 to reproduce it):
{'loss': 2.4602, 'grad_norm': 6.9103498458862305, 'learning_rate': 0.0004991666666666666, 'epoch': 0.0}
0%|▎ | 10/6000 [01:03<2:28:24, 1.49s/it]
***** Running Evaluation *****
Num examples = 600
Batch size = 1
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "E:\Conda Envs\glm-4-demo\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Conda Envs\glm-4-demo\Lib\multiprocessing\spawn.py", line 131, in _main
prepare(preparation_data)
File "E:\Conda Envs\glm-4-demo\Lib\multiprocessing\spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "E:\Conda Envs\glm-4-demo\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 286, in run_path
File "<frozen runpy>", line 98, in _run_module_code
File "<frozen runpy>", line 88, in _run_code
File "...\fine_tune\glm_lora_change\GLM-4-MineTuningVersion\finetune_demo\finetune.py", line 11, in <module>
import torch
File "E:\Conda Envs\glm-4-demo\Lib\site-packages\torch\__init__.py", line 137, in <module>
raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Conda Envs\glm-4-demo\Lib\site-packages\torch\lib\cublas64_12.dll" or one of its dependencies.
System Info
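Latest version of the LoRA fine-tuning demo, with every parameter left at its default except the batch sizes. The evaluation batch_size is set to 1 and the training batch size is also 1. Training runs fine even with a batch size of 2 or more, but as soon as evaluation starts, the page-file-size error appears. Is a train -> GC -> eval -> train -> ... cycle required?
The GPU is a 2080 Ti with 22 GB; there is simply not enough VRAM to add evaluation on top of training.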
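For reference, the GC step in the train -> GC -> eval cycle asked about above could look roughly like the sketch below. `release_cached_memory` is a hypothetical helper, not part of the fine-tuning demo, and it is only an assumption that calling it before evaluation would help: the traceback is raised while a spawned child process imports torch, so memory cached in the parent process may not be the cause.

```python
import gc


def release_cached_memory() -> bool:
    """Run Python GC and, if torch with CUDA is present, release cached GPU blocks.

    Hypothetical helper for illustration only.
    Returns True when the CUDA cache was also cleared, False otherwise.
    """
    gc.collect()  # drop unreachable Python objects first
    try:
        import torch  # optional: the helper still works without torch
    except (ImportError, OSError):
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    return True
```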
Who can help?
No response
Information
Reproduction
python .\finetune.py data//nov_glm_datasets THUDM/glm-4-9b-chat configs/lora.yaml
Expected behavior
Fine-tuning should run normally, and evaluation should not load an additional copy of the model for inference.
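One observation on the traceback: it passes through `multiprocessing\spawn.py` and re-runs `finetune.py` up to `import torch`, which suggests the error is raised in a freshly spawned child process rather than in the training process itself. Assuming those children are data-loading workers (not confirmed here), a possible workaround is to use zero workers on Windows. The helper name `pick_num_workers` below is hypothetical and only sketches the idea.

```python
import sys


def pick_num_workers(requested: int, platform: str = sys.platform) -> int:
    """Return a worker count that avoids spawning child processes on Windows.

    On Windows, multiprocessing uses the "spawn" start method, so every worker
    re-imports the main script (and torch with its CUDA DLLs) from scratch.
    Hypothetical helper for illustration only.
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    if platform.startswith("win"):
        return 0  # load data in the main process; no extra torch imports
    return requested
```

The result could be passed as `dataloader_num_workers` in the training arguments; whether the demo's `configs/lora.yaml` sets that value is an assumption that still needs checking.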