Although the folder preparation button has been restored, the image folder still does not work #2245
Seems like the same error as in #2244.
Not sure what causes this error… it is caused by the training script and I don’t think I can do anything about it. You might want to open an issue directly on the sd-scripts repo.
Okay.
File "/kaggle/working/kohya_ss/sd-scripts/sdxl_train_network.py", line 185, in
|
Same error, launched
It was a stupid mistake, I forgot to change the script name to
See, I do not know what is wrong:
Traceback (most recent call last):
File "/kaggle/working/kohya_ss/sd-scripts/train_network.py", line 1115, in
trainer.train(args)
File "/kaggle/working/kohya_ss/sd-scripts/train_network.py", line 234, in train
model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
File "/kaggle/working/kohya_ss/sd-scripts/train_network.py", line 101, in load_target_model
text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
File "/kaggle/working/kohya_ss/sd-scripts/library/train_util.py", line 4387, in load_target_model
text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
File "/kaggle/working/kohya_ss/sd-scripts/library/train_util.py", line 4363, in _load_target_model
original_unet = UNet2DConditionModel(
File "/kaggle/working/kohya_ss/sd-scripts/library/original_unet.py", line 1427, in init
attn_num_head_channels=attention_head_dim[i],
IndexError: list index out of range
[2024-04-09 05:08:36,532] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1114) of binary: /opt/conda/bin/python
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1008, in launch_command
multi_gpu_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 666, in multi_gpu_launcher
distrib_run.run(args)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/kaggle/working/kohya_ss/sd-scripts/train_network.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-04-09_05:08:36
host : c9fb5313e204
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1114)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
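
For reference, the IndexError at the bottom of the child traceback comes from a constructor that reads one attention_head_dim entry per UNet block, so it fires whenever that list is shorter than the number of blocks, which is what happens when a checkpoint from one model family is loaded through the UNet definition of another (for example, an SDXL base model going through the non-SDXL train_network.py path). A minimal sketch of that failure mode, using hypothetical names and values that are not taken from sd-scripts:

# Minimal sketch of the failure mode (hypothetical names and values,
# not sd-scripts' actual code or configuration).

def build_blocks(block_out_channels, attention_head_dim):
    """Mimics a constructor that expects one head-dim entry per UNet block."""
    blocks = []
    for i, channels in enumerate(block_out_channels):
        # Raises IndexError once i exceeds len(attention_head_dim) - 1.
        blocks.append({"out_channels": channels, "heads": attention_head_dim[i]})
    return blocks

# Four blocks but only three head-dim entries -> fails on the last block,
# analogous to a checkpoint whose config does not match the UNet definition
# that the launched script instantiates.
try:
    build_blocks([320, 640, 1280, 1280], [5, 10, 20])
except IndexError as err:
    print("IndexError:", err)  # "list index out of range", as in the log above

If the base model here is SDXL, pointing the launcher back at the SDXL script referenced earlier in the thread (sdxl_train_network.py) should avoid this constructor, since the SDXL scripts load the model through their own SDXL-specific UNet code.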