[2025-01-15 20:35:09,394] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[W socket.cpp:697] [c10d] The client socket has failed to connect to [PC-222]:25678 (system error: 10049 - The requested address is not valid in its context.).
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.2.2+cu121 with CUDA 1201 (you have 2.2.2+cu121)
Python 3.10.11 (you have 3.10.13)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
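The warning above reports a build mismatch: the installed xFormers wheel was built for Python 3.10.11 and a specific PyTorch/CUDA combination, while the local environment differs. A minimal sketch (assuming it is run inside the same "latentsync" conda environment; names here are for illustration, not part of the original log) to confirm which versions are actually installed before reinstalling xFormers via the linked instructions:

```python
# Print the interpreter, PyTorch, CUDA, and xFormers versions that the
# xFormers warning compares against its build metadata.
import sys
import importlib.metadata as md

import torch

print("Python  :", sys.version.split()[0])      # warning expects 3.10.11, env has 3.10.13
print("PyTorch :", torch.__version__)           # e.g. 2.2.2+cu121
print("CUDA    :", torch.version.cuda)          # CUDA runtime PyTorch was built against
try:
    print("xFormers:", md.version("xformers"))  # wheel that must match the PyTorch build
except md.PackageNotFoundError:
    print("xFormers: not installed")
```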
Traceback (most recent call last):
File "D:\ProgramData\anaconda3\envs\latentsync\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\ProgramData\anaconda3\envs\latentsync\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\workspace\LatentSync\LatentSync\scripts\train_unet.py", line 32, in <module>
import diffusers
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\diffusers\__init__.py", line 28, in <module>
from .models import (
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\diffusers\models\__init__.py", line 19, in <module>
from .attention import Transformer2DModel
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\diffusers\models\attention.py", line 43, in <module>
import xformers.ops
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\xformers\ops\__init__.py", line 8, in <module>
from .fmha import (
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\xformers\ops\fmha\__init__.py", line 10, in <module>
from . import (
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\xformers\ops\fmha\triton_splitk.py", line 548, in <module>
_get_splitk_kernel(num_groups)
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\xformers\ops\fmha\triton_splitk.py", line 503, in _get_splitk_kernel
_fwd_kernel_splitK_unrolled = unroll_varargs(_fwd_kernel_splitK, N=num_groups)
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\xformers\triton\vararg_kernel.py", line 166, in unroll_varargs
jitted_fn = triton.jit(fn)
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\triton\runtime\jit.py", line 882, in jit
return decorator(fn)
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\triton\runtime\jit.py", line 871, in decorator
return JITFunction(
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\triton\runtime\jit.py", line 717, in __init__
self.src = self.src[re.search(r"^def\s+\w+\s*\(", self.src, re.MULTILINE).start():]
AttributeError: 'NoneType' object has no attribute 'start'
[2025-01-15 20:35:14,444] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 39356) of binary: D:\ProgramData\anaconda3\envs\latentsync\python.exe
Traceback (most recent call last):
File "D:\ProgramData\anaconda3\envs\latentsync\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\ProgramData\anaconda3\envs\latentsync\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\ProgramData\anaconda3\envs\latentsync\Scripts\torchrun.exe\__main__.py", line 7, in <module>
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\torch\distributed\run.py", line 812, in main
run(args)
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\torch\distributed\run.py", line 803, in run
elastic_launch(
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\torch\distributed\launcher\api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "D:\ProgramData\anaconda3\envs\latentsync\lib\site-packages\torch\distributed\launcher\api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts.train_unet FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-01-15_20:35:14
host : PC-222
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 39356)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
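The immediate crash is not in LatentSync itself but in Triton's JIT: at `triton\runtime\jit.py` line 717, `re.search(r"^def\s+\w+\s*\(", self.src, re.MULTILINE)` returns `None` because no plain `def ...(` header is found in the recovered kernel source, and calling `.start()` on `None` raises the `AttributeError`. Triton has no official Windows support, which is a plausible reason the source extraction for xFormers' `triton_splitk` kernels fails here. Below is an illustrative reconstruction of the failure mode (not Triton's actual code), with a defensive variant that surfaces the real problem instead of the opaque `AttributeError`:

```python
import re

# What triton.runtime.jit effectively does: locate the "def <name>(" header
# in the decorated function's source text and slice everything before it away.
src = "@triton.jit\n# kernel whose plain 'def' header was not recovered from source\n"

match = re.search(r"^def\s+\w+\s*\(", src, re.MULTILINE)
if match is None:
    # This is the situation in the traceback above: calling match.start() here
    # raises AttributeError: 'NoneType' object has no attribute 'start'.
    raise ValueError("could not find the kernel's 'def' header in the extracted source")
body = src[match.start():]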