
run error #4

Open
tanshuai0219 opened this issue Jul 8, 2024 · 4 comments

Comments

@tanshuai0219

torchrun --standalone --nproc_per_node 1 scripts/inference.py --config configs/mvdit/inference/16x512x512.py

  /root/miniconda3/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  /root/miniconda3/lib/python3.10/site-packages/colossalai/pipeline/schedule/_utils.py:19: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  /root/miniconda3/lib/python3.10/site-packages/torch/utils/_pytree.py:254: UserWarning: <class 'collections.OrderedDict'> is already registered as pytree node. Overwriting the previous registration.
  [2024-07-08 11:13:11,432] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
  /root/miniconda3/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead. (repeated three times)
  /mine_workspace/Mirage/repos/diffusers/src/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.

  Config (path: configs/mvdit/inference/16x512x512.py): {
    'num_frames': 16, 'fps': 8, 'image_size': (512, 512),
    'model': {'type': 'MVDiT-XL/2', 'space_scale': 1.0, 'time_scale': 1.0, 'enable_flashattn': True, 'enable_layernorm_kernel': True, 'from_pretrained': '/mnt_alipayshnas/youtai.ts/checkpoints/OpenVid/MVDiT-16×512×512.pt'},
    'vae': {'type': 'VideoAutoencoderKL', 'from_pretrained': '/mnt_alipayshnas/youtai.ts/checkpoints/sd-vae-ft-ema/stabilityai__sd-vae-ft-ema', 'micro_batch_size': 2},
    'text_encoder': {'type': 't5', 'from_pretrained': '/mnt_alipayshnas/youtai.ts/checkpoints/t5-v1_1-xxl/t5-v1_1-xxl', 'model_max_length': 120},
    'scheduler': {'type': 'iddpm', 'num_sampling_steps': 100, 'cfg_scale': 7.0},
    'dtype': 'fp16', 'batch_size': 2, 'seed': 42,
    'prompt_path': './assets/texts/evalcrafter.txt', 'start_idx': 0, 'end_idx': 700,
    'save_dir': './outputs/samples/', 'multi_resolution': False
  }

  Loading checkpoint shards: 100%|████████| 2/2 [04:23<00:00, 131.92s/it]
  Loading /mnt_alipayshnas/youtai.ts/checkpoints/OpenVid/MVDiT-16×512×512.pt
  Missing keys: ['pos_embed', 'pos_embed_temporal']
  Unexpected keys: []
  /ossfs/workspace/py310/workspace/OpenVid-1M/openvid/models/text_encoder/t5.py:163: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.
    caption = BeautifulSoup(caption, features="html.parser").text
  0%| | 0/100 [00:00<?, ?it/s]
  Traceback (most recent call last):
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/scripts/inference.py", line 107, in <module>
      main()
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/scripts/inference.py", line 87, in main
      samples = scheduler.sample(
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/schedulers/iddpm/__init__.py", line 77, in sample
      samples = self.p_sample_loop(
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/schedulers/iddpm/gaussian_diffusion.py", line 437, in p_sample_loop
      for sample in self.p_sample_loop_progressive(
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/schedulers/iddpm/gaussian_diffusion.py", line 488, in p_sample_loop_progressive
      out = self.p_sample(
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/schedulers/iddpm/gaussian_diffusion.py", line 391, in p_sample
      out = self.p_mean_variance(
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/schedulers/iddpm/respace.py", line 94, in p_mean_variance
      return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/schedulers/iddpm/gaussian_diffusion.py", line 270, in p_mean_variance
      model_output = model(x, t, **model_kwargs)
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/schedulers/iddpm/respace.py", line 127, in __call__
      return self.model(x, new_ts, **kwargs)
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/schedulers/iddpm/__init__.py", line 94, in forward_with_cfg
      model_out = model.forward(combined, timestep, y, **kwargs)
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/models/mvdit/mvdit.py", line 331, in forward
      x, y = auto_grad_checkpoint(block, x, y, t0, t_y, t0_tmep, t_y_tmep, mask, tpe)
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/acceleration/checkpoint.py", line 24, in auto_grad_checkpoint
      return module(*args, **kwargs)
    File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
      return forward_call(*args, **kwargs)
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/models/mvdit/mvdit.py", line 162, in forward
      x = x + self.cross_attn(x, y, mask)
    File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
      return self._call_impl(*args, **kwargs)
    File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
      return forward_call(*args, **kwargs)
    File "/ossfs/workspace/py310/workspace/OpenVid-1M/openvid/models/layers/blocks.py", line 399, in forward
      attn_bias[attn_bias==0] = exp
  RuntimeError: value cannot be converted to type at::Half without overflow

I get this error when running mvdit; could you give some advice? When I run stdit, it generates videos successfully.
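For reference, the failure can be reproduced outside the model: fp16's largest finite value is 65504, so writing a larger finite constant into a float16 tensor raises exactly this RuntimeError. A minimal sketch (the constant 3.4e38 is illustrative; in blocks.py the assigned value is the variable `exp`):

```python
import torch

# fp16 can only represent finite values up to 65504, so writing a large
# fp32-range constant into a half tensor triggers the checked conversion.
attn_bias = torch.zeros(4, dtype=torch.float16)
try:
    attn_bias[attn_bias == 0] = 3.4e38  # mirrors `attn_bias[attn_bias==0] = exp`
except RuntimeError as e:
    print(e)  # value cannot be converted to type at::Half without overflow

# one possible workaround: clamp the fill value to the tensor's own dtype range
safe_fill = min(3.4e38, float(torch.finfo(attn_bias.dtype).max))
attn_bias[attn_bias == 0] = safe_fill
```

This is why switching the config away from fp16 (to fp32 or bf16, as suggested below in the thread) makes the assignment succeed: both dtypes have an fp32-sized exponent range.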

@CSRuiXie

CSRuiXie commented Jul 8, 2024

Thank you for your attention to our work. You can set the dtype in the config to fp32, and then it should work.
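Concretely, that is a one-line edit in the inference config. Judging from the dict printed in the log, the config is a plain-Python variables file, so the change would look like this (a sketch assuming the file layout matches the printed config):

```python
# configs/mvdit/inference/16x512x512.py (excerpt; layout assumed from the printed config)
num_frames = 16
fps = 8
image_size = (512, 512)

dtype = "fp32"  # was "fp16"; fp32 avoids the at::Half overflow in attn_bias
```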

@madhuvanthp

> Thank you for your attention to our work. You can set the dtype in the config to fp32, and then it should work.

  inference.py: error: unrecognized arguments: configs/mvdit/inference/16x512x512.py
  E0708 19:42:19.365000 139992540640320 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 2)

I still get this error after changing the dtype to fp32 in the config file you specified.
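A side note on this new error: argparse prints "unrecognized arguments: &lt;value&gt;" when a token is left over after parsing, which usually means the config path was passed in a form the script does not declare (for instance, an extra positional when only a flag is defined). A toy illustration of the mechanism, not the actual inference.py parser:

```python
import argparse

# hypothetical parser that only accepts the config as a --config flag
parser = argparse.ArgumentParser(prog="inference.py")
parser.add_argument("--config")

# passing the path twice (flag + stray positional) leaves a leftover token,
# which parse_args() would report as "unrecognized arguments"
args, leftover = parser.parse_known_args(
    ["--config", "16x512x512.py", "16x512x512.py"]
)
print(leftover)  # ['16x512x512.py']
```

So it is worth double-checking that the command line matches the exact form in the README (torchrun options before the script, `--config` exactly once).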

@raccooncoder

I switched dtype to bf16 and it worked; makes sense, because the model was trained in bf16.
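The dtype ranges explain why bf16 works where fp16 does not: bf16 keeps fp32's 8-bit exponent, so large constants that overflow fp16's 65504 ceiling still fit. A quick check:

```python
import torch

# fp16: 5-bit exponent, largest finite value 65504
print(torch.finfo(torch.float16).max)   # 65504.0

# bf16: 8-bit exponent like fp32, largest finite value ~3.39e38
print(torch.finfo(torch.bfloat16).max)  # 3.3895313892515355e+38
```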

@SamitM1

SamitM1 commented Jul 11, 2024

> I switched dtype to bf16 and it worked; makes sense, because the model was trained in bf16.

How did you install apex? The command provided in this repo gives the following error for me:


  Building wheel for apex (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for apex (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

      subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/samit/anaconda3/envs/openvid/bin/python /home/samit/anaconda3/envs/openvid/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpujlo36a8
  cwd: /tmp/pip-req-build-5l72eosr
  ERROR: Failed building wheel for apex
  Failed to build apex
  ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects

Would greatly appreciate some help troubleshooting this.
If you used a different command to install apex, could you please provide it? @tanshuai0219 @raccooncoder
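For what it's worth, the CalledProcessError above is apex's build failing to find a C++ compiler (`which g++` returned nonzero), not a pip problem. A quick check before retrying the install (the package names in the comments are the usual Debian/RHEL ones; adjust for your distro):

```shell
# apex's build shells out to `which g++`; replicate that check first
if command -v g++ >/dev/null 2>&1; then
  echo "compiler found: $(g++ --version | head -n 1)"
else
  # e.g. `sudo apt-get install build-essential` (Debian/Ubuntu)
  # or   `sudo dnf install gcc-c++`            (Fedora/RHEL)
  echo "g++ missing: install a C++ toolchain, then retry the pip install"
fi
```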
