Error running the demo #11
Comments
Did you build Deformable DETR correctly with the following commands?
# build deformable detr
cd models/aios/ops
python setup.py build install
cd ../../..
Thanks for your reply, @WYJSJTU. I did build Deformable DETR correctly. I suspect the error comes from the version of the CUDA library. Any other suggestions?
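One way to test the version hypothesis is to print the versions the compiled op depends on. The sketch below is only a diagnostic aid, not part of AiOS: the function name `describe_environment` is hypothetical, and it uses only `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`. If PyTorch is not importable, the corresponding fields stay `None`.

```python
import importlib.util
import sys


def describe_environment():
    """Collect the version facts that matter when a compiled extension fails to import.

    Hypothetical helper for this thread; it is not part of the AiOS code base.
    """
    info = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": None,           # PyTorch version seen at run time
        "torch_cuda": None,      # CUDA version PyTorch itself was built with
        "cuda_available": None,  # whether a usable GPU/driver is visible
    }
    # Leave the torch fields empty if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it inside the same environment that built `models/aios/ops`. If the reported PyTorch version differs from the one that was active when `python setup.py build install` ran, rebuilding the op after the version change is a reasonable next step.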
@WYJSJTU I am also getting a similar error in the dataloader.
I'm also getting a similar error. aios_smplx
It seems like your |
It might be an issue of the video path, be sure to put the video you want to run under |
Hi! I'm having some trouble with the demo. I installed all the required libraries; the only difference is my CUDA version, which is 11.1. I'm getting the following error.
Before torch.distributed.barrier()
End torch.distributed.barrier()
Loading config file from config/aios_smplx_inference.py
[05/22 14:34:28.837]: git:
sha: be1ea5a, status: has uncommited changes, branch: main
[05/22 14:34:28.837]: Command: main.py -c config/aios_smplx_inference.py --options batch_size=8 epochs=100 lr_drop=55 num_body_points=17 backbone=resnet50 --resume data/checkpoint/aios_checkpoint.pth --eval --inference --to_vid --inference_input demo/short_video.mp4 --output_dir demo/demo
[05/22 14:34:28.839]: Full config saved to demo/demo/config_args_all.json
[05/22 14:34:28.839]: world size: 1
[05/22 14:34:28.839]: rank: 0
[05/22 14:34:28.839]: local_rank: 0
[05/22 14:34:28.839]: args: Namespace(agora_benchmark='na', amp=False, aux_loss=True, backbone='resnet50', backbone_freeze_keywords=None, batch_norm_type='FrozenBatchNorm2d', batch_size=8, bbox_loss_coef=5.0, bbox_ratio=1.2, body_3d_size=2, body_bbox_loss_coef=5.0, body_giou_loss_coef=2.0, body_model_test={'type': 'smplx', 'keypoint_src': 'smplx', 'num_expression_coeffs': 10, 'num_betas': 10, 'keypoint_dst': 'smplx_137', 'model_path': 'data/body_models/smplx', 'use_pca': False, 'use_face_contour': True}, body_model_train={'type': 'smplx', 'keypoint_src': 'smplx', 'num_expression_coeffs': 10, 'num_betas': 10, 'keypoint_dst': 'smplx_137', 'model_path': 'data/body_models/smplx', 'use_pca': False, 'use_face_contour': True}, body_only=True, camera_3d_size=2.5, clip_max_norm=0.1, cls_loss_coef=2.0, cls_no_bias=False, code_dir=None, config_file='config/aios_smplx_inference.py', config_path='config/aios_smplx.py', continue_train=True, cur_dir='/home/seba/Documents/AiOS/config', data_dir='/home/seba/Documents/AiOS/config/../dataset', data_strategy='balance', dataset_list=['AGORA_MM', 'BEDLAM', 'COCO_NA'], ddetr_lr_param=False, debug=False, dec_layer_number=None, dec_layers=6, dec_n_points=4, dec_pred_bbox_embed_share=False, dec_pred_class_embed_share=False, dec_pred_pose_embed_share=False, decoder_module_seq=['sa', 'ca', 'ffn'], decoder_sa_type='sa', device='cuda', dilation=False, dim_feedforward=2048, distributed=True, dln_hw_noise=0.2, dln_xy_noise=0.2, dn_attn_mask_type_list=['match2dn', 'dn2dn', 'group2group'], dn_batch_gt_fuse=False, dn_bbox_coef=0.5, dn_box_noise_scale=0.4, dn_label_coef=0.3, dn_label_noise_ratio=0.5, dn_labelbook_size=100, dn_number=100, dropout=0.0, ema_decay=0.9997, ema_epoch=0, embed_init_tgt=False, enc_layers=6, enc_loss_coef=1.0, enc_n_points=4, end_epoch=150, epochs=100, eval=True, exp_name='output/exp52/dataset_debug', face_3d_size=0.3, face_bbox_loss_coef=5.0, face_giou_loss_coef=2.0, face_keypoints_loss_coef=10.0, face_oks_loss_coef=4.0, 
find_unused_params=False, finetune_ignore=None, fix_refpoints_hw=-1, focal=(5000, 5000), focal_alpha=0.25, frozen_weights=None, gamma=0.1, giou_loss_coef=2.0, gpu=0, hand_3d_size=0.3, hidden_dim=256, human_model_path='data/body_models', indices_idx_list=[1, 2, 3, 4, 5, 6, 7], inference=True, inference_input='demo/short_video.mp4', input_body_shape=(256, 192), input_face_shape=(192, 192), input_hand_shape=(256, 256), interm_loss_coef=1.0, keypoints_loss_coef=10.0, lhand_bbox_loss_coef=5.0, lhand_giou_loss_coef=2.0, lhand_keypoints_loss_coef=10.0, lhand_oks_loss_coef=0.5, local_rank=0, log_dir=None, losses=['smpl_pose', 'smpl_beta', 'smpl_expr', 'smpl_kp2d', 'smpl_kp3d', 'smpl_kp3d_ra', 'labels', 'boxes', 'keypoints'], lr=1.414e-05, lr_backbone=1.414e-06, lr_backbone_names=['backbone.0'], lr_drop=55, lr_drop_list=[30, 60], lr_linear_proj_mult=0.1, lr_linear_proj_names=['reference_points', 'sampling_offsets'], make_same_len=False, masks=False, match_unstable_error=False, matcher_type='HungarianMatcher', model_dir=None, modelname='aios_smplx', multi_step_lr=True, nheads=8, nms_iou_threshold=-1, no_aug=False, no_interm_box_loss=False, no_mmpose_keypoint_evaluator=True, num_body_points=17, num_box_decoder_layers=2, num_classes=2, num_face_points=6, num_feature_levels=4, num_group=100, num_hand_face_decoder_layers=4, num_hand_points=6, num_patterns=0, num_queries=900, num_select=50, num_workers=0, oks_loss_coef=4.0, onecyclelr=False, options={'batch_size': 8, 'epochs': 100, 'lr_drop': 55, 'num_body_points': 17, 'backbone': 'resnet50'}, output_dir='demo/demo', output_face_hm_shape=(8, 8, 8), output_hand_hm_shape=(16, 16, 16), output_hm_shape=(16, 16, 12), param_dict_type='default', pe_temperatureH=20, pe_temperatureW=20, position_embedding='sine', pre_norm=False, pretrain_model_path=None, pretrained_model_path='../output/train_gta_synbody_ft_20230410_132110/model_dump/snapshot_2.pth.tar', princpt=(96.0, 128.0), query_dim=4, random_refpoints_xy=False, rank=0, 
result_dir='/home/seba/Documents/AiOS/config/../exps62/result', resume='data/checkpoint/aios_checkpoint.pth', return_interm_indices=[1, 2, 3], rhand_bbox_loss_coef=5.0, rhand_giou_loss_coef=2.0, rhand_keypoints_loss_coef=10.0, rhand_oks_loss_coef=0.5, rm_detach=None, rm_self_attn_layers=None, root_dir='/home/seba/Documents/AiOS/config/..', save_checkpoint_interval=1, save_log=False, scheduler='step', seed=42, set_cost_bbox=5.0, set_cost_class=2.0, set_cost_giou=2.0, set_cost_keypoints=10.0, set_cost_kpvis=0.0, set_cost_oks=4.0, smpl_beta_loss_coef=0.01, smpl_body_kp2d_ba_loss_coef=0.0, smpl_body_kp2d_loss_coef=1.0, smpl_body_kp3d_loss_coef=1.0, smpl_body_kp3d_ra_loss_coef=1.0, smpl_expr_loss_coef=0.01, smpl_face_kp2d_ba_loss_coef=0.0, smpl_face_kp2d_loss_coef=0.1, smpl_face_kp3d_loss_coef=0.1, smpl_face_kp3d_ra_loss_coef=0.1, smpl_lhand_kp2d_ba_loss_coef=0.0, smpl_lhand_kp2d_loss_coef=0.5, smpl_lhand_kp3d_loss_coef=0.1, smpl_lhand_kp3d_ra_loss_coef=0.1, smpl_pose_loss_body_coef=0.1, smpl_pose_loss_jaw_coef=0.1, smpl_pose_loss_lhand_coef=0.1, smpl_pose_loss_rhand_coef=0.1, smpl_pose_loss_root_coef=1.0, smpl_rhand_kp2d_ba_loss_coef=0.0, smpl_rhand_kp2d_loss_coef=0.5, smpl_rhand_kp3d_loss_coef=0.1, smpl_rhand_kp3d_ra_loss_coef=0.1, start_epoch=0, step_size=20, strong_aug=False, test=False, test_max_size=1333, test_sample_interval=100, test_sizes=[800], testset='INFERENCE', to_vid=True, total_data_len='auto', train_batch_size=32, train_max_size=1333, train_sample_interval=10, train_sizes=[480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800], trainset_2d=[], trainset_3d=['AGORA_MM', 'BEDLAM', 'COCO_NA'], trainset_humandata=[], trainset_partition={'AGORA_MM': 0.4, 'BEDLAM': 0.7, 'COCO_NA': 1}, transformer_activation='relu', two_stage_bbox_embed_share=False, two_stage_class_embed_share=False, two_stage_default_hw=0.05, two_stage_keep_all_tokens=False, two_stage_learn_wh=False, two_stage_type='standard', use_cache=True, use_checkpoint=False, use_dn=True, use_ema=True, 
vis_dir=None, weight_decay=0.0001, world_size=1)
aios_smplx
Traceback (most recent call last):
  File "main.py", line 437, in <module>
    main(args)
  File "main.py", line 173, in main
    model, criterion, postprocessors, postprocessors_aios = build_model_main(
  File "main.py", line 86, in build_model_main
    from models.registry import MODULE_BUILD_FUNCS
  File "/home/seba/Documents/AiOS/models/__init__.py", line 1, in <module>
    from .aios import build_aios_smplx
  File "/home/seba/Documents/AiOS/models/aios/__init__.py", line 1, in <module>
    from .aios_smplx import build_aios_smplx
  File "/home/seba/Documents/AiOS/models/aios/aios_smplx.py", line 17, in <module>
    from .transformer import build_transformer
  File "/home/seba/Documents/AiOS/models/aios/transformer.py", line 10, in <module>
    from .transformer_deformable import DeformableTransformerEncoderLayer, DeformableTransformerDecoderLayer
  File "/home/seba/Documents/AiOS/models/aios/transformer_deformable.py", line 11, in <module>
    from .ops.modules import MSDeformAttn
  File "/home/seba/Documents/AiOS/models/aios/ops/modules/__init__.py", line 9, in <module>
    from .ms_deform_attn import MSDeformAttn
  File "/home/seba/Documents/AiOS/models/aios/ops/modules/ms_deform_attn.py", line 21, in <module>
    from ..functions import MSDeformAttnFunction
  File "/home/seba/Documents/AiOS/models/aios/ops/functions/__init__.py", line 9, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/home/seba/Documents/AiOS/models/aios/ops/functions/ms_deform_attn_func.py", line 18, in <module>
    import MultiScaleDeformableAttention as MSDA
ImportError: /home/seba/anaconda3/envs/aios/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c1015SmallVectorBaseIjE8grow_podEPvmm
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4686) of binary: /home/seba/anaconda3/envs/aios/bin/python
Traceback (most recent call last):
  File "/home/seba/anaconda3/envs/aios/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/seba/anaconda3/envs/aios/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/seba/anaconda3/envs/aios/lib/python3.8/site-packages/torch/distributed/run.py", line 719, in main
    run(args)
  File "/home/seba/anaconda3/envs/aios/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/home/seba/anaconda3/envs/aios/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/seba/anaconda3/envs/aios/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
main.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-05-22_14:34:30
host : seba-GE66-Raider-10UH
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 4686)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Thanks for your help!
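The line that matters in the log above is the `ImportError`; the `torchrun` traceback after it only reports that the child process exited. A small sketch for reading that line follows. It pulls out the shared library and the missing symbol, then decodes the leading length-prefixed identifiers of the Itanium-ABI mangled name. The helper names `parse_undefined_symbol` and `leading_name_parts` are hypothetical, and the decoder deliberately stops at the first template or function marker, so it is not a full demangler.

```python
import re

# Matches "<path>.so: undefined symbol: <mangled name>" anywhere in a message.
_UNDEFINED_SYMBOL = re.compile(
    r"(?P<library>\S+\.so[\w.]*): undefined symbol: (?P<symbol>\S+)"
)


def parse_undefined_symbol(message):
    """Return (library_path, symbol) from an ImportError message, or None."""
    match = _UNDEFINED_SYMBOL.search(message)
    if match is None:
        return None
    return match.group("library"), match.group("symbol")


def leading_name_parts(symbol):
    """Decode the leading length-prefixed identifiers of a nested mangled name.

    Only handles names starting with "_ZN"; stops at the first character
    that is not part of a "<length><identifier>" pair.
    """
    if not symbol.startswith("_ZN"):
        return []
    parts, pos = [], 3
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    return parts


if __name__ == "__main__":
    line = (
        "ImportError: /home/seba/anaconda3/envs/aios/lib/python3.8/site-packages/"
        "MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg/"
        "MultiScaleDeformableAttention.cpython-38-x86_64-linux-gnu.so: "
        "undefined symbol: _ZN3c1015SmallVectorBaseIjE8grow_podEPvmm"
    )
    library, symbol = parse_undefined_symbol(line)
    print(leading_name_parts(symbol))  # -> ['c10', 'SmallVectorBase']
```

The decoded namespace `c10` is PyTorch's core C++ library, so the extension is looking for a symbol that the PyTorch installed at run time does not export. That pattern typically points to the op having been compiled against a different PyTorch build than the one being imported, rather than to the demo inputs.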