Difference between results from inference and the paper #10

Open · unoShin opened this issue Aug 25, 2021 · 25 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

@unoShin commented Aug 25, 2021

First, thanks for your great work.

I trained the model using the script 'python train.py --gpu 0-1 --backbone LPSKI' with the Human3.6M and MPII datasets.
The protocol is 1 and the number of training epochs was 25.

I tested the model with test.py, and the result is below:
Protocol 1 error (PA MPJPE) >> tot: 42.72
Directions: 37.63 Discussion: 39.01 Eating: 45.51 Greeting: 43.06 Phoning: 41.33 Posing: 41.10 Purchases: 35.78 Sitting: 43.50 SittingDown: 57.36 Smoking: 47.08 Photo: 51.04 Waiting: 38.32 Walking: 30.94 WalkDog: 46.21 WalkTogether: 38.39

The average MPJPE for Protocol 1 in the paper is 35.2, which differs from my result.
Did I miss something needed to reproduce it, such as other settings in config.py?

Also, my training time was 16 hours on an RTX 2080, while the training time in the paper is 3 days on 2 RTX Titans.
So I also wonder what causes the time difference between my run and the paper's.
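
(For reference, PA MPJPE is the Procrustes-aligned MPJPE: the prediction is aligned to the ground truth with an optimal similarity transform before the per-joint error is averaged. Below is a minimal numpy sketch with hypothetical (J, 3) joint arrays in millimeters; this is not the repo's evaluation code.)

import numpy as np

def pa_mpjpe(pred, gt):
    # Center both point sets at the origin.
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # Optimal rotation + scale via SVD (similarity Procrustes).
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    scale = S.sum() / (p ** 2).sum()

    # Align prediction to ground truth, then average per-joint error.
    aligned = scale * p @ R.T + mu_g
    return np.linalg.norm(aligned - gt, axis=1).mean()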

@SangbumChoi (Owner)

Did you check that config.py is initially set to the extra-small model? (I reported three different model types: small, large, and extra small.) The 16-hour training time also suggests you used the extra-small model. Please let me know if you have further questions.

@unoShin (Author) commented Aug 25, 2021

Did you mean changing embedding_size in config.py to switch the model type?
It is 2048, and I checked that the large model uses 2048 embedding channels in the paper.

The other settings in config.py are as below:

import os
import os.path as osp

class Config:

    ## dataset
    # training set
    # 3D: Human36M, MuCo
    # 2D: MSCOCO, MPII
    trainset_3d = ['Human36M']
    trainset_2d = ['MPII']

    # testing set
    # Human36M, MuPoTS, MSCOCO
    testset = 'Human36M'

    ## directory
    cur_dir = osp.dirname(os.path.abspath(__file__))
    root_dir = osp.join(cur_dir, '..')
    data_dir = osp.join(root_dir, 'data')
    output_dir = osp.join(root_dir, 'output')
    model_dir = osp.join(output_dir, 'model_dump')
    pretrain_dir = osp.join(output_dir, 'pre_train')
    vis_dir = osp.join(output_dir, 'vis')
    log_dir = osp.join(output_dir, 'log')
    result_dir = osp.join(output_dir, 'result')

    ## input, output
    input_shape = (256, 256)
    output_shape = (input_shape[0]//8, input_shape[1]//8)
    width_multiplier = 1.0
    depth_dim = 32
    bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
    pixel_mean = (0.485, 0.456, 0.406)
    pixel_std = (0.229, 0.224, 0.225)

    ## training config
    embedding_size = 2048
    lr_dec_epoch = [17, 21]
    end_epoch = 25
    lr = 1e-3
    lr_dec_factor = 10
    batch_size = 64

    ## testing config
    test_batch_size = 1
    flip_test = True
    use_gt_info = True

    ## others
    num_thread = 20
    gpu_ids = '0'
    num_gpus = 1
    continue_train = False

@SangbumChoi (Owner)

No, you should change depth_dim from 32 to 64, and if an error shows up, adjust output_shape as well.
The correct settings should be:

input_shape = (256, 256) 
output_shape = (input_shape[0]//4, input_shape[1]//4)
width_multiplier = 1.0
depth_dim = 64
bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
pixel_mean = (0.485, 0.456, 0.406)
pixel_std = (0.229, 0.224, 0.225)
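
(For context on why these two values move together: the soft-argmax head reshapes the network output into (batch, joint_num, depth_dim * H * W), as in model.py's reshape quoted further down, so the backbone's final channel count and output_shape must stay consistent with depth_dim. A rough sanity check, with an illustrative joint_num:)

# Illustrative consistency check; joint_num here is hypothetical.
joint_num = 18
depth_dim = 64
input_shape = (256, 256)
output_shape = (input_shape[0] // 4, input_shape[1] // 4)

# The final layer must emit joint_num * depth_dim channels, and
# soft_argmax flattens each joint's heatmap to depth_dim * H * W values;
# if config and model disagree, you get a size-mismatch RuntimeError
# like the one in the next comment.
head_channels = joint_num * depth_dim
flat_heatmap = depth_dim * output_shape[0] * output_shape[1]
print(head_channels, flat_heatmap)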

@unoShin (Author) commented Aug 25, 2021

That actually produced an error like this:

File "/home/unolab/Yoonho/Pose/MobileHumanPose/main/model.py", line 67, in forward
loss_coord = torch.abs(coord - target_coord) * target_vis
RuntimeError: The size of tensor a (8) must match the size of tensor b (64) at non-singleton dimension 0

I changed the output shape to try to solve the error:
output_shape = (input_shape[0]//(8*math.sqrt(2)), input_shape[1]//(8*math.sqrt(2)))

And it produced another error:
File "/home/unolab/Yoonho/Pose/MobileHumanPose/main/model.py", line 29, in soft_argmax
heatmaps = heatmaps.reshape((-1, joint_num, cfg.depth_dim*cfg.output_shape[0]*cfg.output_shape[1]))
TypeError: reshape(): argument 'shape' must be tuple of ints, but found element of type float at pos 3
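
(The TypeError above is expected: floor division by a float operand, such as 8*math.sqrt(2), returns a float, and reshape() only accepts integers. A minimal illustration:)

import math

# Floor division with a float divisor yields a float, not an int.
print(256 // (8 * math.sqrt(2)))       # 22.0 -> rejected by reshape()
print(int(256 // (8 * math.sqrt(2))))  # 22   -> int() would silence the
                                       # TypeError, but not fix the shapes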

@SangbumChoi added the enhancement (New feature or request) label on Aug 25, 2021
@SangbumChoi (Owner)

@unoShin I found that there was a slight mismatch in the large model, so I uploaded a fix to the large branch for the skip-concat case. Please let me know if this doesn't work.

@unoShin (Author) commented Aug 25, 2021

@SangbumChoi Thank you :)

@unoShin (Author) commented Aug 26, 2021

I confirmed it works for training; it will take about 50 hours on an RTX 2080.
Thank you for your support!

@SangbumChoi (Owner)

@unoShin Sounds great. Please close this issue once the score is similar to the paper's (otherwise I will shortly) :)

@unoShin (Author) commented Aug 30, 2021

Protocol 1 error (PA MPJPE) >> tot: 40.13
Directions: 34.42 Discussion: 35.80 Eating: 44.37 Greeting: 41.69 Phoning: 38.69 Posing: 37.26 Purchases: 34.61 Sitting: 42.46 SittingDown: 54.30 Smoking: 42.58 Photo: 48.37 Waiting: 35.72 Walking: 29.49 WalkDog: 43.04 WalkTogether: 35.15

I trained the model with the new branch (large, 25 epochs) and got the result above.
There is still a gap between 40.13 and the paper's 35.2 MPJPE.

config.py:
trainset_3d = ['Human36M']
trainset_2d = ['MPII']

# testing set
# Human36M, MuPoTS, MSCOCO
testset = 'Human36M'

## directory
cur_dir = osp.dirname(os.path.abspath(__file__))
root_dir = osp.join(cur_dir, '..')
data_dir = osp.join(root_dir, 'data')
output_dir = osp.join(root_dir, 'output')
model_dir = osp.join(output_dir, 'model_dump')
pretrain_dir = osp.join(output_dir, 'pre_train')
vis_dir = osp.join(output_dir, 'vis')
log_dir = osp.join(output_dir, 'log')
result_dir = osp.join(output_dir, 'result')

## input, output
input_shape = (256, 256) 
output_shape = (input_shape[0]//4, input_shape[1]//4)
width_multiplier = 1.0
depth_dim = 64
bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
pixel_mean = (0.485, 0.456, 0.406)
pixel_std = (0.229, 0.224, 0.225)

## training config
embedding_size = 2048
lr_dec_epoch = [17, 21]
end_epoch = 25
lr = 1e-3
lr_dec_factor = 10
batch_size = 16

## testing config
test_batch_size = 16
flip_test = True
use_gt_info = True

## others
num_thread = 20
gpu_ids = '0'
num_gpus = 1
continue_train = False

The protocol is 1, and the bbox root file is from Subject 11 (trained on Subjects 1, 5, 6, 7, 8, 9).
Did I do something wrong to get this result?

@SangbumChoi (Owner)

> I trained the model with the new branch (large, 25 epochs) and got the result above. There is still a gap between 40.13 and the paper's 35.2 MPJPE. [...] Did I do something wrong to get this result?

Sorry for the inconvenience. Foolishly, I committed every intermediate step of progress to GitHub, so I just need to find the right past commit. I will find the appropriate large-model code for everyone.
One small thing that might matter is the batch_size, which depends on individual GPU circumstances.

One thing you can check right now is whether the extra-small model scores the same as in the paper.

I will let you know when I find it.

Thanks

@unoShin (Author) commented Aug 30, 2021

@SangbumChoi I will try it and let you know. Thank you.

@unoShin (Author) commented Aug 30, 2021

@SangbumChoi Is the only difference in line 130 of lpnet_ski_concat.py?

@unoShin (Author) commented Sep 2, 2021

09-02 09:52:46 Protocol 1 error (PA MPJPE) >> tot: 40.21
Directions: 36.56 Discussion: 37.00 Eating: 42.51 Greeting: 41.39 Phoning: 38.17 Posing: 36.55 Purchases: 36.60 Sitting: 42.26 SittingDown: 55.09 Smoking: 41.85 Photo: 48.03 Waiting: 36.51 Walking: 29.55 WalkDog: 43.62 WalkTogether: 35.50

Using that commit, the result still differs somewhat from the paper's.

@SangbumChoi (Owner)

> 09-02 09:52:46 Protocol 1 error (PA MPJPE) >> tot: 40.21 [...] Using that commit, the result still differs somewhat from the paper's.

@unoShin Hi, I have two questions for you:

  1. Did you use exactly the same commit/branch that I pointed you to?
  2. What was your batch size, and which 2D dataset did you use alongside Human3.6M?

If both answers seem reasonable, I will re-train my code and announce the result. It might take more than one week.

@unoShin (Author) commented Sep 2, 2021

@SangbumChoi
Hi,

  1. Yes, I used this version: 70baeaf
    That is why I asked whether the only difference between the large branch and 70baeaf is line 130 of lpnet_ski_concat.py.

  2. The batch size is 8 and the 2D dataset is MPII.
    My training time is 2.21 hours/epoch on an RTX 2080, and the total number of epochs is 25.

Thanks!

@SangbumChoi (Owner)

@unoShin I'm a little concerned that your batch size differs from the original paper and code, but let me re-check and share the result with you. Again, this might take some time.
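
(One common heuristic when GPU memory forces a smaller batch size, though nothing in the paper or repo prescribes it, is to scale the learning rate linearly with the batch size. A sketch using the config values above:)

# Hypothetical linear LR scaling; not something the paper specifies.
base_lr, base_batch = 1e-3, 64  # values from config.py above
my_batch = 8                    # the batch size actually used here
scaled_lr = base_lr * my_batch / base_batch
print(scaled_lr)                # 1.25e-04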

@unoShin (Author) commented Sep 2, 2021

@SangbumChoi Thanks for your support :)

@ggfresh commented Sep 9, 2021

Hi, I trained with Human3.6M and MPII but got an error of 400+, even though the visualized 2D output doesn't look bad. I did not build a bbox root file and used the GT bbox. I want to figure out why I get such a large error. How do you generate the bbox root file?

@SangbumChoi (Owner)

@ggfresh An error of more than 400 is likely caused by the old branch (see this issue). Also, if the image files are already cropped, you actually don't have to build GT and root bboxes. However, you can generate a bbox root file with an object detector or RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE).
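
(If GT 2D joints are available, one simple way to build a bbox for such a file is to take the joint extent plus a margin. The sketch below is hypothetical; the exact file format this repo's loaders expect may differ.)

import numpy as np

def bbox_from_joints(joints_2d, margin=1.25):
    # joints_2d: hypothetical (J, 2) array of pixel coordinates.
    x_min, y_min = joints_2d.min(axis=0)
    x_max, y_max = joints_2d.max(axis=0)
    w, h = x_max - x_min, y_max - y_min
    # Expand around the center so limbs are not cropped at the edges.
    cx, cy = x_min + w / 2.0, y_min + h / 2.0
    w, h = w * margin, h * margin
    return [cx - w / 2.0, cy - h / 2.0, w, h]  # (x, y, w, h)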

@ggfresh commented Sep 9, 2021

> @ggfresh An error of more than 400 is likely caused by the old branch (see this issue). [...]

Thanks for the reply. Which issue?

@ggfresh commented Sep 9, 2021

> @SangbumChoi When I test epoch 24, the error is big. [screenshot]

I have the same problem, though I think my training loss is normal.
[screenshot: train_log_tmp]

I tested with epochs 24-24.

[screenshot: train_result]

@SangbumChoi (Owner)

@ggfresh That is very odd, since even this open issue reports errors of only around 40 MPJPE. Your description lacks the information needed to debug and find the error. Since you say training seems normal, you might want to actually visualize the output jpg files.

@ggfresh commented Sep 9, 2021

Sorry, after checking, I found that my own data was inconsistent.

@junhee98 commented Jan 24, 2023

> @unoShin I found that there was a slight mismatch in the large model, so I uploaded a fix to the large branch for the skip-concat case. Please let me know if this doesn't work.

Which large branch was uploaded for the skip-concat case? I couldn't find it in the code.
