Difference between results from inference and the paper #10
Comments
Did you check that config.py is initially set to the extra-small model? (I reported three model types: small, large, and extra small.) A training time of 16 hours also suggests that you used the extra-small model. Please let me know if you have further questions.
Do you mean changing embedding_size in config.py to switch the model type? The other settings in config.py are as below:
No, you should try changing depth_dim from 32 to 64. If an error shows up, try adjusting output_shape as well.
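To illustrate why changing depth_dim alone can trigger a shape error, here is a minimal sketch. It assumes the head emits joint_num * depth_dim channels that are later reshaped into one depth volume per joint, as is common for volumetric-heatmap pose networks; this has not been checked against this repo's model.py. The function name `check_heatmap_shape` and the values 18 and (32, 32) are hypothetical, chosen only for illustration.

```python
# Hypothetical sanity check, NOT taken from the repository.
# Assumption: the final layer outputs joint_num * depth_dim channels, which are
# reshaped into a (joint_num, depth_dim, height, width) heatmap volume.

def check_heatmap_shape(out_channels, joint_num, depth_dim, output_shape):
    """Return the per-sample volume shape, or raise if the channels do not match."""
    expected = joint_num * depth_dim
    if out_channels != expected:
        raise ValueError(
            f"head emits {out_channels} channels, but joint_num * depth_dim = {expected}"
        )
    return (joint_num, depth_dim, output_shape[0], output_shape[1])


if __name__ == "__main__":
    joint_num = 18            # illustrative value
    output_shape = (32, 32)   # illustrative value

    # A head built for depth_dim=32 matches a config with depth_dim=32.
    print(check_heatmap_shape(joint_num * 32, joint_num, 32, output_shape))

    # The same head no longer matches once the config says depth_dim=64.
    try:
        check_heatmap_shape(joint_num * 32, joint_num, 64, output_shape)
    except ValueError as err:
        print("mismatch:", err)
```

If the model really follows this layout, every place that derives a size from depth_dim or output_shape has to change together, which may be why adjusting one value surfaced a second error.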
It actually produced an error like this: File "/home/unolab/Yoonho/Pose/MobileHumanPose/main/model.py", line 67, in forward I changed the output shape to solve the error: And that produced another error:
@unoShin I found a slight mismatch in the large model, so I uploaded a fix to the large branch for the skip-concat case. Please let me know if this doesn't work.
@SangbumChoi Thank you :)
I checked that it works for training; it will take about 50 hours with an RTX 2080.
@unoShin Sounds great. Please close this issue (otherwise I will shortly) when the score is similar to the paper :)
Protocol 1 error (PA MPJPE) >> tot: 40.13 I trained the model with the new branch (large, 25 epochs) and got this result. config.py:
The protocol is 1 and the bbox root file is from Subject 11 (trained on subjects 1, 5, 6, 7, 8, 9).
Sorry for the inconvenience. I foolishly committed every intermediate step to GitHub, so I just need to find the right past commit. I will find the appropriate large-model code for everyone. One thing you can check right now is whether the extra-small model scores the same as in the paper. I will let you know if I find it. Thanks
@SangbumChoi I will try and let you know. Thank you.
@SangbumChoi Is the only difference in line 130 of lpnet_ski_concat.py?
09-02 09:52:46 Protocol 1 error (PA MPJPE) >> tot: 40.21 Using that commit version, the result still differs from the result in the paper.
@unoShin Hi, I have two questions for you
If both answers seem reasonable, then I will re-train my code and announce the result. It might take more than one week.
@SangbumChoi
Thanks!
@unoShin I'm a little concerned that your batch size differs from the original paper and code, but let me re-check and share with you. Again, this might take some time.
@SangbumChoi Thanks for your support :)
Hi, I trained with Human3.6M and MPII but got an error of 400+, while the 2D visualization output does not look bad. I did not build a bbox root file and used the GT bbox instead. I want to figure out why I get such a large error. How do you generate the bbox root file?
@ggfresh An error of 400+ might be caused by the old branch (see this issue). Also, if the image files are already cropped, then you don't actually have to build GT and root bbox files. However, you can generate the bbox root file with an object detector or RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE).
Thanks for the reply. Which issue?
I have the same problem, while my training loss looks normal, I think. I used epochs 24-24.
@ggfresh That is very odd, since the currently open issue already reports an MPJPE of around 40. Your description lacks the information needed to debug and find the error. As you said, training seems normal, so you might want to actually display the jpg files.
Sorry, after checking, I found that my own data was inconsistent.
Which large branch was uploaded for the skip-concat case? I couldn't find it in the code.
First, thanks for your great work.
I trained the model using the script 'python train.py --gpu 0-1 --backbone LPSKI' with the Human3.6M and MPII datasets.
The protocol is 1 and the number of training epochs was 25.
I tested the model with test.py, and the result is as below:
Protocol 1 error (PA MPJPE) >> tot: 42.72
Directions: 37.63 Discussion: 39.01 Eating: 45.51 Greeting: 43.06 Phoning: 41.33 Posing: 41.10 Purchases: 35.78 Sitting: 43.50 SittingDown: 57.36 Smoking: 47.08 Photo: 51.04 Waiting: 38.32 Walking: 30.94 WalkDog: 46.21 WalkTogether: 38.39
I found that the average MPJPE for Protocol 1 in the paper is 35.2, which differs from my result.
Did I miss something needed to get the right result, such as other settings in config.py?
Also, my training time was 16 hours with an RTX 2080, whereas the training time in the paper is 3 days with 2 RTX Titans.
So I also wonder what causes the time difference between my run and the paper.
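For reference, the gap described above can be quantified directly from the numbers in this post. The sketch below uses only the per-action values and totals quoted here; the variable names are illustrative. Note that the unweighted mean over the 15 actions need not equal the reported "tot", presumably because "tot" is averaged over all test samples rather than over actions (an assumption, not verified against test.py).

```python
# Per-action PA MPJPE values (mm) copied from the test output in this post.
per_action = {
    "Directions": 37.63, "Discussion": 39.01, "Eating": 45.51,
    "Greeting": 43.06, "Phoning": 41.33, "Posing": 41.10,
    "Purchases": 35.78, "Sitting": 43.50, "SittingDown": 57.36,
    "Smoking": 47.08, "Photo": 51.04, "Waiting": 38.32,
    "Walking": 30.94, "WalkDog": 46.21, "WalkTogether": 38.39,
}

reported_total = 42.72  # "tot" from the test output in this post
paper_total = 35.2      # Protocol 1 average quoted from the paper in this post

# Unweighted mean over actions (may differ from the sample-weighted total).
action_mean = sum(per_action.values()) / len(per_action)

# Absolute and relative gap between this run and the paper.
gap = reported_total - paper_total
relative_gap = gap / paper_total

# The action with the largest error is a natural place to start debugging.
worst_action = max(per_action, key=per_action.get)

print(f"mean over actions: {action_mean:.2f} mm")
print(f"gap to paper: {gap:.2f} mm ({relative_gap:.1%})")
print(f"worst action: {worst_action} ({per_action[worst_action]:.2f} mm)")
```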