This is the official code of the paper "A Fusion Model based on CNN-Vision Transformer for Human Pose Estimation" (KSC 2022).
This repo contains a PyTorch implementation of a 2D bottom-up human pose estimation model and is developed by Sehee Kim and Junhee Lee. It builds on the original code of HigherHRNet and DaViT.
Results on COCO val2017:

Method | Backbone | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) |
---|---|---|---|---|---|---|---|---|---|
HigherHRNet | HRNet-w32 | 512 | 28.6M | 47.9 | 67.1 | 86.2 | 73.0 | 61.5 | 76.1 |
Ours | HRNet-w32 | 512 | 35.5M | - | 67.5 | 86.9 | 73.6 | 61.9 | 75.9 |
Results on COCO test-dev2017:

Method | Backbone | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) |
---|---|---|---|---|---|---|---|---|---|
OpenPose* | - | - | - | - | 61.8 | 84.9 | 67.5 | 57.1 | 68.2 |
Hourglass | Hourglass | 512 | 277.8M | 206.9 | 56.6 | 81.8 | 61.8 | 49.8 | 67.0 |
PersonLab | ResNet-152 | 1401 | 68.7M | 405.5 | 66.5 | 88.0 | 72.6 | 62.4 | 72.3 |
PifPaf | - | - | - | - | 66.7 | - | - | 62.4 | 72.9 |
Bottom-up HRNet | HRNet-w32 | 512 | 28.5M | 38.9 | 64.1 | 86.3 | 70.4 | 57.4 | 73.9 |
HigherHRNet | HRNet-w32 | 512 | 28.6M | 47.9 | 66.4 | 87.5 | 72.8 | 61.2 | 74.2 |
Ours | HRNet-w32 | 512 | 35.5M | - | 66.8 | 88.2 | 73.6 | 61.6 | 74.2 |
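All AP numbers follow the standard COCO keypoint evaluation protocol. For reference, a minimal sketch of how these metrics are computed with pycocotools (assuming a hypothetical detections file `keypoint_results.json` in the standard COCO results format) looks like this:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Sketch of the COCO keypoint evaluation that produces AP, AP .5, AP .75,
# AP (M), and AP (L). 'keypoint_results.json' is a hypothetical results file.
coco_gt = COCO('data/coco/annotations/person_keypoints_val2017.json')
coco_dt = coco_gt.loadRes('keypoint_results.json')

evaluator = COCOeval(coco_gt, coco_dt, iouType='keypoints')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP .5, AP .75, AP (M), AP (L), and AR metrics
```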
The code is developed using Python 3.8 on Ubuntu. NVIDIA GPUs are needed; the code is developed and tested with 4 NVIDIA RTX 3090 GPUs. Other platforms or GPU cards are not fully tested.
- Install PyTorch >= v1.1.0 following the official instructions.
  - Tested with PyTorch v1.4.0
- Clone this repo, and we'll call the directory that you cloned ${POSE_ROOT}.
- Install dependencies:
  ```
  pip install -r requirements.txt
  ```
- Install COCOAPI:
  ```
  # COCOAPI=/path/to/clone/cocoapi
  git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
  cd $COCOAPI/PythonAPI
  # Install into global site-packages
  make install
  # Alternatively, if you do not have permissions or prefer
  # not to install the COCO API into global site-packages
  python3 setup.py install --user
  ```
  Note that instructions like `# COCOAPI=/path/to/install/cocoapi` indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (`COCOAPI` in this case) accordingly.
- Install CrowdPoseAPI in exactly the same way as COCOAPI.
  - There is a bug in the CrowdPoseAPI; please revert https://github.com/Jeff-sjtu/CrowdPose/commit/785e70d269a554b2ba29daf137354103221f479e
- Init the output (training model output) and log (TensorBoard log) directories:
  ```
  mkdir output
  mkdir log
  ```
  Your directory tree should look like this:
  ```
  ${POSE_ROOT}
  ├── data
  ├── experiments
  ├── lib
  ├── log
  ├── models
  ├── output
  ├── tools
  ├── README.md
  └── requirements.txt
  ```
- Download the pretrained model (GoogleDrive) and place it under ${POSE_ROOT}/models:
  ```
  ${POSE_ROOT}
  `-- models
      `-- pytorch
          `-- pose_coco
              `-- model_best.pth.tar
  ```
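As a quick sanity check after downloading, the checkpoint can be inspected in Python. This is only a sketch; the exact layout of `model_best.pth.tar` is an assumption (a dict that is either a raw state dict or wraps one under `state_dict`):

```python
import torch

# Load the downloaded checkpoint on CPU and peek at a few stored parameter
# names. The dictionary layout is an assumption, as noted above.
checkpoint = torch.load('models/pytorch/pose_coco/model_best.pth.tar', map_location='cpu')
state_dict = checkpoint.get('state_dict', checkpoint)  # unwrap if wrapped
print(list(state_dict.keys())[:5])
```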
For COCO data, please download from the COCO download page; 2017 Train/Val is needed for COCO keypoints training and validation. Download and extract the files under ${POSE_ROOT}/data so that they look like this:
```
${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ...
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
```
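A quick way to verify that the annotations are in place is to load them with pycocotools (a minimal sketch; COCO val2017 contains 5,000 images):

```python
from pycocotools.coco import COCO

# Load the val2017 keypoint annotations and report basic counts.
coco = COCO('data/coco/annotations/person_keypoints_val2017.json')
print(len(coco.getImgIds()), 'images,', len(coco.getAnnIds()), 'person annotations')
```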
Testing on the COCO val2017 dataset using the pretrained model (GoogleDrive):
For single-scale testing:
```
python tools/valid.py \
    --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_coco/model_best.pth.tar
```
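The trailing `KEY VALUE` pairs (such as `TEST.MODEL_FILE ...`) override options of the experiment config loaded from `--cfg`. Assuming the config system follows the yacs-based pattern of the HRNet/HigherHRNet code bases (an assumption, not a statement about this repo's exact code), the overrides are merged roughly like this:

```python
from yacs.config import CfgNode as CN

# Illustration only: the --cfg YAML provides the defaults; trailing KEY VALUE
# pairs from the command line are merged on top of them.
cfg = CN()
cfg.TEST = CN()
cfg.TEST.MODEL_FILE = ''
cfg.TEST.FLIP_TEST = True

overrides = ['TEST.MODEL_FILE', 'models/pytorch/pose_coco/model_best.pth.tar',
             'TEST.FLIP_TEST', 'False']
cfg.merge_from_list(overrides)   # same mechanism as the command-line overrides
print(cfg.TEST.MODEL_FILE, cfg.TEST.FLIP_TEST)
```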
By default, we use horizontal flip. To test without flip:
```
python tools/valid.py \
    --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_coco/model_best.pth.tar \
    TEST.FLIP_TEST False
```
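For reference, horizontal flip testing averages the heatmaps predicted for the image and for its mirror image after swapping left/right keypoint channels. A generic, minimal sketch (not the repo's exact implementation) is:

```python
import torch

# COCO left/right keypoint pairs: eyes, ears, shoulders, elbows, wrists,
# hips, knees, ankles (index 0 is the nose and has no mirror partner).
COCO_FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10),
                   (11, 12), (13, 14), (15, 16)]

def flip_test(model, image):
    heatmaps = model(image)                        # (N, 17, H, W)
    flipped = model(torch.flip(image, dims=[3]))   # mirror the input width-wise
    flipped = torch.flip(flipped, dims=[3])        # un-mirror the heatmaps
    for left, right in COCO_FLIP_PAIRS:            # swap left/right joint channels
        flipped[:, [left, right]] = flipped[:, [right, left]]
    return (heatmaps + flipped) / 2.0

# Toy usage with a stand-in "model" that simply outputs 17-channel maps.
dummy = torch.nn.Conv2d(3, 17, kernel_size=3, padding=1)
out = flip_test(dummy, torch.randn(1, 3, 512, 512))
print(out.shape)  # torch.Size([1, 17, 512, 512])
```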
For training on the COCO train2017 dataset:
```
python tools/dist_train.py \
    --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml
```
By default, it will use all available GPUs on the machine for training. To specify which GPUs to use:
```
CUDA_VISIBLE_DEVICES=0,1 python tools/dist_train.py \
    --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml
```
Due to the large input size of bottom-up methods, we use mixed-precision training, enabled with the following command:
```
python tools/dist_train.py \
    --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml \
    FP16.ENABLED True FP16.DYNAMIC_LOSS_SCALE True
```
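Conceptually, `FP16.ENABLED` and `FP16.DYNAMIC_LOSS_SCALE` correspond to mixed-precision training with dynamic loss scaling. A generic illustration with `torch.cuda.amp` (PyTorch >= 1.6; the repo drives this through its config options rather than this exact code) is:

```python
import torch
from torch import nn

# Toy mixed-precision training loop with dynamic loss scaling.
model = nn.Conv2d(3, 17, 3, padding=1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()             # dynamic loss scaling

for _ in range(2):                               # dummy steps with random data
    images = torch.randn(2, 3, 512, 512, device='cuda')
    targets = torch.randn(2, 17, 512, 512, device='cuda')
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():              # run forward pass in mixed precision
        loss = nn.functional.mse_loss(model(images), targets)
    scaler.scale(loss).backward()                # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()                              # adjust the scale dynamically
```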
If you have limited GPU memory, please try to reduce the batch size and enable SyncBN:
```
python tools/dist_train.py \
    --cfg experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml \
    FP16.ENABLED True FP16.DYNAMIC_LOSS_SCALE True \
    MODEL.SYNC_BN True
```
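`MODEL.SYNC_BN True` corresponds to synchronizing BatchNorm statistics across GPUs, which helps when the per-GPU batch size is small. In plain PyTorch this is roughly the following conversion (a sketch; training with SyncBatchNorm additionally requires torch.distributed to be initialized and the model wrapped in DistributedDataParallel):

```python
import torch
from torch import nn

# Convert every BatchNorm layer in a model to SyncBatchNorm so that batch
# statistics are shared across GPUs during distributed training.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(model)  # the BatchNorm2d layer is now a SyncBatchNorm
```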
Please cite the following works that this repo builds on:
```
@inproceedings{cheng2020bottom,
  title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
  author={Bowen Cheng and Bin Xiao and Jingdong Wang and Honghui Shi and Thomas S. Huang and Lei Zhang},
  booktitle={CVPR},
  year={2020}
}

@inproceedings{SunXLW19,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}

@article{wang2019deep,
  title={Deep High-Resolution Representation Learning for Visual Recognition},
  author={Wang, Jingdong and Sun, Ke and Cheng, Tianheng and Jiang, Borui and Deng, Chaorui and Zhao, Yang and Liu, Dong and Mu, Yadong and Tan, Mingkui and Wang, Xinggang and Liu, Wenyu and Xiao, Bin},
  journal={TPAMI},
  year={2019}
}
```