ViTPose is now available in Hugging Face Transformers #157

NielsRogge · 2025-01-13T18:33:46Z

Hi folks!

ViTPose (and ViTPose++) are now available in the Transformers library, enabling easy inference in a few lines of code.

Docs: https://huggingface.co/docs/transformers/v4.48.0/en/model_doc/vitpose
Checkpoints can be found here.
Demo (on both images and video): https://huggingface.co/spaces/hysts/ViTPose-transformers.

This may be relevant for #133, #26, #139, #135, and #111.

omkaar718 · 2025-01-16T23:24:22Z

Thank you, @NielsRogge!
Could you please let me know whether fine-tuning is supported for ViTPose++? If so, it would be helpful if you could point me to instructions for doing it. Thank you!

Ashayan97 · 2025-02-16T21:49:58Z

Hello, I ran into a problem when using ViTPose with the Hugging Face example. I tried to run the example code:

import torch
import requests
import numpy as np

from PIL import Image

from transformers import (
    AutoProcessor,
    RTDetrForObjectDetection,
    VitPoseForPoseEstimation,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

url = "http://images.cocodataset.org/val2017/000000000139.jpg"
image = Image.open(requests.get(url, stream=True).raw)

person_image_processor = AutoProcessor.from_pretrained("PekingU/rtdetr_r50vd_coco_o365")
person_model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd_coco_o365", device_map=device)

inputs = person_image_processor(images=image, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = person_model(**inputs)

results = person_image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.3
)
result = results[0]  # take first image results

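# Keep only detections labelled as class index 0 (person in COCO)
person_boxes = result["boxes"][result["labels"] == 0]
person_boxes = person_boxes.cpu().numpy()

# Convert boxes in place from (x1, y1, x2, y2) to (x1, y1, width, height)
person_boxes[:, 2] = person_boxes[:, 2] - person_boxes[:, 0]
person_boxes[:, 3] = person_boxes[:, 3] - person_boxes[:, 1]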

image_processor = AutoProcessor.from_pretrained("usyd-community/vitpose-base-simple")
model = VitPoseForPoseEstimation.from_pretrained("usyd-community/vitpose-base-simple", device_map=device)

inputs = image_processor(image, boxes=[person_boxes], return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

pose_results = image_processor.post_process_pose_estimation(outputs, boxes=[person_boxes])
image_pose_result = pose_results[0]  # results for first image

And I face the following problem:

Traceback (most recent call last):
  File "/home/shayan/projects/vit_pose/test_hugging_face.py", line 56, in <module>
    pose_results = image_processor.post_process_pose_estimation(outputs, boxes=[person_boxes])
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 648, in post_process_pose_estimation
    preds, scores = self.keypoints_from_heatmaps(
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 589, in keypoints_from_heatmaps
    preds = post_dark_unbiased_data_processing(coords, heatmaps, kernel=kernel)
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 180, in post_dark_unbiased_data_processing
    [
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 181, in <listcomp>
    [gaussian_filter(heatmap, sigma=0.8, radius=(radius, radius), axes=(0, 1)) for heatmap in heatmaps]
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 181, in <listcomp>
    [gaussian_filter(heatmap, sigma=0.8, radius=(radius, radius), axes=(0, 1)) for heatmap in heatmaps]
TypeError: gaussian_filter() got an unexpected keyword argument 'radius'
Could you please give me a hint on how I can solve this problem?

NielsRogge · 2025-02-17T08:19:08Z

Pinging @qubvel here

qubvel · 2025-02-17T10:10:07Z

Hey @Ashayan97, that's most probably a SciPy version issue. Please try updating it:

pip install -U scipy
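
To confirm that the installed SciPy is the cause before (or after) upgrading, one option is to inspect the signature of `gaussian_filter` directly, since the traceback shows it being called with the `radius` and `axes` keywords. The snippet below is only a sketch: `missing_keyword_args` is a hypothetical helper written for this illustration and is not part of SciPy or Transformers.

```python
import inspect


def missing_keyword_args(func, names):
    """Return the entries of `names` that `func` cannot accept as keyword arguments.

    Hypothetical helper for illustration; not part of SciPy or Transformers.
    """
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means any keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    # Positional-only parameters cannot be passed by keyword.
    keyword_capable = {
        name
        for name, p in params.items()
        if p.kind is not inspect.Parameter.POSITIONAL_ONLY
    }
    return [name for name in names if name not in keyword_capable]


if __name__ == "__main__":
    try:
        import scipy
        from scipy.ndimage import gaussian_filter
    except ImportError:
        print("SciPy is not installed in this environment.")
    else:
        # The keywords used in the call shown in the traceback above.
        missing = missing_keyword_args(gaussian_filter, ["radius", "axes"])
        print("SciPy version:", scipy.__version__)
        if missing:
            print("gaussian_filter lacks:", missing, "-> upgrade SciPy")
        else:
            print("gaussian_filter accepts both keywords")
```

Checking the signature rather than comparing version strings avoids having to know exactly which SciPy release introduced each keyword.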

Ashayan97 · 2025-02-17T15:24:07Z

Dear @qubvel and @NielsRogge,
Thank you for your help! The provided solution fixed my problem.

This was referenced Jan 14, 2025:

How to Evaluate Using ViTPose and a Custom Dataset in COCO Format #153 (Open)

Running video demo #20 (Open)
