ViTPose is now available in Hugging Face Transformers #157

Open
NielsRogge opened this issue Jan 13, 2025 · 5 comments
Comments

@NielsRogge

NielsRogge commented Jan 13, 2025

Hi folks!

ViTPose (and ViTPose++) are now available in the Transformers library, enabling easy inference in a few lines of code.

Docs: https://huggingface.co/docs/transformers/v4.48.0/en/model_doc/vitpose
Checkpoints can be found here.
Demo (on both images and video): https://huggingface.co/spaces/hysts/ViTPose-transformers.

This may be relevant for #133, #26, #139, #135, and #111.

@omkaar718

Thank you, @NielsRogge!
Could you please let me know whether fine-tuning is supported for ViTPose++? If so, it would be helpful if you could point me to the instructions. Thank you!

@Ashayan97

Ashayan97 commented Feb 16, 2025

Hello, I ran into a problem using ViTPose with the Hugging Face example. I tried to run the example code:

import torch
import requests
import numpy as np

from PIL import Image

from transformers import (
    AutoProcessor,
    RTDetrForObjectDetection,
    VitPoseForPoseEstimation,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

url = "http://images.cocodataset.org/val2017/000000000139.jpg"
image = Image.open(requests.get(url, stream=True).raw)

person_image_processor = AutoProcessor.from_pretrained("PekingU/rtdetr_r50vd_coco_o365")
person_model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd_coco_o365", device_map=device)

inputs = person_image_processor(images=image, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = person_model(**inputs)

results = person_image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.3
)
result = results[0]  # take first image results

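# keep only the boxes labeled 0, the person class in COCO
person_boxes = result["boxes"][result["labels"] == 0]
person_boxes = person_boxes.cpu().numpy()

# convert boxes from (x1, y1, x2, y2) to (x, y, width, height)
person_boxes[:, 2] = person_boxes[:, 2] - person_boxes[:, 0]
person_boxes[:, 3] = person_boxes[:, 3] - person_boxes[:, 1]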

image_processor = AutoProcessor.from_pretrained("usyd-community/vitpose-base-simple")
model = VitPoseForPoseEstimation.from_pretrained("usyd-community/vitpose-base-simple", device_map=device)

inputs = image_processor(image, boxes=[person_boxes], return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

pose_results = image_processor.post_process_pose_estimation(outputs, boxes=[person_boxes])
image_pose_result = pose_results[0]  # results for first image

And I get the following error:

Traceback (most recent call last):
  File "/home/shayan/projects/vit_pose/test_hugging_face.py", line 56, in <module>
    pose_results = image_processor.post_process_pose_estimation(outputs, boxes=[person_boxes])
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 648, in post_process_pose_estimation
    preds, scores = self.keypoints_from_heatmaps(
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 589, in keypoints_from_heatmaps
    preds = post_dark_unbiased_data_processing(coords, heatmaps, kernel=kernel)
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 180, in post_dark_unbiased_data_processing
    [
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 181, in <listcomp>
    [gaussian_filter(heatmap, sigma=0.8, radius=(radius, radius), axes=(0, 1)) for heatmap in heatmaps]
  File "/home/shayan/.local/lib/python3.10/site-packages/transformers/models/vitpose/image_processing_vitpose.py", line 181, in <listcomp>
    [gaussian_filter(heatmap, sigma=0.8, radius=(radius, radius), axes=(0, 1)) for heatmap in heatmaps]
TypeError: gaussian_filter() got an unexpected keyword argument 'radius'
Could you please give me a hint on how I can solve this problem?

@NielsRogge
Author

Pinging @qubvel here

@qubvel

qubvel commented Feb 17, 2025

Hey @Ashayan97, that's most likely a SciPy version issue. Please try updating it:

pip install -U scipy
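
To check whether the installed SciPy accepts the keyword arguments from the failing call (`radius`, plus `axes`, which the same line passes), one option is to inspect the function signature. This is only a rough sketch: `missing_kwargs` is a hypothetical helper name, not part of SciPy or Transformers.

```python
import inspect


def missing_kwargs(func, names):
    """Return the entries of `names` that `func` does not list as parameters."""
    params = inspect.signature(func).parameters
    return [name for name in names if name not in params]


try:
    import scipy
    from scipy.ndimage import gaussian_filter
except ImportError:
    print("SciPy is not installed")
else:
    # `radius` and `axes` are the keywords used by the call in the traceback.
    missing = missing_kwargs(gaussian_filter, ("radius", "axes"))
    print(f"SciPy {scipy.__version__}, missing keywords: {missing or 'none'}")
```

If the list is not empty, the installed SciPy is too old for this code path and the upgrade above should help.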

@Ashayan97

Dear @qubvel and @NielsRogge,
Thank you for your help! The provided solution fixed my problem.
