MSQNet

Official implementation of "Actor-agnostic Multi-label Action Recognition with Multi-modal Query", accepted at ICCV Workshops 2023.

Authors

Anindya Mondal*, Sauradip Nag*, Joaquin M Prada, Xiatian Zhu, Anjan Dutta*.

[CVF Open Access] [Poster] [ArXiv] [Video]

Leaderboard

Abstract

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%.

Implementation

See the multi-label-action-main folder for implementation details.
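The abstract contrasts single-label classification with the multi-label setting, where several actions can occur at once. The snippet below is a minimal sketch of that difference and is not taken from the MSQNet code: the class names, logits, threshold, and both helper functions are hypothetical, and it assumes independent per-class sigmoid scores for the multi-label case.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_multi_label(logits, class_names, threshold=0.5):
    """Return every class whose independent sigmoid score meets the threshold."""
    return [name for name, z in zip(class_names, logits) if sigmoid(z) >= threshold]


def predict_single_label(logits, class_names):
    """Return only the highest-scoring class (single-label baseline)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return class_names[best]


# Hypothetical action classes and scores, for illustration only.
class_names = ["walking", "eating", "grooming"]
logits = [2.0, 0.3, -1.5]

print(predict_multi_label(logits, class_names))   # ['walking', 'eating']
print(predict_single_label(logits, class_names))  # walking
```

With these made-up scores, the multi-label rule keeps both "walking" and "eating", while the single-label rule keeps only "walking" and misses the co-occurring action.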

If you find our work useful, please consider citing:


@InProceedings{Mondal_2023_ICCV,
    author    = {Mondal, Anindya and Nag, Sauradip and Prada, Joaquin M and Zhu, Xiatian and Dutta, Anjan},
    title     = {Actor-Agnostic Multi-Label Action Recognition with Multi-Modal Query},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {784-794}
}
