Few-shot Action Recognition with Video Transformer
Published in the 2023 IEEE SITIS Conference
This paper proposes a novel few-shot action recognition framework that integrates a Transformer-based feature backbone into meta-learning. The proposed method pre-trains a Video Transformer and then applies metric-based meta-learning with the ProtoNet algorithm. Extensive experiments on benchmark datasets demonstrate that our approach achieves strong performance, surpassing baseline models and obtaining competitive results against state-of-the-art models. Additionally, we investigate the impact of supervised and self-supervised learning on video representation and evaluate the transferability of the learned representations in cross-domain scenarios. Our approach suggests a promising direction for combining meta-learning with Video Transformers in few-shot learning tasks, potentially benefiting action recognition across various domains.
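The ProtoNet episode step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Video Transformer is assumed to have already produced clip embeddings (replaced here by toy vectors), and all function names are hypothetical.

```python
import numpy as np

def prototypes(support_feats, support_labels, n_classes):
    # ProtoNet: each class prototype is the mean of its support embeddings.
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_feats, protos):
    # Assign each query to the nearest prototype by squared Euclidean distance.
    dists = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Toy 2-way, 2-shot episode; in the paper these would be Video Transformer features.
support = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.2, 0.1], [4.9, 5.2]])

protos = prototypes(support, labels, n_classes=2)
preds = classify(queries, protos)  # queries matched to nearest class prototype
```

During meta-training, the same distances would be turned into a softmax over classes and optimized with cross-entropy; at test time, nearest-prototype assignment suffices.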
Recommended citation: N. Aikyn, A. Abu, T. Zhaksylyk and N. A. Tu, "Few-shot Action Recognition with Video Transformer," 2023 17th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Bangkok, Thailand, 2023, pp. 122-129, doi: 10.1109/SITIS61268.2023.00027.
Download Paper | Download Slides
