PoseGPT (ChatPose): Chatting about 3D Human Pose

988 listens

Human pose estimation models struggle to grasp contextual information in images or video frames. Meanwhile, text-to-pose generation models are trained on limited data and cannot reliably generate accurate poses for novel prompts. PoseGPT is a multimodal language model that comprehends 3D human pose while also processing image and text inputs. It performs well on two tasks: speculative pose generation, such as producing the pose a person might take when tired, and reasoning-based pose estimation, such as estimating the pose of the individual wearing eyeglasses.

Paper link: https://arxiv.org/pdf/2311.18836.pdf
Project page: https://yfeng95.github.io/posegpt/

Table of contents:
00:00 Introduction
07:23 Architecture
11:58 LoRA
19:12 Data Construction
22:41 Speculative Pose Generation (SPG)
25:50 Reasoning-based Pose Estimation (RPE)

Icon made by Freepik from flaticon.com