Full metadata record
DC Field: Value (Language)
dc.contributor: School of Fashion and Textiles (en_US)
dc.contributor.advisor: Mok, Tracy (SFT) (en_US)
dc.creator: Peng, Jihua
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/13946
dc.language: English (en_US)
dc.publisher: Hong Kong Polytechnic University (en_US)
dc.rights: All rights reserved (en_US)
dc.title: Deep learning-based 3D human pose estimation for fashion applications (en_US)
dcterms.abstract: 3D human pose estimation, a foundational task in computer vision, has received significant attention in recent years due to its crucial applications in robotics, healthcare, and sports science. It is also an important research topic in the fashion field because it can yield plausible human body regions for cloth parsing. This study addresses the issues inherent in existing state-of-the-art (SOTA) methods by proposing three new and efficient models for 3D pose estimation from various inputs, including video sequences and single images. As an application of these proposed methods, the study also demonstrates how 3D poses predicted from video sequence inputs can be retargeted to game and fashion avatars. (en_US)
dcterms.abstract: Pose estimation covers both 2D and 3D pose estimation, and the latter is technically more challenging. For 3D pose estimation, most existing methods convert this challenging task into a local pose estimation problem by partitioning the human body joints into groups based on the relevant anatomical relationships. The joint features from the various groups are then fused to predict the overall pose of the whole body, which requires a joint feature fusion module. Nevertheless, the fusion schemes adopted in existing methods involve learning extensive parameters and are therefore computationally expensive. Thus, this study first proposes a novel grouped 3D pose estimation network with an optimized feature fusion (OFF) module that requires fewer parameters and computations than existing methods while also being more accurate. The network further introduces a motion amplitude information (MAI) method and a camera intrinsic embedding (CIE) module, designed to provide better global information and 2D-to-3D conversion knowledge, thereby improving the overall robustness and accuracy of the method. In contrast to previous methods, the proposed network can be trained end-to-end in a single stage, and experimental results demonstrate that it outperforms previous state-of-the-art methods on two benchmarks. (en_US)
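The grouped estimate-then-fuse pipeline described above can be illustrated with a minimal sketch. The joint partition, the pooling, and the scalar per-group weights here are all illustrative assumptions (the thesis does not specify the OFF module's internals in this record); the point is that fusing groups with a handful of weights uses far fewer parameters than a learned fusion network over all joint features.

```python
import numpy as np

# Hypothetical grouping for a 17-joint skeleton (Human3.6M-style indices);
# the thesis's actual partition and OFF module design may differ.
GROUPS = {
    "torso": [0, 7, 8, 9, 10],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
    "left_leg": [4, 5, 6],
    "right_leg": [1, 2, 3],
}

def grouped_feature_fusion(joint_feats: np.ndarray, weights: dict) -> np.ndarray:
    """Fuse per-group joint features into one whole-body feature.

    joint_feats: (17, C) array of per-joint features.
    weights: one scalar per group -- a deliberately lightweight fusion
    (five parameters) standing in for a heavier learned fusion module.
    """
    fused = np.zeros(joint_feats.shape[1])
    for name, idx in GROUPS.items():
        group_feat = joint_feats[idx].mean(axis=0)  # pool joints within the group
        fused += weights[name] * group_feat          # weighted fusion across groups
    return fused

feats = np.random.default_rng(0).normal(size=(17, 32))
w = {name: 1.0 / len(GROUPS) for name in GROUPS}
whole_body = grouped_feature_fusion(feats, w)
print(whole_body.shape)  # (32,)
```

In a real network the per-group features would come from learned local pose estimators and the fusion weights would be trained jointly with them, end-to-end in a single stage as the abstract describes.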
dcterms.abstract: The first new method for 3D pose estimation above is based on a convolutional neural network (CNN) for grouped feature fusion. In view of the rapid advancement and outstanding performance of transformer-based deep learning models, another novel method, called the Kinematics and Trajectory Prior Knowledge-Enhanced Transformer (KTPFormer), is also proposed for 3D pose estimation with video inputs. This network contains two novel prior attention modules: Kinematic Prior Attention (KPA) and Trajectory Prior Attention (TPA). KPA models kinematic relationships in the human body by constructing a kinematic topology, while TPA builds a temporal topology to learn prior knowledge of joint motion trajectories across frames. In this way, the two prior attention mechanisms yield Q, K, and V vectors carrying prior knowledge for the vanilla self-attention mechanisms, helping them model global dependencies and features more effectively. With a lightweight plug-and-play design, KPA and TPA can be easily integrated into various state-of-the-art models to improve their performance by a significant margin with only a small increase in computational overhead. (en_US)
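The core idea of prior attention — propagating features over a skeletal topology before the Q/K/V projections, so the attention tokens already encode the kinematic prior — can be sketched as follows. The 5-joint adjacency matrix and single-head attention here are toy assumptions for illustration; the actual KPA/TPA modules operate on the full skeleton (and, for TPA, across frames) with learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative kinematic adjacency for a toy 5-joint skeleton
# (joint 1 is a hub connected to the other four); self-loops included.
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
], dtype=float)
A = A / A.sum(axis=1, keepdims=True)  # row-normalize, as in graph convolution

def prior_attention(x, Wq, Wk, Wv, adj):
    # Inject the kinematic prior BEFORE the linear projections, so the
    # resulting Q, K, V tokens already encode the body topology.
    xp = adj @ x
    q, k, v = xp @ Wq, xp @ Wk, xp @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))  # vanilla scaled dot-product
    return attn @ v

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))                       # 5 joints, 8-dim features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = prior_attention(x, Wq, Wk, Wv, A)
print(out.shape)  # (5, 8)
```

Because the prior only pre-multiplies the token matrix, such a module can wrap an existing transformer block without changing its interface — consistent with the plug-and-play design the abstract claims.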
dcterms.abstract: For handling single image inputs, a third new network is designed in this study for 3D pose estimation, which effectively combines graph and attention mechanisms. This method can effectively model the topological information of the human body and learn global correlations among different body joints more efficiently. (en_US)
dcterms.abstract: As a demonstration of the potential applications of these proposed methods, a motion retargeting technique is used to transfer the predicted 3D human poses from fashion images/videos to other people, so that different people can perform the same motion (e.g. a catwalk), realizing multiplayer motion animation. (en_US)
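A minimal sketch of the retargeting idea, assuming both skeletons share one joint tree (the toy `PARENTS` chain below is hypothetical): copy each bone's direction from the source pose and rescale it to the target avatar's bone lengths. Real retargeting, as applied in the thesis, typically works on joint rotations and handles details such as foot contacts, which this sketch omits.

```python
import numpy as np

PARENTS = [-1, 0, 1, 2, 0]  # toy 5-joint tree: root, spine, neck, head, hip

def bone_lengths(pose, parents):
    """Length of the bone from each joint to its parent (0 for the root)."""
    return np.array([0.0 if p < 0 else np.linalg.norm(pose[j] - pose[p])
                     for j, p in enumerate(parents)])

def retarget(source_pose, target_lengths, parents):
    """Copy the source's bone directions onto the target's bone lengths."""
    out = np.zeros_like(source_pose)
    for j, p in enumerate(parents):       # parents precede children in this list
        if p < 0:
            out[j] = source_pose[j]        # keep the root position
            continue
        d = source_pose[j] - source_pose[p]
        d = d / (np.linalg.norm(d) + 1e-9)           # unit bone direction
        out[j] = out[p] + target_lengths[j] * d      # rebuild with target length
    return out

src = np.array([[0, 0, 0], [0, 1, 0], [0, 2, 0],
                [0, 2.5, 0], [0.5, -0.2, 0]], dtype=float)
tgt_len = bone_lengths(src, PARENTS) * 1.2   # e.g. a 20% taller avatar
new_pose = retarget(src, tgt_len, PARENTS)
print(np.allclose(bone_lengths(new_pose, PARENTS), tgt_len))  # True
```

Running the same transfer over every frame of a predicted 3D pose sequence yields the "same motion on a different body" effect the abstract describes.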
dcterms.extent: xvii, 154 pages : color illustrations (en_US)
dcterms.isPartOf: PolyU Electronic Theses (en_US)
dcterms.issued: 2024 (en_US)
dcterms.educationalLevel: Ph.D. (en_US)
dcterms.educationalLevel: All Doctorate (en_US)
dcterms.accessRights: open access (en_US)

Files in This Item:
File | Description | Size | Format
8405.pdf | For All Users | 5.94 MB | Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13946