Full metadata record
DC Field: Value (Language)
dc.contributor: Institute of Textiles and Clothing (en_US)
dc.contributor.advisor: Mok, Pik-yin Tracy (ITC)
dc.creator: Zhou, Yanghong
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/10022
dc.language: English (en_US)
dc.publisher: Hong Kong Polytechnic University
dc.rights: All rights reserved (en_US)
dc.title: Understanding human fashion images : a parse-pose based study (en_US)
dcterms.abstract (en_US):
The amount of fashion image data available on the Internet is ever-increasing, and the rate of growth itself is accelerating. Before such a large collection of image assets can be managed and used effectively, some understanding of the image contents is needed. In this study, we develop effective computer systems for understanding fashion images, that is, for crossing the so-called semantic gap between the pixel-level information stored in image files and a human's understanding of the same images. The traditional approach to image understanding involves a sequence of processing steps; its overall effectiveness therefore depends on the performance of each individual process, and such methods are not fully end-to-end solutions over raw image pixels. To understand fashion images, a number of challenges have to be addressed. Firstly, fashion products often vary widely in style, texture, and cutting. Secondly, the clothing classifications reported in the literature are too coarse, so the value of the classified image contents is limited. Thirdly, clothing items are frequently subject to deformation and occlusion. Earlier work on clothing recognition mostly relied on handcrafted features, so the performance of those methods was limited by the expressive power of such features.

This research develops an overall platform for fashion image understanding. It aims to extract both detailed and high-level information from images, including segmenting the regions of interest, extracting the size and shape of the humans presented in the images, recognising the fashion items, and further recognising their fine-grained attributes. Human parsing is the basic building block of the proposed framework: it segments a human photo into semantic fashion/body items, such as the face, arms, legs, dress, and background. Building on a review of state-of-the-art human parsing research, an attention-based human parsing approach is proposed, first realised as a cascade network model and later as an end-to-end network model. As the basic block of the proposed platform, human parsing solves the problem of cross-domain clothing retrieval and enables clothing recognition and human shape modelling. Human parsing and pose estimation are highly correlated and complementary, so we propose to refine the regions of interest segmented by human parsing using pose estimation.

The segmented semantic regions are the input for understanding both human and fashion information. For human information, we mainly extract the size and shape of the human subjects in the input images, using 3D model customisation technology: the segmented body-part regions separate the humans from cluttered backgrounds and allow accurate 2D contours of the subjects to be extracted from the input images. The extracted contours are used to reconstruct a 3D human shape model, from which body sizes and shape parameters are calculated. In addition to understanding the human subjects, we investigate the understanding of fashion information from images. To do so, we first develop a new dataset and taxonomy of fashion products based on industrial needs for fashion understanding, and then develop deep neural network models to recognise clothing categories, fine-grained features, and attributes from fashion photos. In the proposed framework, human parsing, pose estimation, and clothing recognition are all based on deep learning techniques. (A minimal illustrative sketch of such a parse-then-recognise pipeline follows this record.)
dcterms.extent: xxviii, 257 pages : color illustrations (en_US)
dcterms.isPartOf: PolyU Electronic Theses (en_US)
dcterms.issued: 2019 (en_US)
dcterms.educationalLevel: Ph.D. (en_US)
dcterms.educationalLevel: All Doctorate (en_US)
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations (en_US)
dcterms.LCSH: Textile industry -- Technological innovations (en_US)
dcterms.LCSH: Image processing -- Digital techniques (en_US)
dcterms.accessRights: open access (en_US)
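
The abstract above describes a pipeline in which a human-parsing network first produces a per-pixel semantic label map, and downstream modules (pose-based refinement, 3D shape reconstruction, attribute recognition) consume that map. The sketch below is purely illustrative and is not the thesis's network: it uses an off-the-shelf torchvision segmentation model (DeepLabV3) as a stand-in for the attention-based parsing model, and the names seg_model and parse_image are hypothetical, just to show the general data flow from an input photo to a semantic label map.

    # Illustrative only: an off-the-shelf segmentation model stands in for the
    # attention-based human-parsing network described in the abstract.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    # Pretrained DeepLabV3 used as a placeholder parser (hypothetical stand-in).
    seg_model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def parse_image(path):
        """Return a per-pixel class-label map for one photo."""
        img = Image.open(path).convert("RGB")
        batch = preprocess(img).unsqueeze(0)      # shape [1, 3, H, W]
        with torch.no_grad():
            logits = seg_model(batch)["out"]      # shape [1, C, H, W]
        return logits.argmax(dim=1).squeeze(0)    # label map, shape [H, W]

    # The later stages of the proposed framework (pose refinement, 3D shape
    # fitting from 2D contours, clothing-attribute recognition) would take
    # this label map as input; they are omitted from this sketch.

In the thesis itself the parsing model is attention-based and trained for fashion/body labels; the snippet only illustrates the interface a parsing stage exposes to the rest of such a pipeline.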

Files in This Item:
File                       Description     Size      Format
991022232428103411.pdf     For All Users   9.62 MB   Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/10022