Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Electronic and Information Engineering | en_US |
dc.contributor.advisor | Lam, Kin Man (EIE) | en_US |
dc.creator | Lai, Songjiang | - |
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/12139 | - |
dc.language | English | en_US |
dc.publisher | Hong Kong Polytechnic University | en_US |
dc.rights | All rights reserved | en_US |
dc.title | Deep learning for human action recognition | en_US |
dcterms.abstract | With the rapid development and wide adoption of deep learning in recent years, the performance of computer vision tasks has improved greatly. The two-stream neural network, applied to video-based action recognition, has become a hot research topic. In the traditional two-stream convolutional neural network for action recognition, the inputs to the two branches are the RGB stream and the optical-flow stream, and the model achieves promising performance on human action recognition. However, the two-stream model has high computational complexity, because computing optical flow from a video sequence is computationally intensive. Furthermore, because the inputs to the two streams are different, i.e., RGB and optical flow, the original two-stream model cannot be trained end to end, which increases the training complexity and limits the performance. In this research, we introduce the representation-flow algorithm proposed by AJ et al. [1], which is based on the TV-L1 [2] model and behaves similarly to an optical-flow algorithm. We replace the optical-flow branch of the egocentric action-recognition model proposed by Swathikiran et al. [3] with a representation-flow branch, making the model end-to-end trainable. This greatly reduces the computational cost and the prediction runtime of the new model. We apply the new two-stream model to egocentric action recognition. Moreover, we apply class activation maps (CAMs) to the RGB stream, so that the model pays more attention to the regions correlated with the activities under consideration; this significantly improves the recognition accuracy. We then apply a ConvLSTM for spatio-temporal encoding of the spatially attended image features. We train and evaluate the proposed model on three data sets: GTEA61, EGTEA GAZE+ and HMDB [4]. Experimental results show that our proposed model achieves the same recognition accuracy as the original ego-rnn model with an optical-flow branch on GTEA61, and outperforms it by 0.65% and 0.84% on EGTEA GAZE+ and HMDB, respectively. In terms of speed, the average runtime of our proposed model is 0.1881s, 0.1503s, and 0.1459s on the GTEA61, EGTEA GAZE+ and HMDB databases, respectively, whereas the corresponding runtimes of the original model (including the time for extracting optical flow) are 101.6795s, 25.3799s, and 203.9958s. Finally, we conduct ablation studies and discuss the influence of different parameters on the performance of the proposed model, such as the number of representation-flow layers and the number of blocks in the backbone architecture. | en_US |
dcterms.extent | ix, 52 pages : color illustrations | en_US |
dcterms.isPartOf | PolyU Electronic Theses | en_US |
dcterms.issued | 2022 | en_US |
dcterms.educationalLevel | M.Sc. | en_US |
dcterms.educationalLevel | All Master | en_US |
dcterms.LCSH | Computer vision -- Mathematical models | en_US |
dcterms.LCSH | Human activity recognition | en_US |
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
dcterms.accessRights | restricted access | en_US |
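For illustration, below is a minimal PyTorch sketch of the core idea summarised in the abstract: replacing the hand-crafted optical-flow branch of a two-stream model with a learnable, iterative flow layer so the whole network can be trained end to end. The class name `RepresentationFlowLayer`, the hyper-parameter values, and the update rule are assumptions made for this sketch; the update shown is a much-simplified Horn-Schunck-style gradient scheme with learnable weights, not the thesis's actual TV-L1-based representation-flow implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RepresentationFlowLayer(nn.Module):
    """Differentiable, iterative flow estimation with learnable weights.

    A crude stand-in for a TV-L1-style representation-flow layer: it runs a
    fixed number of gradient steps on a Horn-Schunck-style energy (quadratic
    data term + quadratic smoothness term). The data-term weight and step size
    are learnable parameters, so the layer trains end to end with the network.
    """

    def __init__(self, n_iter: int = 10):
        super().__init__()
        self.n_iter = n_iter
        self.lambd = nn.Parameter(torch.tensor(0.15))  # data-term weight (learnable)
        self.tau = nn.Parameter(torch.tensor(0.25))    # gradient step size (learnable)
        # Fixed central-difference kernels for spatial gradients.
        self.register_buffer("kx", torch.tensor([[[[-0.5, 0.0, 0.5]]]]))
        self.register_buffer("ky", torch.tensor([[[[-0.5], [0.0], [0.5]]]]))

    def _spatial_grad(self, x):
        gx = F.conv2d(x, self.kx, padding=(0, 1))
        gy = F.conv2d(x, self.ky, padding=(1, 0))
        return gx, gy

    def _laplacian(self, x):
        # Local mean minus the pixel itself approximates the Laplacian.
        mean = F.avg_pool2d(F.pad(x, (1, 1, 1, 1), mode="replicate"), 3, stride=1)
        return mean - x

    def forward(self, f1, f2):
        """f1, f2: consecutive single-channel maps of shape (B, 1, H, W)."""
        ix, iy = self._spatial_grad(f2)   # spatial gradients of the second frame
        it = f2 - f1                      # temporal difference
        u = torch.zeros_like(f1)          # horizontal flow component
        v = torch.zeros_like(f1)          # vertical flow component
        for _ in range(self.n_iter):
            # Residual of the linearised brightness-constancy constraint.
            r = it + ix * u + iy * v
            # Gradient step on 0.5*lambda*r^2 + 0.5*(|grad u|^2 + |grad v|^2).
            u = u - self.tau * (self.lambd * r * ix - self._laplacian(u))
            v = v - self.tau * (self.lambd * r * iy - self._laplacian(v))
        return torch.cat([u, v], dim=1)   # (B, 2, H, W) flow field


if __name__ == "__main__":
    # Toy usage: flow between two consecutive grayscale frames (random here).
    frames = torch.rand(2, 1, 112, 112)
    layer = RepresentationFlowLayer(n_iter=10)
    flow = layer(frames[0:1], frames[1:2])
    print(flow.shape)  # torch.Size([1, 2, 112, 112])
```

Because the layer is built from differentiable tensor operations with learnable parameters, its output can feed a downstream motion branch and be optimised jointly with the RGB branch, which is what makes the two-stream model end-to-end trainable and removes the need to precompute optical flow.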
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
6523.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 1.21 MB | Adobe PDF |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/12139