Full metadata record
DC Field: Value (Language)

dc.contributor: Department of Electronic and Information Engineering (en_US)
dc.contributor.advisor: Lam, Kin Man (EIE) (en_US)
dc.creator: Lai, Songjiang
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/12139
dc.language: English (en_US)
dc.publisher: Hong Kong Polytechnic University (en_US)
dc.rights: All rights reserved (en_US)
dc.title: Deep learning for human action recognition (en_US)
dcterms.abstract: With the rapid development and wide adoption of deep learning in recent years, the performance of computer vision tasks has improved greatly. The two-stream neural network model, applied to video-based action recognition, has become a hot research topic. As in the traditional two-stream convolutional neural network for action recognition, the inputs to the two branches are the RGB stream and the optical-flow stream, which together achieve promising performance for human action recognition. However, the two-stream model has high computational complexity, because computing optical flow from a video sequence is computationally intensive. Furthermore, because the inputs to the two streams differ, i.e., RGB and optical flow, the original two-stream model cannot be trained end to end, which complicates training and limits performance. In this research, we introduce the representation flow algorithm proposed by AJ et al. [1], which is based on the TV-L1 model [2] and is similar to an optical-flow algorithm. We replace the traditional optical-flow branch of the egocentric action recognition model proposed by Swathikiran et al. [3] with a representation-flow branch, making the model end-to-end trainable. This greatly reduces the computational cost and the prediction runtime of the new model. We apply the new two-stream model to egocentric action recognition. Moreover, we apply class activation maps (CAMs) to the RGB stream, so that the model pays more attention to the regions correlated with the activities under consideration, which significantly improves recognition accuracy. We then apply a ConvLSTM for spatio-temporal encoding of the spatially attended image features. We train and evaluate the proposed model on three data sets: GTEA61, EGTEA GAZE+, and HMDB [4]. Experimental results show that our proposed model achieves the same recognition accuracy as the original ego-rnn model with an optical-flow branch on GTEA61, and outperforms it by 0.65% and 0.84% on EGTEA GAZE+ and HMDB, respectively. In terms of speed, the average runtime of our proposed model is 0.1881 s, 0.1503 s, and 0.1459 s on the GTEA61, EGTEA GAZE+, and HMDB databases, respectively, whereas the corresponding runtimes (including the time for extracting optical flow) of the original model are 101.6795 s, 25.3799 s, and 203.9958 s, respectively. Finally, we conduct ablation studies and discuss the influence of different parameters on the performance of the proposed model, such as the number of layers used for representation flow and the number of blocks in the backbone architecture. (en_US)
dcterms.extent: ix, 52 pages : color illustrations (en_US)
dcterms.isPartOf: PolyU Electronic Theses (en_US)
dcterms.issued: 2022 (en_US)
dcterms.educationalLevel: M.Sc. (en_US)
dcterms.educationalLevel: All Master (en_US)
dcterms.LCSH: Computer vision -- Mathematical models (en_US)
dcterms.LCSH: Human activity recognition (en_US)
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations (en_US)
dcterms.accessRights: restricted access (en_US)
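
Illustrative sketch of the architecture described in the abstract. The PyTorch code below is a rough reconstruction, not the thesis code: the class names (TwoStreamSketch, ConvLSTMCell), the channel sizes, the late fusion by adding logits, and especially the flow_branch placeholder (a plain convolutional head standing in for the representation-flow layers of [1]) are assumptions made only to show how a CAM-attended RGB stream, a ConvLSTM encoder, and a learnable flow-style branch can be combined into one end-to-end trainable model.

# Hedged sketch of a two-stream model with CAM attention and ConvLSTM encoding.
# Names, shapes, and the placeholder flow branch are illustrative assumptions.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell for spatio-temporal encoding of attended feature maps."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.conv(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class TwoStreamSketch(nn.Module):
    def __init__(self, backbone, feat_ch, num_classes, hid_ch=256):
        super().__init__()
        self.backbone = backbone                       # conv trunk returning (B, feat_ch, H, W)
        self.cam_fc = nn.Linear(feat_ch, num_classes)  # linear layer whose weights define the CAMs
        # Placeholder flow branch: a plain conv head over stacked features,
        # standing in for the learnable representation-flow layers of [1].
        self.flow_branch = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, num_classes))
        self.lstm = ConvLSTMCell(feat_ch, hid_ch)
        self.rgb_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(hid_ch, num_classes))

    def cam_attention(self, feat):
        # CAM-style spatial attention on the RGB stream: weight each location by
        # the class activation map of the currently predicted class.
        logits = self.cam_fc(feat.mean(dim=(2, 3)))
        w = self.cam_fc.weight[logits.argmax(dim=1)]               # (B, feat_ch)
        cam = torch.einsum('bchw,bc->bhw', feat, w)                # (B, H, W)
        attn = torch.softmax(cam.flatten(1), dim=1).view_as(cam).unsqueeze(1)
        return feat * attn

    def forward(self, clip):
        # clip: (B, T, 3, H, W) RGB frames; no precomputed optical flow is needed.
        B, T = clip.shape[:2]
        feats = [self.backbone(clip[:, t]) for t in range(T)]
        h = feats[0].new_zeros(B, self.lstm.hid_ch, *feats[0].shape[2:])
        c = h.clone()
        for f in feats:                                            # RGB stream: attention + ConvLSTM
            h, c = self.lstm(self.cam_attention(f), (h, c))
        rgb_logits = self.rgb_head(h)
        flow_logits = self.flow_branch(torch.cat([feats[0], feats[-1]], dim=1))
        return rgb_logits + flow_logits                            # late fusion of the two streams

In practice the backbone would be a convolutional trunk truncated before global pooling (for example a ResNet-style network) so that it returns spatial feature maps; the number of backbone blocks and the number of representation-flow layers are exactly the knobs the abstract says the thesis studies in its ablation experiments, and this sketch leaves them to the constructor arguments.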

Files in This Item:
File        Description                                                         Size      Format
6523.pdf    For All Users (off-campus access for PolyU Staff & Students only)   1.21 MB   Adobe PDF

Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/12139