Assessment of Human Actions from Videos Using Deep Residual Neural Networks

dc.contributor.author Farabi, MD Shafkat Rahman
dc.contributor.author Himel, S.M. Hadibul Haque
dc.contributor.author Gazzali, Md. Fakhruddin
dc.date.accessioned 2022-04-16T13:24:45Z
dc.date.available 2022-04-16T13:24:45Z
dc.date.issued 2021-03-30
dc.identifier.uri http://hdl.handle.net/123456789/1321
dc.description Supervised by Prof. Dr. Md. Hasanul Kabir, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.description.abstract Action quality assessment (AQA) aims at automatically judging a human action based on a video of the said action and assigning a performance score to it. Judging the quality of human actions from videos holds huge promise for the future of computer vision. This thesis work focuses on discovering an improved method for action quality assessment. The majority of works in the existing literature on AQA transform RGB videos into higher-level representations using a Convolutional 3D (C3D) network. These higher-level representations are then used to perform action quality assessment. Due to the relatively shallow nature of C3D, the quality of the extracted features is lower than what could be extracted using a deeper convolutional neural network. Hence, we experiment with deeper convolutional neural networks with residual connections (ResNets) for learning representations for action quality assessment. We assess the effects of the depth and the input clip size of the convolutional neural network on the quality of action score predictions. We also look at the effect of using (2+1)D convolutions instead of 3D convolutions for feature extraction. We argue that the current technique of aggregating clip-level feature representations by averaging is insufficient to capture the relative importance of features. To overcome this, we propose a learning-based weighted-averaging technique that performs better. We achieve a new state-of-the-art Spearman's rank correlation of 0.9315 (an improvement of 0.45% over the previous state-of-the-art) on the MTL-AQA dataset using a 34-layer (2+1)D convolutional neural network capable of processing 32-frame clips, together with our proposed aggregation technique. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh en_US
dc.subject AQA, Computer Vision, Deep Learning en_US
dc.title Assessment of Human Actions from Videos Using Deep Residual Neural Networks en_US
dc.type Thesis en_US
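
The abstract above proposes replacing plain averaging of clip-level features with a learning-based weighted average. The following is a minimal sketch of that idea, not the authors' released code: it assumes a (2+1)D ResNet backbone has already produced one feature vector per clip, and the module name WeightedClipAggregator, the linear weight head, and the feature dimension of 512 are illustrative assumptions.

```python
# Minimal sketch of a learning-based weighted-averaging aggregator for
# clip-level features. Hypothetical names and shapes; not the thesis code.
import torch
import torch.nn as nn

class WeightedClipAggregator(nn.Module):
    """Learns a scalar importance weight per clip feature and returns the
    weighted average of the clips, replacing plain mean pooling."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)  # assumed per-clip weight head

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (batch, num_clips, feat_dim) from the backbone
        weights = torch.softmax(self.scorer(clip_feats), dim=1)  # (B, N, 1)
        return (weights * clip_feats).sum(dim=1)                 # (B, feat_dim)

# Usage: aggregate features from, e.g., 4 clips of 32 frames each,
# then regress a quality score with a linear head.
feats = torch.randn(2, 4, 512)             # dummy backbone outputs
agg = WeightedClipAggregator(feat_dim=512)
video_feat = agg(feats)                    # (2, 512) video-level feature
score = nn.Linear(512, 1)(video_feat)      # predicted AQA score
```

Because the clip weights are produced by a learnable layer and normalized with a softmax, the aggregator can emphasize informative clips rather than treating all clips equally, which is the limitation of mean pooling the abstract identifies.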