Video summarization using global and segmented local attention


dc.contributor.author Azad, Zadid Bin
dc.contributor.author Azmaeen, Wasif
dc.contributor.author Fayad, Azad Al
dc.date.accessioned 2023-03-15T06:20:53Z
dc.date.available 2023-03-15T06:20:53Z
dc.date.issued 2022-05-30
dc.identifier.citation
[1] K. Zhang, W.-L. Chao, F. Sha, and K. Grauman, “Video summarization with long short-term memory,” in Computer Vision – ECCV 2016 (B. Leibe, J. Matas, N. Sebe, and M. Welling, eds.), (Cham), pp. 766–782, Springer International Publishing, 2016.
[2] M. Rochan, L. Ye, and Y. Wang, “Video summarization using fully convolutional sequence networks,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 347–363, 2018.
[3] M. Gygli, H. Grabner, H. Riemenschneider, and L. Van Gool, “Creating summaries from user videos,” in Computer Vision – ECCV 2014 (D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, eds.), (Cham), pp. 505–520, Springer International Publishing, 2014.
[4] Y. Song, J. Vallmitjana, A. Stent, and A. Jaimes, “TVSum: Summarizing web videos using titles,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[5] W. Zhu, J. Lu, J. Li, and J. Zhou, “DSNet: A flexible detect-to-summarize network for video summarization,” IEEE Transactions on Image Processing, vol. 30, pp. 948–962, 2021.
[6] J. Fajtl, H. S. Sokeh, V. Argyriou, D. Monekosso, and P. Remagnino, “Summarizing videos with attention,” in Computer Vision – ACCV 2018 Workshops (G. Carneiro and S. You, eds.), (Cham), pp. 39–54, Springer International Publishing, 2019.
[7] Z. Ji, K. Xiong, Y. Pang, and X. Li, “Video summarization with attention-based encoder-decoder networks,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 6, pp. 1709–1717, 2020.
[8] E. Apostolidis, G. Balaouras, V. Mezaris, and I. Patras, “Combining global and local attention with positional encoding for video summarization,” in 2021 IEEE International Symposium on Multimedia (ISM), pp. 226–234, 2021.
[9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.
[10] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2015.
[11] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
[12] J. Fajtl, H. S. Sokeh, V. Argyriou, D. Monekosso, and P. Remagnino, “Summarizing videos with attention,” in Asian Conference on Computer Vision, pp. 39–54, Springer, 2018.
[13] J. A. Ghauri, S. Hakimov, and R. Ewerth, “Supervised video summarization via multiple feature sets with parallel attention,” in 2021 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, IEEE, 2021.
[14] P. Shaw, J. Uszkoreit, and A. Vaswani, “Self-attention with relative position representations,” 2018.
[15] M. Elfeki and A. Borji, “Video summarization via actionness ranking,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 754–763, IEEE, 2019.
[16] L. Lebron Casas and E. Koblents, “Video summarization with LSTM and deep attention models,” in International Conference on MultiMedia Modeling, pp. 67–79, Springer, 2019.
[17] B. Zhao, X. Li, and X. Lu, “Hierarchical recurrent neural network for video summarization,” in Proceedings of the 25th ACM International Conference on Multimedia, pp. 863–871, 2017.
[18] C. Huang and H. Wang, “A novel key-frames selection framework for comprehensive video summarization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 2, pp. 577–589, 2019.
[19] Y. Yuan, H. Li, and Q. Wang, “Spatiotemporal modeling for video summarization using convolutional recurrent neural network,” IEEE Access, vol. 7, pp. 64676–64685, 2019.
[20] P. Li, Q. Ye, L. Zhang, L. Yuan, X. Xu, and L. Shao, “Exploring global diverse attention via pairwise temporal relation for video summarization,” Pattern Recognition, vol. 111, p. 107677, 2021.
[21] W.-T. Chu and Y.-H. Liu, “Spatiotemporal modeling and label distribution learning for video summarization,” in 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6, IEEE, 2019.
[22] Y.-T. Liu, Y.-J. Li, F.-E. Yang, S.-F. Chen, and Y.-C. F. Wang, “Learning hierarchical self-attention for video summarization,” in 2019 IEEE International Conference on Image Processing (ICIP), pp. 3377–3381, IEEE, 2019.
[23] J. Wang, W. Wang, Z. Wang, L. Wang, D. Feng, and T. Tan, “Stacked memory network for video summarization,” in Proceedings of the 27th ACM International Conference on Multimedia, pp. 836–844, 2019.
en_US
dc.identifier.uri http://hdl.handle.net/123456789/1766
dc.description Supervised by Dr. Md. Hasanul Kabir, Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022. en_US
dc.description.abstract Over recent years, the amount of video data has grown exponentially. Video summarization has emerged as a process that facilitates efficient storage, indexing, and quick understanding of long videos. We treat video summarization as the task of identifying visual cues in frames that together form a sensible, human-understandable temporal order. Since attention models currently perform best at maintaining long-range temporal order, our research seeks a better way to apply the attention mechanism to video summarization. We approach the problem as a supervised learning task and propose a novel architecture using a Global and Segmented Local Multi-Head Attention mechanism, which helps maintain temporal and contextual consistency in the summarized video. Our experiments yield the insight that segment size should be determined from the change points of the videos in a dataset, and that the number of heads in multi-head attention should be determined from the segment length. Our proposed method outperforms existing state-of-the-art methods, achieving improvements of 2% to 3% on two benchmark datasets. en_US
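For illustration, the following is a minimal, hypothetical PyTorch sketch of the global plus segmented local multi-head attention idea described in the abstract. The feature size, head counts, additive fusion, and the example segment boundaries are all assumptions made for the sketch, not the thesis's exact design; in practice, segment boundaries would come from detected change points (e.g., shot boundaries).

```python
# Hypothetical sketch of global + segmented local multi-head attention for
# frame-level importance scoring. Design choices (d_model, head counts,
# additive fusion) are assumptions, not the thesis's exact architecture.
import torch
import torch.nn as nn

class GlobalSegmentedLocalAttention(nn.Module):
    def __init__(self, d_model=1024, global_heads=8, local_heads=4):
        super().__init__()
        # Global attention attends across the entire video.
        self.global_attn = nn.MultiheadAttention(d_model, global_heads, batch_first=True)
        # Local attention attends only within each segment.
        self.local_attn = nn.MultiheadAttention(d_model, local_heads, batch_first=True)
        # Map each fused frame representation to an importance score in [0, 1].
        self.scorer = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, 1), nn.Sigmoid())

    def forward(self, x, change_points):
        # x: (1, T, d_model) frame features; change_points: list of (start, end)
        # segment boundaries covering all T frames, e.g. from shot detection.
        g, _ = self.global_attn(x, x, x)            # long-range temporal context
        local_chunks = []
        for s, e in change_points:
            seg = x[:, s:e]                          # frames of one segment
            l, _ = self.local_attn(seg, seg, seg)    # short-range context within segment
            local_chunks.append(l)
        l = torch.cat(local_chunks, dim=1)           # reassemble in temporal order
        fused = g + l                                # simple additive fusion (assumed)
        return self.scorer(fused).squeeze(-1)        # (1, T) per-frame importance

# Usage: score 240 frames split into three equal segments (boundaries assumed).
model = GlobalSegmentedLocalAttention()
feats = torch.randn(1, 240, 1024)                    # e.g. CNN pool-layer features
scores = model(feats, [(0, 80), (80, 160), (160, 240)])
print(scores.shape)                                  # torch.Size([1, 240])
```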
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh en_US
dc.subject Video summarization, Attention, Deep Learning en_US
dc.title Video summarization using global and segmented local attention en_US
dc.type Thesis en_US

