Real-Time Multiple Object Tracking with Hierarchical Attention


dc.contributor.author Bashar, Mk
dc.contributor.author Islam, Samia
dc.contributor.author Hussain, Kashifa Kawaakib
dc.date.accessioned 2024-01-18T06:41:23Z
dc.date.available 2024-01-18T06:41:23Z
dc.date.issued 2023-05-30
dc.identifier.uri http://hdl.handle.net/123456789/2060
dc.description Supervised by Prof. Dr. Md. Hasanul Kabir and co-supervised by Mr. Md. Bakhtiar Hasan, Assistant Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.description.abstract Multiple object tracking (MOT) is a crucial task in computer vision, with applications in fields such as surveillance, robotics, and autonomous systems. Accurate real-time MOT is essential for maintaining situational awareness in complex environments. In this work, we present a novel approach to MOT that combines joint detection and embedding (JDE), which detects and identifies multiple objects simultaneously, with a Swin Transformer for multi-scale feature extraction. The Swin Transformer, a hierarchical variant of the Transformer architecture, extracts rich multi-scale features from the input in linear time complexity, enabling our method to handle objects of varying sizes and shapes. We attach a prediction head to every stage of Swin blocks to obtain multi-scale features, and we increase the number of Swin blocks in the first stage to detect objects accurately from large receptive fields. We evaluated our approach on the test split of our self-defined MIX dataset and achieved an accuracy of 84.9%. While this is a promising result, there is room for improvement, such as strengthening the re-identification branch or modifying the MLP layers of the Swin blocks. en_US
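
The architecture the abstract describes, a hierarchical backbone whose every stage feeds a JDE-style prediction head, can be sketched roughly as below. This is a minimal illustrative sketch under stated assumptions, not the thesis code: StageStub stands in for a real shifted-window attention stage, and the channel widths, stage depths, anchor count, and embedding dimension are assumed values chosen for the example.

import torch
import torch.nn as nn

class StageStub(nn.Module):
    """Stand-in for one Swin stage: downsamples 2x and widens channels.
    A real implementation would use shifted-window attention blocks."""
    def __init__(self, c_in, c_out, depth):
        super().__init__()
        layers = [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.GELU()]
        for _ in range(depth - 1):
            layers += [nn.Conv2d(c_out, c_out, 3, padding=1), nn.GELU()]
        self.body = nn.Sequential(*layers)
    def forward(self, x):
        return self.body(x)

class JDEHead(nn.Module):
    """Per-stage JDE head: detection logits plus a re-ID embedding map."""
    def __init__(self, c_in, num_anchors=4, embed_dim=128):
        super().__init__()
        self.det = nn.Conv2d(c_in, num_anchors * 6, 1)  # 4 box + obj + cls per anchor
        self.emb = nn.Conv2d(c_in, embed_dim, 1)
    def forward(self, f):
        return self.det(f), self.emb(f)

class HierarchicalJDE(nn.Module):
    def __init__(self, widths=(96, 192, 384, 768), depths=(4, 2, 2, 2)):
        super().__init__()
        # depths[0] enlarged (4 vs. the usual 2) mirrors the abstract's note
        # about extra first-stage blocks for a larger receptive field.
        chans = [3] + list(widths)
        self.stages = nn.ModuleList(
            StageStub(chans[i], chans[i + 1], depths[i]) for i in range(4))
        self.heads = nn.ModuleList(JDEHead(w) for w in widths)
    def forward(self, x):
        outs = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            outs.append(head(x))  # one (detection, embedding) pair per scale
        return outs

if __name__ == "__main__":
    model = HierarchicalJDE()
    preds = model(torch.randn(1, 3, 608, 1088))
    for det, emb in preds:
        print(det.shape, emb.shape)

Each forward pass yields one (detection, embedding) pair per scale; at inference, the embeddings of detected boxes would be matched across frames (for example, by cosine similarity with Hungarian assignment) to link detections into tracks, which is the JDE association step the abstract refers to.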
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.title Real-Time Multiple Object Tracking with Hierarchical Attention en_US
dc.type Thesis en_US

