An Indoor Object Dataset for Mobile-based Detection and Recognition Systems for the Visually Impaired


dc.contributor.author Azad, Shehreen
dc.contributor.author Sayed, Abdullah Abu
dc.contributor.author Faiyrooz, Noshin
dc.date.accessioned 2023-04-28T05:36:34Z
dc.date.available 2023-04-28T05:36:34Z
dc.date.issued 2022-05-30
dc.identifier.uri http://hdl.handle.net/123456789/1862
dc.description Supervised by Dr. Md Kamrul Hasan, Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022. en_US
dc.description.abstract Indoor object detection is a challenging area of computer vision that has received comparatively less attention than its outdoor counterpart. The task requires a large amount of training data for any classifier to detect indoor objects with high precision, and it becomes considerably harder when tailored specifically to visually impaired people's mobility and their interaction with everyday objects. This report presents a novel indoor object dataset containing 5196 unique images across 8 categories of everyday indoor objects relevant to the daily interactions of visually impaired people. Compared to existing indoor object datasets, this dataset is unique in that it covers everyday objects and presents them with more contextual information than is available in the existing literature. Moreover, varying lighting conditions, non-canonical viewpoints, occlusion, and complex backgrounds make the dataset more robust for training any object detection algorithm. Instead of pursuing higher accuracy alone, we aim for a trade-off between accuracy and speed: a system built on this dataset for the navigation needs of visually impaired people must be deployed on a mobile or sensor-based hand-held device, which requires lightweight models. Hence, our proposed dataset is tested on two lightweight models, SSD MobileNet V2 FPNLite and EfficientDet D0, achieving mean average precision (mAP) values of 29.5 and 39.4 respectively, which exceed the mAP values these models achieve on their original benchmarks. Our proposed dataset can be extended with other indoor object detection datasets, and it can be used to build navigation systems for visually impaired people. en_US
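For context on the reported numbers: mean average precision is computed from intersection over union (IoU) overlaps between predicted and ground-truth boxes. A minimal sketch of the IoU computation, assuming boxes given as (xmin, ymin, xmax, ymax) in pixel coordinates (the function name and box convention are illustrative, not taken from the thesis):

def iou(box_a, box_b):
    # Boxes are (xmin, ymin, xmax, ymax); returns intersection area / union area.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

A detection counts as a true positive when its IoU with a ground-truth box exceeds a threshold (0.5 in the PASCAL VOC protocol; averaged over thresholds from 0.5 to 0.95 in MS COCO), and mAP averages the resulting precision over recall levels and object classes.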
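Since the target deployment is a mobile or hand-held device, a lightweight detector of this kind would typically be exported to TensorFlow Lite and run through the TFLite interpreter on-device. A minimal inference sketch, assuming a hypothetical export named detect.tflite produced from one of the trained models (the file name and input shape are assumptions, not from the thesis):

import numpy as np
import tensorflow as tf

# Load the converted model; the path is illustrative.
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a frame resized to the model's expected input, e.g. [1, 320, 320, 3].
_, height, width, _ = input_details[0]["shape"]
frame = np.zeros((1, height, width, 3), dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

# TF Object Detection API exports typically emit boxes, classes, scores, count.
boxes = interpreter.get_tensor(output_details[0]["index"])

In a real assistive application the zero-filled frame would be replaced by camera input, and the decoded boxes, class labels, and scores would drive audio or haptic feedback.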
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh en_US
dc.subject indoor object dataset, object detection, lightweight model, visually impaired, mobile-based system en_US
dc.title An Indoor Object Dataset for Mobile-based Detection and Recognition Systems for the Visually Impaired en_US
dc.type Thesis en_US

