BTSQA: An Architecture for Bangla Textual and Spoken Question Answering


dc.contributor.author Jim, Shams Tanveer
dc.contributor.author Islam, Md. Ashraful
dc.contributor.author Abdullah, Adnan
dc.date.accessioned 2024-09-06T05:28:01Z
dc.date.available 2024-09-06T05:28:01Z
dc.date.issued 2023-04-30
dc.identifier.citation [1] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[2] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[3] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[4] F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural computation, 2000.
[5] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[6] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017.
[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[8] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” arXiv preprint arXiv:1910.10683, 2019.
[9] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, “SQuAD: 100,000+ questions for machine comprehension of text,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas: Association for Computational Linguistics, Nov. 2016, pp. 2383–2392. [Online]. Available: https://aclanthology.org/D16-1264
[10] P. Rajpurkar, R. Jia, and P. Liang, “Know what you don’t know: Unanswerable questions for SQuAD,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, I. Gurevych and Y. Miyao, Eds. Association for Computational Linguistics, 2018, pp. 784–789. [Online]. Available: https://aclanthology.org/P18-2124/
[11] S. Reddy, D. Chen, and C. Manning, “CoQA: A conversational question answering challenge,” Transactions of the Association for Computational Linguistics, vol. 7, pp. 249–266, 03 2019.
[12] J. H. Clark, J. Palomaki, V. Nikolaev, E. Choi, D. Garrette, M. Collins, and T. Kwiatkowski, “TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages,” Trans. Assoc. Comput. Linguistics, vol. 8, pp. 454–470, 2020. [Online]. Available: https://doi.org/10.1162/tacl_a_00317
[13] D. Khashabi, A. Ng, T. Khot, A. Sabharwal, H. Hajishirzi, and C. Callison-Burch, “GooAQ: Open question answering with diverse answer types,” in Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, M. Moens, X. Huang, L. Specia, and S. W. Yih, Eds. Association for Computational Linguistics, 2021, pp. 421–433. [Online]. Available: https://doi.org/10.18653/v1/2021.findings-emnlp.38
[14] A. Trischler, T. Wang, X. Yuan, J. Harris, A. Sordoni, P. Bachman, and K. Suleman, “NewsQA: A machine comprehension dataset,” in Proceedings of the 2nd Workshop on Representation Learning for NLP, Rep4NLP@ACL 2017, Vancouver, Canada, August 3, 2017, P. Blunsom, A. Bordes, K. Cho, S. B. Cohen, C. Dyer, E. Grefenstette, K. M. Hermann, L. Rimell, J. Weston, and S. Yih, Eds. Association for Computational Linguistics, 2017, pp. 191–200. [Online]. Available: https://doi.org/10.18653/v1/w17-2623
[15] I. Rybin, V. Korablinov, P. Efimov, and P. Braslavski, “RuBQ 2.0: An innovated Russian question answering dataset,” in The Semantic Web - 18th International Conference, ESWC 2021, Virtual Event, June 6-10, 2021, Proceedings, ser. Lecture Notes in Computer Science, R. Verborgh, K. Hose, H. Paulheim, P. Champin, M. Maleshkova, Ó. Corcho, P. Ristoski, and M. Alam, Eds., vol. 12731. Springer, 2021, pp. 532–547. [Online]. Available: https://doi.org/10.1007/978-3-030-77385-4_32
[16] A. Kazemi, J. Mozafari, and M. A. Nematbakhsh, “PersianQuAD: The native question answering dataset for the Persian language,” IEEE Access, vol. 10, pp. 26045–26057, 2022. [Online]. Available: https://doi.org/10.1109/ACCESS.2022.3157289
[17] M. d’Hoffschmidt, W. Belblidia, Q. Heinrich, T. Brendlé, and M. Vidal, “FQuAD: French question answering dataset,” in Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, ser. Findings of ACL, T. Cohn, Y. He, and Y. Liu, Eds., vol. EMNLP 2020. Association for Computational Linguistics, 2020, pp. 1193–1208. [Online]. Available: https://doi.org/10.18653/v1/2020.findings-emnlp.107
[18] T. Möller, J. Risch, and M. Pietsch, “GermanQuAD and GermanDPR: Improving non-English question answering and passage retrieval,” CoRR, vol. abs/2104.12741, 2021. [Online]. Available: https://arxiv.org/abs/2104.12741
[19] S. Lim, M. Kim, and J. Lee, “KorQuAD1.0: Korean QA dataset for machine reading comprehension,” CoRR, vol. abs/1909.07005, 2019. [Online]. Available: http://arxiv.org/abs/1909.07005
[20] B. So, K. Byun, K. Kang, and S. Cho, “JaQuAD: Japanese question answering dataset for machine reading comprehension,” CoRR, vol. abs/2202.01764, 2022. [Online]. Available: https://arxiv.org/abs/2202.01764
[21] W. S. Ismail and M. N. Homsi, “DAWQAS: A dataset for Arabic why question answering system,” in Fourth International Conference on Arabic Computational Linguistics, ACLING 2018, November 17-19, 2018, Dubai, United Arab Emirates, ser. Procedia Computer Science, K. Shaalan and S. R. El-Beltagy, Eds., vol. 142. Elsevier, 2018, pp. 123–131. [Online]. Available: https://doi.org/10.1016/j.procs.2018.10.467
[22] S. Banerjee, S. K. Naskar, and S. Bandyopadhyay, “BFQA: A Bengali factoid question answering system,” in Text, Speech and Dialogue - 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014. Proceedings, ser. Lecture Notes in Computer Science, P. Sojka, A. Horák, I. Kopecek, and K. Pala, Eds., vol. 8655. Springer, 2014, pp. 217–224. [Online]. Available: https://doi.org/10.1007/978-3-319-10816-2_27
[23] S. Sarker, S. T. Alam Monisha, and M. M. H. Nahid, “Bengali question answering system for factoid questions: A statistical approach,” in 2019 International Conference on Bangla Speech and Language Processing (ICBSLP). Sylhet, Bangladesh: IEEE, 2019, pp. 1–5.
[24] M. Keya, A. K. M. Masum, B. Majumdar, S. A. Hossain, and S. Abujar, “Bengali question answering system using seq2seq learning based on general knowledge dataset,” in 11th International Conference on Computing, Communication and Networking Technologies, ICCCNT 2020, Kharagpur, India, July 1-3, 2020. IEEE, 2020, pp. 1–6. [Online]. Available: https://doi.org/10.1109/ICCCNT49239.2020.9225605
[25] M. M. Uddin, N. S. Patwary, M. M. Hasan, T. Rahman, and M. Tanveer, “End-to-end neural network for paraphrased question answering architecture with single supporting line in Bangla language,” International Journal of Future Computer and Communication, vol. 9, no. 3, 2020.
[26] T. T. Mayeesha, A. M. Sarwar, and R. M. Rahman, “Deep learning based question answering system in Bengali,” Journal of Information and Telecommunication, vol. 5, no. 2, pp. 145–178, 2021. [Online]. Available: https://doi.org/10.1080/24751839.2020.1833136
[27] A. Saha, M. I. Noor, S. Fahim, S. Sarker, F. Badal, and S. Das, “An approach to extractive Bangla question answering based on BERT-Bangla and BQuAD,” in 2021 International Conference on Automation, Control and Mechatronics for Industry 4.0 (ACMI). Rajshahi, Bangladesh: IEEE, 7 2021, pp. 1–6.
[28] A. Bhattacharjee, T. Hasan, K. Samin, M. S. Rahman, A. Iqbal, and R. Shahriyar, “BanglaBERT: Combating embedding barrier for low-resource language understanding,” arXiv preprint arXiv:2101.00204, 2021.
[29] P. Rajpurkar, R. Jia, and P. Liang, “Know what you don’t know: Unanswerable questions for SQuAD,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Melbourne, Australia: Association for Computational Linguistics, Jul. 2018, pp. 784–789. [Online]. Available: https://aclanthology.org/P18-2124
[30] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
[31] M. Chen, A. Radford, R. Child, J. Wu, H. Jun, P. Dhariwal, D. Luan, and I. Sutskever, “Generative pretraining from pixels,” 2020.
[32] D. Chen, A. Fisch, J. Weston, and A. Bordes, “Reading Wikipedia to answer open-domain questions,” in Association for Computational Linguistics (ACL), 2017.
[33] N. L. C. Group, “R-net: Machine reading comprehension with self-matching networks,” Microsoft Research, 2017. [Online]. Available: https
[34] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: An ASR corpus based on public domain audio books,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 5206–5210.
[35] R. Ardila, M. Branson, K. Davis, M. Kohler, J. Meyer, M. Henretty, R. Morais, L. Saunders, F. M. Tyers, and G. Weber, “Common voice: A massively-multilingual speech corpus,” in Proceedings of The 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11-16, 2020, N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, and S. Piperidis, Eds. European Language Resources Association, 2020, pp. 4218–4222. [Online]. Available: https://aclanthology.org/2020.lrec-1.520/
[36] J. Yu, S. Zhang, J. Wu, S. Ghorbani, B. Wu, S. Kang, S. Liu, X. Liu, H. Meng, and D. Yu, “Audio-visual recognition of overlapped speech for the LRS2 dataset,” in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020. IEEE, 2020, pp. 6984–6988. [Online]. Available: https://doi.org/10.1109/ICASSP40776.2020.9054127
[37] VoxForge, “VoxForge dataset,” 2007.
[38] R. Ardila, M. Branson, K. Davis, M. Kohler, J. Meyer, M. Henretty, R. Morais, L. Saunders, F. Tyers, and G. Weber, “Common voice: A massively-multilingual speech corpus,” in Proceedings of the Twelfth Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association, May 2020, pp. 4218–4222. [Online]. Available: https://aclanthology.org/2020.lrec-1.520
[39] W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, “Listen, attend and spell,” arXiv preprint arXiv:1508.01211, 2015.
[40] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Y. Ng, “Deep speech: Scaling up end-to-end speech recognition,” arXiv preprint arXiv:1412.5567, 2014.
[41] A. Graves, S. Fernández, and F. Gomez, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376, 2006.
[42] A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in Advances in Neural Information Processing Systems, vol. 33, 2020.
[43] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,” arXiv preprint arXiv:2212.04356, 2022.
[44] S. Kriman, S. Beliaev, B. Ginsburg, J. Huang, O. Kuchaiev, V. Lavrukhin, R. Leary, J. Li, and Y. Zhang, “QuartzNet: Deep automatic speech recognition with 1D time-channel separable convolutions,” 2019.
[45] M. A. Walker, A. I. Rudnicky, J. S. Aberdeen, E. O. Bratt, J. S. Garofolo, H. W. Hastie, A. N. Le, B. L. Pellom, A. Potamianos, R. J. Passonneau, R. Prasad, S. Roukos, G. A. Sanders, S. Seneff, and D. Stallard, “DARPA communicator evaluation: progress from 2000 to 2001,” in 7th International Conference on Spoken Language Processing, ICSLP2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002, J. H. L. Hansen and B. L. Pellom, Eds. ISCA, 2002. [Online]. Available: http://www.isca-speech.org/archive/icslp_2002/i02_0273.html
[46] G. Lin, Y. Chuang, H. Chung, S. Yang, H. Chen, S. A. Dong, S. Li, A. Mohamed, H. Lee, and L. Lee, “DUAL: Discrete spoken unit adaptive learning for textless spoken question answering,” in Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, 18-22 September 2022, H. Ko and J. H. L. Hansen, Eds. ISCA, 2022, pp. 5165–5169. [Online]. Available: https://doi.org/10.21437/Interspeech.2022-612
[47] Y. Wu, S. Rallabandi, R. Srinivasamurthy, P. P. Dakle, A. Gon, and P. Raghavan, “HeySQuAD: A spoken question answering dataset,” CoRR, vol. abs/2304.13689, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2304.13689
[48] D. Su and P. Fung, “Improving spoken question answering using contextualized word representation,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 8004–8008.
[49] A. Bhattacharjee, T. Hasan, W. U. Ahmad, and R. Shahriyar, “BanglaNLG: Benchmarks and resources for evaluating low-resource natural language generation in Bangla,” CoRR, vol. abs/2205.11081, 2022. [Online]. Available: https://arxiv.org/abs/2205.11081
[50] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6980
[51] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, and C. Raffel, “mT5: A massively multilingual pre-trained text-to-text transformer,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics, Jun. 2021, pp. 483–498. [Online]. Available: https://aclanthology.org/2021.naacl-main.41
[52] J. Shieh, L. Popa, and P. B. Godfrey, “Whisper: Tracing the information flow in data center networks,” in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), 2015.
[53] D. Dua, Y. Wang, P. Dasigi, G. Stanovsky, S. Singh, and M. Gardner, “DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics, Jun. 2019, pp. 2368–2378. [Online]. Available: https://aclanthology.org/N19-1246
[54] A. Conneau, A. Baevski, R. Collobert, A. Mohamed, and M. Auli, “Unsupervised cross-lingual representation learning for speech recognition,” in Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, H. Hermansky, H. Cernocký, L. Burget, L. Lamel, O. Scharenborg, and P. Motlícek, Eds. ISCA, 2021, pp. 2426–2430. [Online]. Available: https://doi.org/10.21437/Interspeech.2021-329
[55] D. Scott and J. Moore, “An NLG evaluation competition? Eight reasons to be cautious,” in Proceedings of the Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation, 2007, pp. 22–23.
[56] C. van der Lee, A. Gatt, E. van Miltenburg, and E. Krahmer, “Human evaluation of automatically generated text: Current trends and best practice guidelines,” Computer Speech & Language, vol. 67, p. 101151, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S088523082030084X
[57] A. Belz and E. Reiter, “Comparing automatic and human evaluation of NLG systems,” in 11th Conference of the European Chapter of the Association for Computational Linguistics, 2006, pp. 313–320. en_US
dc.identifier.uri http://hdl.handle.net/123456789/2167
dc.description Supervised by Mr. Md. Mezbaur Rahman, Lecturer, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.description.abstract Question answering (QA) is a field within natural language processing (NLP) that focuses on developing systems capable of automatically answering questions posed in human language. QA systems aim to understand the meaning and intent behind questions and provide accurate and relevant answers by leveraging large corpora of text data. Short Answer Questioning (SQA) is a specific type of question answering task within NLP that focuses on generating concise and precise answers to fact-based questions. Unlike traditional QA systems that generate longer, descriptive answers, SQA systems aim to extract short snippets of information directly related to the question. These systems employ techniques such as text comprehension, named entity recognition, and information retrieval to identify the most relevant information and produce brief and accurate responses. SQA finds applications in areas such as search engines, voice assistants, and chatbots, where quick and concise answers are desired. In our thesis, we propose BTSQA, an architecture for spoken question answering. The architecture combines a general QA model with an ASR model, and a word correction step is added between them to improve performance. The general QA model, a T5 transformer model, achieved an F1 score of 73.37%. On the audio dataset, the Whisper ASR model achieved a WER of 31.58% and the Wav2Vec2 model a WER of 29.64%. When the general QA model was combined with the ASR model and word correction, the F1 score was 53.65%. These models were run on the text dataset we built, which was converted to audio using Google TTS. en_US
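The abstract describes a three-stage pipeline: an ASR model transcribes the spoken question, a word-correction step cleans up the transcript, and a T5-style QA model generates the answer. The following is a minimal sketch of such a pipeline, assuming the Hugging Face transformers library; the model identifiers ("openai/whisper-small", "csebuetnlp/banglat5") and the dictionary-based correction step are illustrative placeholders, not the exact components used in the thesis.

# Minimal spoken-QA pipeline sketch (not the thesis's exact implementation).
# Assumes the Hugging Face `transformers` library; model names and the
# correction table are placeholders, and a QA-fine-tuned checkpoint would be
# substituted in practice.
from transformers import pipeline

# 1) ASR: transcribe the spoken Bangla question (Whisper- or Wav2Vec2-style model).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# 2) Word correction: the thesis adds a correction step between ASR and QA.
#    This simple dictionary lookup is only a stand-in for that step.
CORRECTIONS = {}  # e.g. {"misrecognized token": "corrected token"}

def correct(text: str) -> str:
    return " ".join(CORRECTIONS.get(tok, tok) for tok in text.split())

# 3) QA: a T5-style text-to-text model answers the corrected question given a context passage.
qa = pipeline("text2text-generation", model="csebuetnlp/banglat5")

def answer(audio_path: str, context: str) -> str:
    question = correct(asr(audio_path)["text"])
    prompt = f"question: {question} context: {context}"
    return qa(prompt, max_new_tokens=32)[0]["generated_text"]

Evaluation would then compare generated answers against gold answers with F1 and the ASR transcripts against reference text with WER, matching the metrics reported in the abstract.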
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject Question Answering, Spoken Question Answering, ASR, Word Correction, Transformer Model en_US
dc.title BTSQA: An Architecture for Bangla Textual and Spoken Question Answering en_US
dc.type Thesis en_US

