A Diverse and Explainable Multi-hop QA Dataset for Bengali Language

dc.contributor.author Intiser, Md. Aseer
dc.contributor.author Islam, Mohammad Munimul
dc.contributor.author Salehin, Md. Reyanus
dc.date.accessioned 2024-09-05T10:05:45Z
dc.date.available 2024-09-05T10:05:45Z
dc.date.issued 2023-05-30
dc.identifier.uri http://hdl.handle.net/123456789/2163
dc.description Supervised by Mohammad Anas Jawad, Lecturer, and Dr. Abu Raihan Mostofa Kamal, Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.description.abstract Bengali is a resource-scarce language with few quality datasets for either single-hop or multi-hop question answering. As a small step toward filling that gap, we aim to create a reading-comprehension-based, open-domain multi-hop question answering dataset that is both explainable and diverse. We will create about 100 passages from news and Wikipedia articles, along with 500 question-answer pairs. We will maintain diversity both in selecting the domains of the contexts and in generating the questions and answers. The dataset will be explainable: for each question, it will provide the supporting facts and show the reasoning chain that leads to the answer. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject QA, QA in Bengali, RC based QA in Bengali, Open-domain QA, Open-domain QA in Bengali, Multi-hop QA, Multi-hop QA in Bengali, Reasoning Chain en_US
dc.title A Diverse and Explainable Multi-hop QA Dataset for Bengali Language en_US
dc.type Thesis en_US

