Abstract:
With the vast amount of information available on the Internet, finding answers to questions is as important as ever. In Natural Language Processing research, Question Answering (QA) and Query-based Text Summarization (QBSUM) address this challenge. However, most existing work neglects low-resource languages such as Bangla, so few quality datasets are available in the literature. To address this research gap, we propose a semi-automated methodology for generating a Bangla dataset with Natural Questions for three tasks: Question Answering (QA), Query-based Single-Document Text Summarization (SD-QBSUM), and Query-based Multi-Document Text Summarization (MD-QBSUM). We then provide baselines for this dataset on those tasks and compare our dataset with existing ones on various metrics.
Description:
Supervised by
Dr. Kamrul Hasan,
Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh.
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022.