Abstract:
Information retrieval (IR) for Bangla text has received relatively little attention despite the widespread global usage of the language. The rich morphology and lack
of capitalization in Bangla present challenges for direct application of standard IR
models developed predominantly for English text. This report traces the gradual
development of IR methods for text retrieval, including work on Bangla texts. The
unsupervised nature of the methods used for Bangla text retrieval makes them
unsuitable for specific domains. To mitigate the problems caused by the lack of domain-specific training, different modern neural information retrieval techniques need to be
explored that can handle different data availability scenarios. In this report, we experiment with different neural information retrieval techniques on different percentages
of the available data and provide guidelines on building information retrieval pipelines
for the Bangla language. We also introduce a dataset containing rice-related scientific
texts along with human-annotated questions, which we use to train domain-specific
neural information retrieval architectures and evaluate their performance.
Description:
Supervised by
Mr. Md. Mohsinul Kabir,
Assistant Professor,
Dr. Hasan Mahmud,
Associate Professor,
Dr. Md. Kamrul Hasan,
Professor,
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.