Abstract:
This research aims to improve the responsiveness of Large Language Models (LLMs)
as Bangla conversational agents by combining Reinforcement Learning from Human Feedback (RLHF) with instruction fine-tuning. The primary objectives are to create a domain-specific Bangla conversational dataset, to evaluate existing LLMs using an
instruction-tuned dataset, and to introduce a novel human-centric benchmarking framework. This work uses a multi-step process to improve the efficacy of LLMs in the context of Bangla Conversational
Agents. The technique consists of several key steps, each of which contributes to
refining and optimizing model performance. To address the shortage of
domain-specific datasets for Bangla Conversational Agents, we start by constructing a Bangla Conversational Dataset. The dataset is then structured in an
instruction-tuned format. This structuring makes the data more suitable
for training language models, allowing them to better comprehend and respond
to precise commands in the Bangla conversational environment. In the next phase
of our procedure, existing LLMs undergo Supervised Fine-Tuning (SFT) on the instruction-tuned dataset. This fine-tuning tailors the
models to the variety and complexity of Bangla conversations, improving
their performance on the dataset’s instructions. Following
fine-tuning, we carry out a detailed evaluation and comparison of the LLMs.
This stage gives insight into the effectiveness of the fine-tuned models and enables
the selection of the most promising candidate, which is then refined with RLHF. This iterative procedure
modifies the model using human feedback to improve its performance in a more
dynamic and nuanced way.
Description:
Supervised by
Dr. Hasan Mahmud,
Associate Professor,
Dr. Md. Kamrul Hasan,
Professor,
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.