Leveraging RLHF with Instruction Fine-tuning for Improving LLM Response in Bangla Conversations

Rahman, Tahmid; Mahmud, Shahriar; Nasrum, Nur

URI: http://hdl.handle.net/123456789/2414

Date: 2024-11-30

Abstract:

This research aims to improve the responses of Large Language Models (LLMs) in Bangla conversational agents by combining Reinforcement Learning from Human Feedback (RLHF) with instruction fine-tuning. The primary objectives are the creation of a domain-specific Bangla conversational dataset, an evaluation of existing LLMs using an instruction-tuned dataset, and the introduction of a novel human-centric benchmarking framework. The work follows a multi-step process, each step of which contributes to refining and optimizing model performance. To address the shortage of domain-specific datasets for Bangla conversational agents, we start by constructing a Bangla conversational dataset. The dataset is then restructured into an instruction-tuned format, which makes the data more suitable for training language models and allows them to better comprehend and respond to precise commands in Bangla conversations. In the next phase, existing LLMs undergo Supervised Fine-Tuning (SFT) on the instruction-tuned dataset. This fine-tuning tailors the models to the variety and complexity of Bangla conversations and aligns their behavior with the dataset's instructions. Following fine-tuning, we conduct a detailed evaluation and comparison of the fine-tuned models. This stage gives insight into their effectiveness and enables the selection of the most promising candidate. Finally, the selected model is refined with RLHF, an iterative procedure in which the model is updated with human feedback to further improve its performance.

Description:

Supervised by Dr. Hasan Mahmud, Associate Professor, and Dr. Md. Kamrul Hasan, Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.
