Leveraging RLHF with Instruction Fine-tuning for Improving LLM Response in Bangla Conversations


dc.contributor.author Rahman, Tahmid
dc.contributor.author Mahmud, Shahriar
dc.contributor.author Nasrum, Nur
dc.date.accessioned 2025-06-03T04:57:36Z
dc.date.available 2025-06-03T04:57:36Z
dc.date.issued 2024-11-30
dc.identifier.citation [1] J. M. Liu, D. Li, H. Cao, T. Ren, Z. Liao, and J. Wu, “ChatCounselor: A large language models for mental health support,” arXiv preprint arXiv:2309.15461, 2023. [2] D. Bill and T. Eriksson, “Fine-tuning an LLM using reinforcement learning from human feedback for a therapy chatbot application,” 2023. [3] S. Zhang, L. Dong, X. Li, S. Zhang, X. Sun, S. Wang, J. Li, R. Hu, T. Zhang, F. Wu et al., “Instruction tuning for large language models: A survey,” arXiv preprint arXiv:2308.10792, 2023. en_US
dc.identifier.uri http://hdl.handle.net/123456789/2414
dc.description Supervised by Dr. Hasan Mahmud, Associate Professor, and Dr. Md. Kamrul Hasan, Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024. en_US
dc.description.abstract In the realm of Bangla conversational agents, this research aims to improve the responsiveness of Large Language Models (LLMs) through the combined application of Reinforcement Learning from Human Feedback (RLHF) and instruction fine-tuning. The primary objectives are the creation of a domain-specific Bangla conversational dataset, an evaluation of existing LLMs using an instruction-tuned dataset, and the introduction of a novel human-centric benchmarking framework. The work follows a multi-step process in which each step contributes to the refinement and optimization of model performance. To address the shortage of domain-specific datasets for Bangla conversational agents, we begin by constructing a Bangla conversational dataset and restructuring it into an instruction-tuned format. This structuring makes the data better suited for training language models, allowing them to comprehend and respond to precise instructions in a Bangla conversational setting. Existing LLMs then undergo Supervised Fine-Tuning (SFT) on the instruction-tuned dataset; this fine-tuning tailors the models to the variety and complexity of Bangla conversations and maximizes their performance with respect to the dataset's instructions. Following fine-tuning, we conduct a detailed evaluation and comparison of the LLMs, which gives insight into the effectiveness of the fine-tuned models and enables selection of the most promising candidate. Finally, the selected model is refined through an iterative RLHF procedure, adjusting it with human feedback to improve its performance in a more dynamic and nuanced way. en_US
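As a minimal illustrative sketch of the instruction-tuned formatting step mentioned in the abstract: the snippet below wraps one Bangla conversational exchange as an instruction-tuned record and renders the prompt string an SFT step would train on. The Alpaca-style field names ("instruction", "input", "output"), the prompt template, and the sample Bangla text are assumptions for illustration, not details taken from the thesis.

```python
# Illustrative sketch only (assumed Alpaca-style schema, not the thesis' own format):
# convert a raw Bangla conversation turn into an instruction-tuned record,
# then render the text string used for supervised fine-tuning (SFT).
import json

PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def to_instruction_record(user_turn: str, assistant_turn: str, task: str) -> dict:
    """Wrap one conversational exchange as an instruction-tuned example."""
    return {"instruction": task, "input": user_turn, "output": assistant_turn}

def render_prompt(record: dict) -> str:
    """Render the record into the prompt string fed to the LLM during SFT."""
    return PROMPT_TEMPLATE.format(**record)

if __name__ == "__main__":
    example = to_instruction_record(
        user_turn="আপনার কেমন লাগছে?",              # "How are you feeling?"
        assistant_turn="আমি ভালো আছি, ধন্যবাদ।",     # "I am well, thank you."
        task="নিচের বাংলা প্রশ্নের সহায়ক উত্তর দিন।",  # "Give a helpful answer to the Bangla question below."
    )
    print(json.dumps(example, ensure_ascii=False, indent=2))
    print(render_prompt(example))
```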
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject Reinforcement Learning from Human Feedback (RLHF), Instruction fine-tuning, Human-centric benchmarking, Supervised Fine-Tuning (SFT) en_US
dc.title Leveraging RLHF with Instruction Fine-tuning for Improving LLM Response in Bangla Conversations en_US
dc.type Thesis en_US

