Leveraging RLHF with Instruction Fine-tuning for Improving LLM Response in Bangla Conversations


dc.contributor.author Rahman, Tahmid
dc.contributor.author Mahmud, Shahriar
dc.contributor.author Nasrum, Nur
dc.date.accessioned 2025-06-03T04:57:36Z
dc.date.available 2025-06-03T04:57:36Z
dc.date.issued 2024-11-30
dc.identifier.citation [1] J. M. Liu, D. Li, H. Cao, T. Ren, Z. Liao, and J. Wu, “ChatCounselor: A large language models for mental health support,” arXiv preprint arXiv:2309.15461, 2023. [2] D. Bill and T. Eriksson, “Fine-tuning an LLM using reinforcement learning from human feedback for a therapy chatbot application,” 2023. [3] S. Zhang, L. Dong, X. Li, S. Zhang, X. Sun, S. Wang, J. Li, R. Hu, T. Zhang, F. Wu et al., “Instruction tuning for large language models: A survey,” arXiv preprint arXiv:2308.10792, 2023. en_US
dc.identifier.uri http://hdl.handle.net/123456789/2414
dc.description Supervised by Dr. Hasan Mahmud, Associate Professor, and Dr. Md. Kamrul Hasan, Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024. en_US
dc.description.abstract In the realm of Bangla conversational agents, this research aims to improve the responsiveness of Large Language Models (LLMs) through the combined application of Reinforcement Learning from Human Feedback (RLHF) and instruction fine-tuning. The primary objectives are the creation of a domain-specific Bangla conversational dataset, an evaluation of existing LLMs using an instruction-tuned dataset, and the introduction of a novel human-centric benchmarking framework. The work follows a multi-step process in which each step contributes to the refinement and optimization of model performance. To address the shortage of domain-specific datasets for Bangla conversational agents, we begin by constructing a Bangla conversational dataset and restructuring it into an instruction-tuned format. This structuring makes the data better suited for training language models, allowing them to comprehend and respond to precise instructions in a Bangla conversational setting. Existing LLMs then undergo Supervised Fine-Tuning (SFT) on the instruction-tuned dataset; this fine-tuning tailors the models to the variety and complexity of Bangla conversations and maximizes their performance with respect to the dataset's instructions. Following fine-tuning, we conduct a detailed evaluation and comparison of the LLMs, which gives insight into the effectiveness of the fine-tuned models and enables selection of the most promising candidate. Finally, the selected model is refined through an iterative RLHF procedure, adjusting it with human feedback to improve its performance in a more dynamic and nuanced way. en_US
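As a minimal illustrative sketch of the instruction-tuned formatting step mentioned in the abstract: the snippet below wraps one Bangla conversational exchange as an instruction-tuned record and renders the prompt string an SFT step would train on. The Alpaca-style field names ("instruction", "input", "output"), the prompt template, and the sample Bangla text are assumptions for illustration, not details taken from the thesis.

```python
# Illustrative sketch only (assumed Alpaca-style schema, not the thesis' own format):
# convert a raw Bangla conversation turn into an instruction-tuned record,
# then render the text string used for supervised fine-tuning (SFT).
import json

PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def to_instruction_record(user_turn: str, assistant_turn: str, task: str) -> dict:
    """Wrap one conversational exchange as an instruction-tuned example."""
    return {"instruction": task, "input": user_turn, "output": assistant_turn}

def render_prompt(record: dict) -> str:
    """Render the record into the prompt string fed to the LLM during SFT."""
    return PROMPT_TEMPLATE.format(**record)

if __name__ == "__main__":
    example = to_instruction_record(
        user_turn="আপনার কেমন লাগছে?",              # "How are you feeling?"
        assistant_turn="আমি ভালো আছি, ধন্যবাদ।",     # "I am well, thank you."
        task="নিচের বাংলা প্রশ্নের সহায়ক উত্তর দিন।",  # "Give a helpful answer to the Bangla question below."
    )
    print(json.dumps(example, ensure_ascii=False, indent=2))
    print(render_prompt(example))
```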
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject Reinforcement Learning from Human Feedback (RLHF), Instruction fine-tuning, Human-centric benchmarking, Supervised Fine-Tuning (SFT) en_US
dc.title Leveraging RLHF with Instruction Fine-tuning for Improving LLM Response in Bangla Conversations en_US
dc.type Thesis en_US

