Abstract:
Named Entity Recognition (NER) is a critical task in Natural Language
Processing (NLP) for low-resource languages like Bengali. This paper explores
few-shot learning for Bengali NER using ProtoBERT, NN-SHOT, and StructSHOT
models with mBERT, XLM-RoBERTa, and BanglaBERT embeddings.
We addressed class imbalance in the BNER dataset through oversampling
techniques, significantly enhancing model performance. In our 5-way 5-shot
experiment, we observed that XLM-RoBERTa generally yielded the highest F1
scores: 0.4045 for ProtoBERT, 0.378 for NN-SHOT, and 0.3912 for StructSHOT.
This study underscores the importance of balanced datasets and suggests future
research on optimizing sampling strategies and advanced model architectures.
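As a rough illustration of the prototype-based classification that ProtoBERT-style models build on, the sketch below averages the support embeddings of each class into a prototype and assigns a query to the nearest one. It is a minimal sketch under stated assumptions, not the thesis implementation: the 2-D vectors stand in for encoder embeddings, the labels `PER` and `LOC` are hypothetical, and the helper names `class_prototypes` and `nearest_prototype` are invented for this example.

```python
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each label into one prototype vector."""
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(query, prototypes):
    """Return the label whose prototype lies closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Toy 2-D vectors standing in for token embeddings (illustrative values only).
# A real 5-way 5-shot episode would hold 5 classes with 5 support examples each.
support = [
    ([1.0, 0.0], "PER"), ([0.8, 0.2], "PER"),
    ([0.0, 1.0], "LOC"), ([0.2, 0.8], "LOC"),
]
prototypes = class_prototypes(support)
predicted = nearest_prototype([0.9, 0.1], prototypes)
```

In an actual few-shot setup the embeddings would come from an encoder such as mBERT, XLM-RoBERTa, or BanglaBERT, and the support set would be resampled for every episode.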
Description:
Supervised by
Dr. Hasan Mahmud,
Associate Professor,
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.