Abstract:
Extracting sentiment orientation from texts is known as sentiment analysis or opinion
mining. The evaluation of consumer sentiment through reviews offers valuable insights
into product quality. Entity Level Sentiment Analysis (ELSA) focuses on a specific
entity of a product, e.g., electronic accessories, clothing, food, fashion items, groceries,
or sports accessories. Such analyses can therefore yield more specific information about
a product, supporting marketing strategy development, product quality estimation, and service evaluation. Sentiment analysis has been extensively studied in widely used languages such as English, Arabic, French, and Chinese. However, the Bangla language,
which ranks as the sixth most widely spoken language globally, has received relatively
less attention in this area. This limited focus can be attributed to the scarcity of relevant data and to challenges related to cross-domain adaptability, which have resulted in a small number of works on Bangla sentiment analysis. ELSA extracts
the sentiment expressed toward a specific entity in the text. To date, no studies
have been conducted on ELSA in Bangla text. To address this gap, we present an entity-level sentiment analysis conducted on a dataset of 10,000 reviews consisting of book
reviews, women’s clothing reviews, and health product reviews in Bangla text. The
dataset comprises 300 book reviews collected from the online bookshops Rokomari and Wafilife, and 9,700 samples of women’s clothing and health product reviews
collected from the Daraz online e-commerce site. The reviews are manually annotated
for entity identification and the corresponding sentiment by two native annotators. The
sentiment is categorized into three main groups: positive, negative, and neutral. The
inter-rater reliability (IRR) between the annotators is measured using Cohen’s kappa
score. To establish baselines, we used pre-trained language models. For product entity
identification, we used the mbert-bengali-ner language model, and for sentiment analysis,
we used the bangla-bert-base language model. The results of our proposed methodology for Named Entity Recognition (NER) and Sentiment Analysis (SA) are promising.
The F1-scores for NER and SA are 87.91% and 86.74%, respectively. Our data collection
web crawler code¹, constructed data², and baseline analysis code³ are publicly available.
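The inter-rater reliability check mentioned in the abstract can be illustrated with a minimal sketch of Cohen's kappa for two annotators. The function name `cohen_kappa` and the toy label lists are hypothetical and chosen for illustration only. They are not taken from the thesis dataset or its released code.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (assumption: three sentiment classes as in the abstract).
annotator_1 = ["positive", "positive", "negative", "neutral", "negative", "positive"]
annotator_2 = ["positive", "neutral", "negative", "neutral", "negative", "positive"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.75
```

In this toy case the annotators agree on 5 of 6 items (observed agreement 0.833), the chance agreement is 0.333, and kappa is therefore 0.75. Kappa corrects raw agreement for the agreement expected by chance, which matters when one sentiment class dominates the data.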
Description:
Supervised by
Dr. Hasan Mahmud,
Associate Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, 2024.