Entity Level Sentiment Analysis from Online Bangla Reviews

Mahfuz, Obayed Bin

URI: http://hdl.handle.net/123456789/2397

Date: 2024-01-22

Abstract:

Extracting sentiment orientation from text is known as sentiment analysis or opinion mining. Evaluating consumer sentiment through reviews offers valuable insights into product quality. Entity Level Sentiment Analysis (ELSA) targets specific product entities, e.g., electronic accessories, clothing, food, fashion items, groceries, and sports accessories. Such analyses can therefore yield more specific product-related information to support marketing strategy development, product quality estimation, and service evaluation. Sentiment analysis has been extensively studied in popular languages such as English, Arabic, French, and Chinese. However, the Bangla language, which ranks as the sixth most widely spoken language globally, has received relatively little attention in this area. This limited focus can be attributed to the scarcity of relevant data and to challenges related to cross-domain adaptability, resulting in a small number of works on Bangla sentiment analysis. ELSA extracts the sentiment expressed toward a specific entity in the text. To date, no studies have been conducted on ELSA in Bangla text. To address this gap, we present an entity level sentiment analysis conducted on a dataset of 10,000 Bangla reviews consisting of book reviews, women’s clothing reviews, and health product reviews. The dataset comprises 300 book reviews collected from the online bookshops Rokomari and Wafilife and 9,700 women’s clothing and health product reviews collected from the Daraz e-commerce site. The reviews are manually annotated by two native annotators for entity identification and the corresponding sentiment. The sentiment is categorized into three groups: positive, negative, and neutral. The inter-rater reliability (IRR) between the annotators is measured using Cohen’s kappa. To establish baselines, we used pre-trained language models.
For product entity identification, we used the mbert-bengali-ner language model, and for sentiment analysis, we used the bangla-bert-base language model. The results of our proposed methodology for Named Entity Recognition (NER) and Sentiment Analysis (SA) are promising: the F1-scores are 87.91% for NER and 86.74% for SA. Our data collection web crawler code, constructed data, and baseline analysis code are publicly available.
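As a rough illustration of the inter-rater reliability measure named in the abstract, the sketch below computes Cohen's kappa, (p_o - p_e) / (1 - p_e), for two annotators who label the same items. The function name `cohen_kappa` and the toy labels are assumptions made for this example; they are not taken from the thesis code or data, and the thesis may compute the score differently (for instance with a library routine).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label counts.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels invented for illustration only (not from the thesis dataset).
annotator_1 = ["positive", "negative", "neutral", "positive", "negative", "positive"]
annotator_2 = ["positive", "negative", "positive", "positive", "neutral", "positive"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A kappa of 1 indicates perfect agreement, while 0 indicates agreement no better than chance given each annotator's label frequencies.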

Description:

Supervised by Dr. Hasan Mahmud, Associate Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, 2024.
