Abstract:
Almost every facet of social communication has changed as a result of the exponential growth in social media platform usage. Meanwhile, evidence is accumulating
that the rising use of social networks has given rise to an unsettling problem that has resurfaced in new contexts: cyberbullying. The majority
of current cyberbullying detection research focuses on English texts. In contrast,
despite being spoken by 230 million people globally and being rich in diversity,
the Bengali language is under-resourced for natural language processing (NLP). Recently, there has been an alarming surge in the number of incidents of gender-based
discrimination or sexual harassment expressed on social media sites. In this study,
we present cyberbullying detection across different categories in the low-resource
Bangla language using transformer-based models. We created our own dataset on
gender discrimination and appended it to another open-source Bangla dataset with
four classes. In our proposed approach, we trained five different models on our
augmented dataset, followed by an ensembling technique over those five models. Then
we made the models explainable using model-agnostic approaches. Finally, we compared the individual prediction accuracies with the ensemble prediction accuracies.
During training, we followed the stratified k-fold cross-validation technique. Our evaluations yield an accuracy of up to 75% in cyberbullying detection
with ensembling.
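The abstract does not state which ensembling rule or how many folds were used, so the following is only a minimal sketch of the two evaluation ideas it names. It assumes soft voting (averaging each model's class probabilities) and a simple round-robin stratified split without shuffling. The function names `stratified_folds`, `soft_vote`, and `accuracy` are hypothetical and do not come from the thesis.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps roughly
    the same class proportions as the full label list."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def soft_vote(prob_rows):
    """Average the per-model class probabilities for one sample
    and return the index of the most probable class.
    Assumption: the thesis may have used a different ensembling rule."""
    n_classes = len(prob_rows[0])
    mean = [sum(row[c] for row in prob_rows) / len(prob_rows)
            for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

In use, each of the five models would be trained on k-1 folds and scored on the held-out fold. The per-model accuracy on that fold could then be compared with the accuracy of the `soft_vote` predictions, which is the kind of individual-versus-ensemble comparison the abstract describes.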
Description:
Supervised by
Dr. Md. Azam Hossain,
Assistant Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh.
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022.