A model agnostic explainable approach for detecting Cyber bullying in Bangla language using transformer based models

dc.contributor.author Nobo, Takia Mosharref
dc.contributor.author Galib, Mostafa
dc.contributor.author Rabib, Hasnain Karim
dc.date.accessioned 2023-01-27T05:06:59Z
dc.date.available 2023-01-27T05:06:59Z
dc.date.issued 2022-05-30
dc.identifier.uri http://hdl.handle.net/123456789/1665
dc.description Supervised by Dr. Md. Azam Hossain, Assistant Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022. en_US
dc.description.abstract Almost every facet of social communication has changed as a result of the exponential growth in social media usage. Meanwhile, evidence is accumulating that the rising use of social networks has allowed an old and unsettling problem to resurface in a new context: cyberbullying. Most current cyberbullying detection research focuses on English text. Bengali, although spoken by 230 million people worldwide and rich in diversity, remains under-resourced for natural language processing (NLP). Recently, there has been an alarming surge in incidents of gender-based discrimination or sexual harassment on social media sites. In this study, we present cyberbullying detection across different categories in the low-resource Bangla language using transformer-based models. We created our own dataset on gender discrimination and appended it to an open-source Bangla dataset with 4 classes. In our proposed approach, we trained five different models on the augmented dataset and then applied an ensembling technique to those five models. We then made the models explainable using model-agnostic approaches. Finally, we compared the individual prediction accuracies with the ensembled prediction accuracies. During training, we followed the stratified k-fold cross-validation technique. Our evaluations yield an accuracy of up to 75% in cyberbullying detection with ensembling. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject Cyberbullying, Transformer-based, Explainability, Bangla-text en_US
dc.title A model agnostic explainable approach for detecting Cyber bullying in Bangla language using transformer based models en_US
dc.type Thesis en_US
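
The abstract mentions stratified k-fold cross-validation followed by an ensemble over five models. The sketch below illustrates those two ideas in plain Python and is not the authors' implementation. The record does not say which ensembling technique was used, so the soft voting (probability averaging) shown here is an assumption, and the function names `stratified_k_fold` and `soft_vote` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class proportions roughly match the full set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a fair share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train_idx, test_idx


def soft_vote(prob_rows):
    """Average per-class probabilities from several models and return the winning class index.

    Assumption: soft voting stands in for the unspecified ensembling technique.
    """
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / len(prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


if __name__ == "__main__":
    # Hypothetical toy labels: three classes with imbalanced counts.
    toy_labels = [0] * 10 + [1] * 10 + [2] * 5
    for train_idx, test_idx in stratified_k_fold(toy_labels, k=5):
        print(len(train_idx), len(test_idx))
    # Hypothetical probability outputs from three models for one comment.
    print(soft_vote([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]))
```

In practice each fold's training indices would be used to fine-tune every transformer model, and `soft_vote` would be applied to the models' predicted class probabilities on the held-out fold before computing accuracy.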

