Misogyny Detection in Social Media for Under-Resourced Bangla Language


dc.contributor.author Kader, Md. Wasif
dc.contributor.author Jamil, Chowdhury Farhan
dc.contributor.author Abir, Md. Tanvir Hasan
dc.date.accessioned 2024-08-29T05:38:56Z
dc.date.available 2024-08-29T05:38:56Z
dc.date.issued 2023-05-30
dc.identifier.uri http://hdl.handle.net/123456789/2140
dc.description Supervised by Dr. Hasan Mahmud, Associate Professor; Md. Mohsinul Kabir, Assistant Professor; and Dr. Md. Kamrul Hasan, Professor; Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.description.abstract This study presents a new strategy based on Natural Language Processing (NLP) techniques for detecting and mitigating misogyny on social media. Progress on this problem has been substantially hampered by the lack of a sizable dataset for detecting hate speech and sexism in Bengali texts, which makes these harms difficult to identify and address effectively. To fill this gap, a dataset of 3.8 million hate speech instances was carefully collected from various social media networks. To improve the representation of hate speech, an embedding model based on informal FastText is presented, which captures the complex semantics of hate speech more accurately than other methods. This word embedding model is incorporated into a Bidirectional Long Short-Term Memory (BiLSTM) architecture to capture contextual dependencies and sequential patterns within hate speech comments. Because the BiLSTM layers process each comment in both directions, they encode both the preceding and the subsequent context of every word, which helps the model interpret each remark in its context. The proposed methodology is evaluated extensively on a carefully annotated dataset, using precision, recall, and F1-score to measure the accuracy and effectiveness of hate speech detection. The results demonstrate the framework’s superior performance and discriminative capability, validating its capacity to accurately identify and categorize instances of hate speech. In addition, this research contributes the largest hate speech dataset in the field and introduces a word embedding model that surpasses existing techniques.
These findings substantially improve the understanding and detection of hate speech on social media platforms, laying the groundwork for more effective mechanisms to combat hate speech and promote safer online communities. en_US
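The abstract attributes much of the gain to a FastText-based embedding for informal text, whose key property is that words are represented through their character n-grams, so misspelled or variant forms still receive meaningful vectors. This record does not state the thesis's training configuration, so the following is only a minimal Python sketch of the general fastText subword idea under labeled assumptions: n-gram lengths 3 to 6 and 2,000,000 hash buckets (fastText's defaults for word-vector training), a 32-bit FNV-1a hash (close to, but not guaranteed bit-identical with, fastText's own), and pseudo-random vectors standing in for trained ones. The helper names (`char_ngrams`, `subword_vector`, and so on) are hypothetical, and the sketch omits the separate whole-word vector that real fastText also adds.

```python
"""Minimal sketch of fastText-style subword (character n-gram) word vectors.

ASSUMPTIONS, not taken from the thesis: n-gram lengths 3..6, 2,000,000 hash
buckets, a 32-bit FNV-1a hash, and pseudo-random bucket vectors standing in
for trained ones. All function names here are hypothetical.
"""
import math
import random
from typing import List

BOW, EOW = "<", ">"  # word-boundary markers, as in fastText


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return every character n-gram of '<word>' with minn <= n <= maxn.

    Python indexes strings by Unicode code point, so a Bangla vowel sign
    such as U+09BE counts as one character rather than three UTF-8 bytes.
    """
    padded = BOW + word + EOW
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash of the UTF-8 bytes of text."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def bucket_vector(bucket: int, dim: int) -> List[float]:
    """Hypothetical stand-in for a trained bucket embedding (seeded, so deterministic)."""
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def subword_vector(word: str, dim: int = 8, buckets: int = 2_000_000,
                   minn: int = 3, maxn: int = 6) -> List[float]:
    """Average the bucket vectors of the word's hashed character n-grams.

    Unlike real fastText, no separate whole-word vector is added, so seen and
    unseen spellings are handled the same way.
    """
    grams = char_ngrams(word, minn, maxn)
    if not grams:
        return [0.0] * dim
    total = [0.0] * dim
    for gram in grams:
        vec = bucket_vector(fnv1a_32(gram) % buckets, dim)
        total = [t + v for t, v in zip(total, vec)]
    return [t / len(grams) for t in total]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    standard = "\u09a8\u09be\u09b0\u09c0"  # নারী
    variant = "\u09a8\u09be\u09b0\u09bf"   # নারি (short vowel sign instead of long)
    shared = set(char_ngrams(standard)) & set(char_ngrams(variant))
    print("shared n-grams:", sorted(shared))
    print("cosine:", round(cosine(subword_vector(standard), subword_vector(variant)), 3))
```

In the demo, the two spellings differ only in their final vowel sign, yet they share the n-grams `<না`, `নার`, and `<নার`, so their averaged vectors overlap even if the variant never appeared in training. In the thesis's pipeline such word vectors would then feed the BiLSTM; this sketch stops at the embedding step.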
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject Hate Speech; Misogyny; FastText; Word Embedding; Semantics; Sequential Pattern; Contextual Dependency; Bi-Directional Processing; Bi-LSTM; Dataset en_US
dc.title Misogyny Detection in Social Media for Under-Resourced Bangla Language en_US
dc.type Thesis en_US

