A Machine Learning approach to Data Augmentation with Semantic Similarity on a Low-Resource Language

dc.contributor.author Islam, Shah Jawad
dc.contributor.author Chowdhury, Mohammad Abrar
dc.contributor.author Alam, Taufiqul
dc.date.accessioned 2024-08-28T09:58:56Z
dc.date.available 2024-08-28T09:58:56Z
dc.date.issued 2023-05-30
dc.identifier.uri http://hdl.handle.net/123456789/2136
dc.description Supervised by Dr. Hasan Mahmud, Associate Professor, Ms. Nafisa Sadaf, Lecturer, Dr. Md. Kamrul Hasan, Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. en_US
dc.description.abstract Data augmentation for low-resource languages has gained significant importance recently, primarily because datasets are scarce or highly unbalanced. In the case of the Bengali language, the detection of fake news has emerged as a relevant problem, particularly in light of the surge in false information related to Covid-19 and the pandemic [1]. However, there is a lack of adequately balanced datasets specifically designed for training Machine Learning (ML) and Deep Learning (DL) models to detect fake news in Bengali. Furthermore, previous attempts at augmenting fake news texts have yielded satisfactory results in lexical analysis but unsatisfactory results in terms of semantic relevance. To address these challenges, we propose a framework that applies Text Augmentation techniques with the assistance of the Bangla Text-to-Text Transfer Transformer (T5) model. This framework aims to balance an unbalanced Bengali fake news dataset while ensuring that the augmented text retains semantic similarity and structural accuracy. By employing this approach, we seek to strengthen the effectiveness and reliability of fake news detection models in the Bengali language. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject Text Augmentation; Balanced Dataset; Lexical Analysis; Semantic Relevance; Bangla T5 en_US
dc.title A Machine Learning approach to Data Augmentation with Semantic Similarity on a Low-Resource Language en_US
dc.type Thesis en_US

