Bangla Dataset Generation for Natural Language Inference

Islam, Md. Shohidul; Khan, Abdun Nayeem; Nizami, Md Shaidur Rahman

dc.contributor.author	Islam, Md. Shohidul
dc.contributor.author	Khan, Abdun Nayeem
dc.contributor.author	Nizami, Md Shaidur Rahman
dc.date.accessioned	2024-09-02T05:46:02Z
dc.date.available	2024-09-02T05:46:02Z
dc.date.issued	2023-05-30
dc.identifier.uri	http://hdl.handle.net/123456789/2147
dc.description	Supervised by Dr. Hasan Mahmud, Associate Professor, and Prof. Dr. Kamrul Hasan, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh	en_US
dc.description.abstract	Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been severely limited by the lack of resources in Bangla. To address this, we introduce our own corpus curated for natural language inference, consisting of sentence pairs, each labeled with the entailment relation between the two sentences. Our goal is to create a dataset with over 30K instances; to that end, we have created a Bangla dataset by machine translating the SNLI corpus into Bangla. We then show that benchmark models can be evaluated on, and can perform, the task of inference in Bangla. We hope that our dataset will catalyze research in Bangla sentence understanding by providing an informative standard evaluation task. To support this, we provide two baseline models, both of which are considered integral to the task of inference in any language.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh	en_US
dc.subject	entailment, contradiction, neutral, natural language, inference, semantic representations, machine learning, Bangla, corpus, labeled pairs of sentences, inner entailment, dataset, instances, SNLI corpus, machine translation, benchmark models, evaluation task, baseline models, sentence understanding	en_US
dc.title	Bangla Dataset Generation for Natural Language Inference	en_US
dc.type	Thesis	en_US