Abstract:
In this thesis, we present our work on text summarization. Text summarization is the technique of
generating concise and precise summaries of voluminous texts while focusing on the sections that convey
useful information without losing the overall meaning. In this age of information, there are vast quantities of
textual data available. Example sources include online documents, articles, news, and user reviews of various
products and services. The information contained in these texts can be conveyed concisely through
summaries. However, writing summaries by hand for such a large volume of text documents is impractical.
We can utilize neural machine summarization systems to generate summaries automatically. These
systems leverage the power of deep learning models. Recently, with the invention of the Transformer architecture,
modern summarization systems have achieved substantial performance gains. Efficient transformer-based
summarization systems exist for English and other popular languages, but not for Bangla. In this research, we
present an efficient transformer-based text summarization system for the Bangla language. We use subword
encoding to eliminate the problem of rare and unknown words. We have created a large dataset, consisting
of 600 thousand news articles, to train our model. We trained a 6-million-parameter model that is capable
of producing accurate summaries. We evaluated our summaries by observing the model's generative performance.
Description:
Supervised by
Dr. Abu Raihan Mostofa Kamal, PhD
Professor
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT), OIC