Abstract:
This thesis addresses the challenge of long document summarization within the domain of Natural Language Processing (NLP), with a primary focus on enhancing factual consistency and text coherence while effectively managing extensive input contexts. To achieve these objectives, we propose a novel hybrid methodology that integrates both extractive and abstractive summarization techniques. The methodology
commences with an extractive phase, where a BART model is fine-tuned and augmented with K-Nearest Neighbors (KNN) indexing, substantially increasing the context length of the input and facilitating the retention of more comprehensive information from the source document. Following the extractive phase, the abstractive phase
leverages a pretrained BART model, coupled with contrastive learning, generating
more coherent and factually accurate summaries. This two-stage approach ensures
that the initial extraction captures essential information, which is then refined and
articulated in the abstractive phase. Our experimental results demonstrate the successful implementation of this methodology, with significant improvements observed
in factual consistency and text coherence, as indicated by higher BERTScore values.
Despite the promising outcomes, we acknowledge that further human evaluation is
necessary to fully validate our findings, which remains beyond the current research
scope. Nonetheless, our research signifies a major advancement in long document
summarization, presenting a strong framework that merges the benefits of extractive
and abstractive techniques to generate high-quality summaries. This hybrid approach
not only overcomes the shortcomings of individual methods but also paves the way
for future progress in NLP-based summarization.
Description:
Supervised by
Mr. Ishmam Tashdeed,
Lecturer,
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.