Abstract:
Information visualizations such as bar charts and line charts are popular for understanding
large tabular data. However, interpreting information through visualization alone can be
difficult for reasons such as visual impairment or the need for prior domain knowledge to
understand the chart. Automatic "chart to text summarization" can be a promising and
effective tool for providing accessibility as well as precise insights into chart data in
natural language. Despite this potential, there has been little work on chart to text
summarization, making it a low-resource task. The scarcity of large-scale datasets for
chart to text summarization is one of the reasons behind this. The human-written
descriptions in the available dataset also contain information that goes beyond what the
chart shows, which makes unbiased evaluation difficult. In this thesis, we propose
ChartSumm, a large-scale dataset for chart to text summarization consisting of 84,363
charts along with their metadata and descriptions. We also propose two test sets, test-e
and test-h, for evaluating the performance of trained models available in this domain. Our
experiments show that a T5 model trained on our dataset achieves a BLEU score of 75.72 on
the test-e set and 64.78 on the test-h set. From our analysis, we conclude that large
language models like T5 and BART can generate short, precise descriptions from given
chart metadata.
Description:
Supervised by
Mr. Md. Hamjajul Ashmafee,
Lecturer,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh.
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022.