Abstract:
This research report introduces a novel dataset of French image captions, created by
translating the English captions of the Flickr30k dataset into French with several
translation models: Google Translate and two Transformer models, T5-small and T5-base.
The resulting French dataset can be used to train and evaluate image captioning models
in French.
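A minimal sketch of how the T5 translation step could be prepared, assuming the Hugging Face `transformers` library and the public `t5-small` / `t5-base` checkpoints (the report does not say which toolkit was used). T5 was pre-trained with the task prefix "translate English to French: ", so each caption is prefixed before generation. The tab-separated `image.jpg#N<TAB>caption` layout of the original Flickr30k token file and the helper names (`parse_flickr30k_tokens`, `build_t5_inputs`, `translate_with_t5`) are assumptions for illustration, not the authors' code.

```python
from collections import defaultdict

# Task prefix T5 was pre-trained with for English-to-French translation.
T5_PREFIX = "translate English to French: "

def parse_flickr30k_tokens(lines):
    """Group captions by image, assuming lines like 'img.jpg#0<TAB>caption' (hypothetical helper)."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, caption = line.split("\t", 1)
        image_id = key.split("#", 1)[0]  # drop the '#N' caption index
        captions[image_id].append(caption.strip())
    return dict(captions)

def build_t5_inputs(captions):
    """Prepend the T5 translation prefix to every English caption."""
    return [T5_PREFIX + c for c in captions]

def translate_with_t5(captions, model_name="t5-small", max_new_tokens=64):
    """Translate captions with a T5 checkpoint (assumes transformers + torch are installed)."""
    # Imported lazily so the pure-Python helpers above work without these dependencies.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    batch = tokenizer(build_t5_inputs(captions), return_tensors="pt",
                      padding=True, truncation=True)
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

For example, `translate_with_t5(captions["img.jpg"], model_name="t5-base")` would translate every caption of one image with T5-base. Google Translate, the third system compared in the report, is not covered by this sketch.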
The main objective is to address the problem of generating precise image captions in
French. The performance of an image captioning model, which uses ResNet-50 to encode
image features and an LSTM network with attention to generate captions, is evaluated
on each of the translated datasets. The results show that caption accuracy varies with
the translation (language) model used, with the Transformer models outperforming
Google Translate. Combined with ResNet-50 and an LSTM network with attention, the
proposed approach achieves state-of-the-art performance in generating accurate French
captions.
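To illustrate the attention step of the decoder, here is a minimal pure-Python sketch of additive (Bahdanau-style) attention over a grid of image region features. The report does not specify which attention variant is used, so additive attention is an assumption, as are the toy dimensions: a 224×224 input to ResNet-50 yields a 7×7 grid of 2048-dimensional features (49 regions), whereas here the feature size is shrunk to 8 and all weights are random rather than learned.

```python
import math
import random

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def additive_attention(features, hidden, W_f, W_h, v):
    """Score each region with e_i = v . tanh(W_f f_i + W_h h), then return
    the attention-weighted sum of region features and the weights."""
    proj_h = matvec(W_h, hidden)
    scores = []
    for f in features:
        proj_f = matvec(W_f, f)
        scores.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, proj_f, proj_h)))
    alpha = softmax(scores)  # one positive weight per region, summing to 1
    dim = len(features[0])
    context = [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(dim)]
    return context, alpha

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy example: 49 regions (a 7x7 grid) with 8-dim features instead of 2048.
random.seed(0)
n_regions, feat_dim, hid_dim, att_dim = 49, 8, 6, 4
features = rand_matrix(n_regions, feat_dim)               # stand-in for ResNet-50 region features
hidden = [random.uniform(-1, 1) for _ in range(hid_dim)]  # stand-in for the LSTM hidden state
W_f = rand_matrix(att_dim, feat_dim)                      # random, not learned, projections
W_h = rand_matrix(att_dim, hid_dim)
v = [random.uniform(-1, 1) for _ in range(att_dim)]

context, alpha = additive_attention(features, hidden, W_f, W_h, v)
```

At each decoding step, the LSTM's previous hidden state plays the role of `hidden`, and the returned context vector is typically fed into the next LSTM step together with the embedding of the previously generated word.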
The findings contribute to the fields of image captioning and machine translation for
French speakers, highlighting the importance of advanced translation models for
improving caption accuracy and other NLP tasks in French. Furthermore, this research
provides insights into the potential of smaller-scale models in limited-data scenarios.
Building on these findings, future work could explore alternative translation models
and data augmentation techniques, consider multimodal approaches that may yield more
accurate and contextually relevant captions, and investigate the potential of this
approach in other languages.
Description:
Supervised by
Mr. Md. Hamjajul Ashmafee,
Assistant Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh