Deep Learning Approach: Image Captioning in French and Arabic Language

Keita, Abdoulaye
Hamadou, Mohaman Dairou
Asag, Mazen Abdulwahab Mahyoub Salem
2023-05-30
dc.description Supervised by Mr. Md. Hamjajul Ashmafee, Assistant Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.description.abstract This research report introduces a novel dataset of French captions translated from the Flickr30k dataset using different translation models, namely Google Translate and two Transformer models, T5-Small and T5-Base. Here, a novel dataset means a new data collection created by translating the existing Flickr30k captions into French, which makes Flickr30k usable for training and evaluating image captioning models in French. The main objective is to address the problem of generating precise image captions in French. The performance of an image captioning model is evaluated on each translated dataset, using ResNet-50 to encode image features and an LSTM network with attention to generate captions. The results show that caption accuracy varies with the translation (or language) model used, with the Transformer models outperforming Google Translate. The proposed approach achieves state-of-the-art performance in generating accurate French captions when combined with ResNet-50 and an LSTM network with attention. The findings contribute to the field of image captioning and machine translation for French speakers, highlighting the importance of using advanced translation models for improved caption accuracy and for other NLP tasks in French. Furthermore, this research provides insights into the potential of smaller-scale models in limited-data scenarios. Based on these findings, future work can explore alternative translation models, data augmentation techniques, and multimodal approaches that could lead to more accurate and contextually relevant captions, as well as the potential of this approach for other languages. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject Novel dataset, Translation models, Transformers, Image captioning, Natural Language Processing, Multimodal technologies en_US
dc.title Deep Learning Approach: Image Captioning in French and Arabic Language en_US
dc.type Thesis en_US
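
The abstract describes a captioning model in which an LSTM decoder attends over ResNet-50 image features. The record gives no implementation details, so the following is only a minimal sketch of one soft-attention step, written in plain Python under stated assumptions: the function names `softmax` and `attend`, the toy region vectors, and the dot-product scoring function are all hypothetical illustrations, not the thesis's actual method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """One soft-attention step over image regions.

    region_features: one feature vector per image region (a hypothetical
        stand-in for a ResNet-50 feature map flattened over positions).
    hidden_state: the decoder's current hidden state, same dimensionality.
    Returns (context_vector, attention_weights).
    """
    # Assumption: dot-product scoring. The abstract does not say which
    # scoring function the thesis uses.
    scores = [
        sum(f * h for f, h in zip(feat, hidden_state))
        for feat in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # The context vector is the attention-weighted average of the regions.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return context, weights


# Toy input: three regions with two-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
context, weights = attend(regions, state)
```

In a full decoder, the context vector would typically be combined with the previous word embedding and fed to the LSTM at each time step, so that each generated word can draw on different image regions.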

