dc.identifier.citation |
[1] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image caption generator,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3156–3164.
[2] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, Jul. 2002, pp. 311–318. [Online]. Available: https://aclanthology.org/P02-1040
[3] T. Yao, Y. Pan, Y. Li, Z. Qiu, and T. Mei, “Boosting image captioning with attributes,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4894–4902.
[4] B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik, “Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2641–2649.
[5] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.
[6] S. Katiyar and S. K. Borgohain, “Comparative evaluation of CNN architectures for image caption generation,” arXiv preprint arXiv:2102.11506, 2021.
[7] Z. Yang and N. Okazaki, “Image caption generation for news articles,” in Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 1941–1951.
[8] C. Chen, S. Mu, W. Xiao, Z. Ye, L. Wu, and Q. Ju, “Improving image captioning with conditional generative adversarial nets,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8142–8150.
[9] H. R. Tavakoli, R. Shetty, A. Borji, and J. Laaksonen, “Paying attention to descriptions generated by image captioning models,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2487–2496.
[10] Y. Huang, B. Liu, J. Fu, and Y. Lu, “A picture is worth a thousand words: A unified system for diverse captions and rich images generation,” in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 2792–2794.
[11] M. Tanti, A. Gatt, and K. P. Camilleri, “What is the role of recurrent neural networks (RNNs) in an image caption generator?” arXiv preprint arXiv:1708.02043, 2017.
[12] S. Katiyar and S. K. Borgohain, “Analysis of convolutional decoder for image caption generation,” arXiv preprint arXiv:2103.04914, 2021.
[13] R. Bernardi, R. Cakici, D. Elliott, A. Erdem, E. Erdem, N. Ikizler-Cinbis, F. Keller, A. Muscat, and B. Plank, “Automatic description generation from images: A survey of models, datasets, and evaluation measures,” Journal of Artificial Intelligence Research, vol. 55, pp. 409–442, 2016.
[14] T. Miyazaki and N. Shimizu, “Cross-lingual image caption generation,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 1780–1790.
[15] M. Nikolaus, M. Abdou, M. Lamm, R. Aralikatte, and D. Elliott, “Compositional generalization in image captioning,” arXiv preprint arXiv:1909.04402, 2019.
[16] S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel, “Self-critical sequence training for image captioning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7008–7024.
[17] Y. Yoshikawa, Y. Shigeto, and A. Takeuchi, “STAIR captions: Constructing a large-scale Japanese image caption dataset,” arXiv preprint arXiv:1705.00823, 2017.
[18] V. Jindal, “Generating image captions in Arabic using root-word based recurrent neural networks and deep neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[19] R. Biswas, M. Barz, M. Hartmann, and D. Sonntag, “Improving German image captions using machine translation and transfer learning,” in Statistical Language and Speech Processing: 9th International Conference, SLSP 2021, Cardiff, UK, November 23–25, 2021, Proceedings 9. Springer, 2021, pp. 3–14.
[20] W. Zhao, B. Wang, J. Ye, M. Yang, Z. Zhao, R. Luo, and Y. Qiao, “A multi-task learning approach for image captioning,” in IJCAI, 2018, pp. 1205–1211.
[21] S. Kwon, B.-H. Go, and J.-H. Lee, “A text-based visual context modulation neural model for multimodal machine translation,” Pattern Recognition Letters, vol. 136, pp. 212–218, 2020.
[22] D. Elliott, S. Frank, and E. Hasler, “Multilingual image description with neural sequence models,” arXiv preprint arXiv:1510.04709, 2015.
[23] A. Mastropaolo, S. Scalabrino, N. Cooper, D. N. Palacio, D. Poshyvanyk, R. Oliveto, and G. Bavota, “Studying the usage of text-to-text transfer transformer to support code-related tasks,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 2021, pp. 336–347.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[25] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016.
[26] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth, “Every picture tells a story: Generating sentences from images,” in Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part IV 11. Springer, 2010, pp. 15–29.
[27] A. Roberts, C. Raffel, K. Lee, M. Matena, N. Shazeer, P. J. Liu, S. Narang, W. Li, and Y. Zhou, “Exploring the limits of transfer learning with a unified text-to-text transformer,” 2019.
[28] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” The Journal of Machine Learning Research, vol. 21, no. 1, pp. 5485–5551, 2020.
[29] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, 2015. |
en_US |