Analysis on LLMs Performance for Code Summarization


dc.contributor.author Ahsan, Salman
dc.contributor.author Mazumder, Md. Muktadir
dc.contributor.author Akib, Md. Ahnaf
dc.date.accessioned 2025-03-05T08:04:11Z
dc.date.available 2025-03-05T08:04:11Z
dc.date.issued 2024-07-08
dc.identifier.citation [1] M. Abdin, S. A. Jacobs, A. A. Awan, et al., "Phi-3 technical report: A highly capable language model locally on your phone," arXiv preprint arXiv:2404.14219, 2024.
[2] O. Achiam, S. Adler, S. Agarwal, et al., "GPT-4 technical report," 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:257532815.
[3] T. Ahmed and P. Devanbu, "Few-shot training LLMs for project-specific code summarization," in 37th IEEE/ACM International Conference on Automated Software Engineering (ASE '22), ACM, Rochester, MI, USA, 2022, pp. 1–5. doi: 10.1145/3551349.3559555.
[4] AI@Meta, "Llama 3 model card," 2024. [Online]. Available: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md.
[5] U. Alon, S. Brody, O. Levy, and E. Yahav, "Code2seq: Generating sequences from structured representations of code," arXiv preprint arXiv:1808.01400, 2018.
[6] A. V. M. Barone and R. Sennrich, "A parallel corpus of Python functions and documentation strings for automated code documentation and code generation," arXiv preprint arXiv:1707.02275, 2017.
[7] J. Cheng, I. Fostiropoulos, and B. Boehm, "GN-Transformer: Fusing sequence and graph representation for improved code summarization," arXiv preprint arXiv:2111.08874, 2021.
[8] K. Cho, B. Van Merriënboer, C. Gulcehre, et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[9] P. Fernandes, M. Allamanis, and M. Brockschmidt, "Structured neural summarization," arXiv preprint arXiv:1811.01824, 2018.
[10] S. Gao, C. Gao, Y. He, J. Zeng, L. Y. Nie, and X. Xia, "Code structure guided transformer for source code summarization," arXiv preprint arXiv:2104.09340, 2021.
[11] Y. Gao and C. Lyu, "M2TS: Multi-scale multi-modal approach based on transformer for source code summarization," in Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, 2022, pp. 24–35.
[12] D. Guo et al., "GraphCodeBERT: Pre-training code representations with data flow," CoRR, vol. abs/2009.08366, 2020.
[13] R. Haldar and J. Hockenmaier, "Analyzing the performance of large language models on code summarization," University of Illinois Urbana-Champaign, 2024.
[14] D. Hendrycks, S. Basart, S. Kadavath, et al., "Measuring coding challenge competence with APPS," arXiv preprint arXiv:2105.09938, 2021.
[15] X. Hu, G. Li, X. Xia, D. Lo, and Z. Jin, "Deep code comment generation," in Proceedings of the 26th Conference on Program Comprehension, 2018, pp. 200–210.
[16] X. Hu, G. Li, X. Xia, D. Lo, S. Lu, and Z. Jin, "Summarizing source code with transferred API knowledge," 2018.
[17] H. Husain, H.-H. Wu, T. Gazit, M. Allamanis, and M. Brockschmidt, "CodeSearchNet challenge: Evaluating the state of semantic code search," arXiv preprint arXiv:1909.09436, 2019.
[18] S. Iyer, I. Konstas, A. Cheung, and L. Zettlemoyer, "Summarizing source code using a neural attention model," in 54th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2016, pp. 2073–2083.
[19] A. Q. Jiang, A. Sablayrolles, A. Mensch, et al., "Mistral 7B," arXiv preprint arXiv:2310.06825, 2023.
[20] A. LeClair, S. Haque, L. Wu, and C. McMillan, "Improved code summarization via a graph neural network," in Proceedings of the 28th International Conference on Program Comprehension, 2020, pp. 184–195.
[21] A. LeClair, S. Jiang, and C. McMillan, "A neural model for generating natural language summaries of program subroutines," in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), IEEE, 2019, pp. 795–806.
[22] S. Liu, Y. Chen, X. Xie, J. Siow, and Y. Liu, "Retrieval-augmented generation for code summarization via hybrid GNN," arXiv preprint arXiv:2006.05405, 2020.
[23] S. Lu, D. Guo, S. Ren, et al., "CodeXGLUE: A machine learning benchmark dataset for code understanding and generation," arXiv preprint arXiv:2102.04664, 2021.
[24] E. Nijkamp et al., "CodeGen: An open large language model for code with multi-turn program synthesis," in ICLR, 2023.
[25] B. Roziere, J. Gehring, F. Gloeckle, et al., "Code Llama: Open foundation models for code," arXiv preprint arXiv:2308.12950, 2023.
[26] Y. Shido, Y. Kobayashi, A. Yamamoto, A. Miyamoto, and T. Matsumura, "Automatic source code summarization with extended Tree-LSTM," in 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019, pp. 1–8.
[27] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," Advances in Neural Information Processing Systems, vol. 27, 2014.
[28] Z. Tang, X. Shen, C. Li, et al., "AST-Trans: Code summarization with efficient tree-structured attention," in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 150–162.
[29] G. Team, R. Anil, S. Borgeaud, et al., "Gemini: A family of highly capable multimodal models," arXiv preprint arXiv:2312.11805, 2023.
[30] G. Team, T. Mesnard, C. Hardin, et al., "Gemma: Open models based on Gemini research and technology," arXiv preprint arXiv:2403.08295, 2024.
[31] A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[32] Y. Wan, Z. Zhao, M. Yang, et al., "Improving automatic source code summarization via deep reinforcement learning," in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018, pp. 397–407.
[33] C. Zhang, J. Wang, Q. Zhou, et al., "A survey of automatic source code summarization," Symmetry, vol. 14, no. 3, p. 471, 2022.
[34] J. Zhang, X. Wang, H. Zhang, H. Sun, and X. Liu, "Retrieval-based neural source code summarization," in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020, pp. 1385–1397. en_US
dc.identifier.uri http://hdl.handle.net/123456789/2356
dc.description Supervised by Lutfun Nahar Lota, Assistant Professor (Supervisor), and Mr. Md Farhan Ishmam, Lecturer (Co-Supervisor), Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024. en_US
dc.description.abstract The goal of code summarization is to produce concise natural-language descriptions of source code. Deep learning has been adopted increasingly in software engineering, particularly for tasks such as code generation and summarization, and recent code-capable Large Language Models (LLMs) appear to perform well on these tasks. Code summarization has evolved tremendously with the advent of LLMs, which provide sophisticated methods for generating concise and accurate summaries of source code. Our study performs a comparative analysis of several open-source LLMs, namely LLaMA-3, Phi-3, Mistral, and Gemma. The models' performance is assessed using standard metrics such as BLEU and ROUGE. Through this analysis, we identify the strengths and weaknesses of each model, offering insights into their applicability and effectiveness in code summarization tasks. Our findings contribute to the ongoing development and refinement of LLMs, supporting their integration into tools that enhance software development and maintenance processes. en_US
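
For illustration, a minimal sketch (not part of the thesis itself) of how BLEU and ROUGE scores for a model-generated summary might be computed in Python, assuming the nltk and rouge-score packages; the reference and candidate strings below are hypothetical examples, and the thesis's actual evaluation pipeline may differ.

    # Minimal sketch: scoring one LLM-generated code summary against a
    # human-written reference. Assumes `pip install nltk rouge-score`.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from rouge_score import rouge_scorer

    # Hypothetical reference (ground-truth docstring) and candidate (LLM output).
    reference = "returns the sum of all even numbers in the input list"
    candidate = "computes the sum of even numbers in a list"

    # BLEU: n-gram precision of candidate tokens against the reference;
    # smoothing avoids zero scores when short summaries miss higher-order n-grams.
    bleu = sentence_bleu(
        [reference.split()],
        candidate.split(),
        smoothing_function=SmoothingFunction().method1,
    )

    # ROUGE-1 (unigram overlap) and ROUGE-L (longest common subsequence).
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge = scorer.score(reference, candidate)

    print(f"BLEU: {bleu:.3f}")
    print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}")
    print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")

In practice such scores are averaged over an entire test set of code-summary pairs; a single pair is shown here only to keep the sketch self-contained.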
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.subject Code Summarization, Large Language Models, Code Explanation, Performance Metrics, Natural Language Generation, Deep Learning en_US
dc.title Analysis on LLMs Performance for Code Summarization en_US
dc.type Thesis en_US

