Consistency of Comments to Source Code: An Empirical Investigation and A Dataset

Islam, Maksuda; Haque, Ahsanul; Hossen, Md Safayat

dc.contributor.author	Islam, Maksuda
dc.contributor.author	Haque, Ahsanul
dc.contributor.author	Hossen, Md Safayat
dc.date.accessioned	2023-03-16T08:24:58Z
dc.date.available	2023-03-16T08:24:58Z
dc.date.issued	2022-05-30
dc.identifier.citation	[1] A. Corazza, V. Maggio, and G. Scanniello, “Coherence of comments and method implementations: a dataset and an empirical investigation,” Software Quality Journal, vol. 26, no. 2, pp. 751–777, 2018. [2] F. Rabbi, M. N. Haque, M. E. Kadir, M. S. Siddik, and A. Kabir, “An ensemble approach to detect code comment inconsistencies using topic modeling,” in SEKE, pp. 392–395, 2020. [3] T. Tenny, “Program readability: Procedures versus comments,” IEEE Transactions on Software Engineering, vol. 14, no. 9, pp. 1271–1279, 1988. [4] S. N. Woodfield, H. E. Dunsmore, and V. Y. Shen, “The effect of modularization and comments on program comprehension,” in Proceedings of the 5th international conference on Software engineering, pp. 215–223, 1981. [5] B. Fluri, M. Wursch, and H. C. Gall, “Do code and comments co-evolve? on the relation between source code and comment changes,” in 14th Working Conference on Reverse Engineering (WCRE 2007), pp. 70–79, IEEE, 2007. [6] D. Lawrie, D. Binkley, and C. Morrell, “Normalizing source code vocabulary,” in 2010 17th Working Conference on Reverse Engineering, pp. 3–12, IEEE, 2010. [7] R. Baeza-Yates and P. Raghavan, “Next generation web search,” in Search Computing, pp. 11–23, Springer, 2010. [8] N. Stulova, A. Blasi, A. Gorla, and O. Nierstrasz, “Towards detecting inconsistent comments in java source code automatically,” in 2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 65–69, IEEE, 2020. [9] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp. 311–318, 2002. [10] C. Callison-Burch, M. Osborne, and P. Koehn, “Re-evaluating the role of bleu in machine translation research,” in 11th conference of the european chapter of the association for computational linguistics, pp. 249–256, 2006. [11] X. Hu, G. Li, X. Xia, D. Lo, S. Lu, and Z. Jin, “Summarizing source code with transferred api knowledge,” 2018. [12] S. Iyer, I. Konstas, A. Cheung, and L. Zettlemoyer, “Summarizing source code using a neural attention model,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2073–2083, 2016. [13] F. Salviulo and G. Scanniello, “Dealing with identifiers and comments in source code comprehension and maintenance: Results from an ethnographically-informed study with students and professionals,” in Proceedings of the 18th international conference on evaluation and assessment in software engineering, pp. 1–10, 2014. [14] L. Tan, D. Yuan, G. Krishna, and Y. Zhou, “/*icomment: Bugs or bad comments?*/,” in Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pp. 145–158, 2007. [15] S. H. Tan, D. Marinov, L. Tan, and G. T. Leavens, “@tcomment: Testing javadoc comments to detect comment-code inconsistencies,” in 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation, pp. 260–269, IEEE, 2012. [16] W. M. Ibrahim, N. Bettenburg, B. Adams, and A. E. Hassan, “On the relationship between comment update practices and software bugs,” Journal of Systems and Software, vol. 85, no. 10, pp. 2293–2304, 2012. [17] S. C. B. de Souza, N. Anquetil, and K. M. de Oliveira, “A study of the documentation essential to software maintenance,” in Proceedings of the 23rd annual international conference on Design of communication: documenting & designing for pervasive information, pp. 68–75, 2005. [18] N. Khamis, R. Witte, and J. Rilling, “Automatic quality assessment of source code comments: the javadocminer,” in International Conference on Application of Natural Language to Information Systems, pp. 68–79, Springer, 2010. [19] Z. Liu, H. Chen, X. Chen, X. Luo, and F. Zhou, “Automatic detection of outdated comments during code changes,” in 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), vol. 1, pp. 154–163, IEEE, 2018. [20] P. Rani, S. Panichella, M. Leuenberger, M. Ghafari, and O. Nierstrasz, “What do class comments tell us? an investigation of comment evolution and practices in pharo smalltalk,” Empirical Software Engineering, vol. 26, no. 6, pp. 1–49, 2021. [21] D. Lucia et al., “Information retrieval models for recovering traceability links between code and documentation,” in Proceedings 2000 International Conference on Software Maintenance, pp. 40–49, IEEE, 2000. [22] C. C. Silva, M. Galster, and F. Gilson, “Topic modeling in software engineering research,” Empirical Software Engineering, vol. 26, no. 6, pp. 1–62, 2021. [23] M. Iammarino, L. Aversano, M. L. Bernardi, and M. Cimitile, “A topic modeling approach to evaluate the comments consistency to source code,” in 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE, 2020. [24] J. Moore, B. Gelman, and D. Slater, “A convolutional neural network for language-agnostic source code summarization,” arXiv preprint arXiv:1904.00805, 2019.	en_US
dc.identifier.uri	http://hdl.handle.net/123456789/1776
dc.description	Supervised by Ms. Lutfun Nahar Lota, Assistant Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022.	en_US
dc.description.abstract	Comments in code are a primary source of system documentation. Software maintainers depend on them as a foundation for code traceability, maintenance activities, and reuse of the code as a library or framework in other projects. However, comment quality is often overlooked, and comments are unlikely to be updated as the source code evolves: the code changes, but the comments are left as they were, which leaves a new developer even more confused. Coherence between comments and source code must therefore be ensured and maintained. This paper provides a dataset of code-comment pairs. We annotated 9,311 classes and methods from different C# projects; 4,953 code-comment pairs remained after removing NULL entries, constructors, and variables. We used the Bilingual Evaluation Understudy (BLEU) metric to validate our human-curated dataset, and we present a comparative analysis of the human-curated annotation and the annotation derived from the BLEU score. We also propose a modified version of a model from a previous study, which obtained an accuracy of 96.56% using the AUC-ROC performance metric after being fitted to our 4,953 annotated code-comment pairs. In contrast, the previous model gave 93% accuracy using a similar performance metric on the same dataset.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh	en_US
dc.subject	Coherence, Bilingual Evaluation Understudy, Code-Comment pair	en_US
dc.title	Consistency of Comments to Source Code: An Empirical Investigation and A Dataset	en_US
dc.type	Thesis	en_US