Abstract:
Comments in source code are a primary source of system documentation. They are indispensable to software maintainers as a foundation for code traceability, maintenance activities, and the reuse of code as a library or framework in other projects. However, comment quality is often overlooked, and comments rarely keep pace with the evolution of the source code: the code is updated whenever changes occur, but the comments are neglected, leaving new developers even more confused. Coherence between comments and source code must therefore be ensured and maintained. This paper contributes a dataset of code-comment pairs produced through our research work. We annotated 9,311 classes and methods from different C\# projects; after removing NULL comments, constructors, and variable declarations, 4,953 code-comment pairs remained. We employed the Bilingual Evaluation Understudy (BLEU) metric to validate our human-curated dataset, and we include a comparative analysis and discussion of the human-curated annotations against the annotations derived from the BLEU score. We also propose a modified version of a model from a previous study, which achieved 96.56\% on the AUC-ROC performance metric after being fitted to our 4,953 annotated code-comment pairs, whereas the previous model achieved 93\% on the same metric and dataset.
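To illustrate the BLEU-based validation mentioned above: BLEU scores a candidate sentence by its clipped n-gram overlap with a reference sentence, discounted by a brevity penalty. The sketch below is a minimal, simplified sentence-level BLEU (add-one smoothed precisions up to 4-grams), not the exact implementation used in the study; the example comment pair is hypothetical.

```python
import math
from collections import Counter

def simple_bleu(reference, candidate, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of smoothed
    modified n-gram precisions times a brevity penalty.
    `reference` and `candidate` are lists of tokens."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    # Brevity penalty discourages overly short candidates.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Hypothetical tokenised comment pair:
reference = "returns the sum of two integers".split()
candidate = "returns the sum of two numbers".split()
score = simple_bleu(reference, candidate)  # scores well above an unrelated comment
```

In the dataset, a pairing like this lets a threshold on the score flag comments that have drifted away from the code they describe.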
Description:
Supervised by
Ms. Lutfun Nahar Lota,
Assistant Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT)
Board Bazar, Gazipur-1704, Bangladesh.
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022.