Abstract:
Issue-tracking platforms such as Jira and Bugzilla have become essential in large-scale
software development. Prominent organizations in the Open Source Software
(OSS) landscape, such as Apache and Mozilla, make heavy use of these platforms and
document their software development process through online repositories that utilize
version control systems (VCS) like Git. Artifacts gathered from these sources contain
natural language data that can be used to answer important questions relating to the
nature of the software produced and the sentiment of the developers. The commit
frequency and working time of developers can be correlated with the sentiment expressed
in their commit messages. Moreover, the sentiment of issue comments might
differ significantly based on the issue's type (i.e., bug or non-bug) or severity. In this regard,
we fine-tuned seBERT, a BERT model pre-trained on software development data,
to classify sentiment and answer these questions. We used an existing
dataset, 20-MAD, to test these hypotheses. We found that high commit frequency
is associated with a higher proportion of negative sentiment than low and
medium frequencies, while the time of day at which developers work has minimal effect
on measured sentiment. We also observed that the severity of an issue significantly
influences the sentiment expressed in its comments, and that issues classified as bugs
show negative sentiment more frequently than all other issue types combined.
Description:
Supervised by
Mr. Shohel Ahmed,
Assistant Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.