Exploratory Analysis of Developer Sentiment On Open Source Projects

Siam, Md. Kawsar Ahamed; Rahman, Mahmudur; Hassan, Moudud

dc.contributor.author	Siam, Md. Kawsar Ahamed
dc.contributor.author	Rahman, Mahmudur
dc.contributor.author	Hassan, Moudud
dc.date.accessioned	2025-03-06T05:34:37Z
dc.date.available	2025-03-06T05:34:37Z
dc.date.issued	2024-09-17
dc.identifier.uri	http://hdl.handle.net/123456789/2358
dc.description	Supervised by Mr. Shohel Ahmed, Assistant Professor, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh. This thesis is submitted in partial fulfillment of the requirement for the degree of Bachelor of Science in Computer Science and Engineering, 2024.	en_US
dc.description.abstract	Issue-tracking platforms such as Jira and Bugzilla have become essential in large-scale software development. Prominent organizations in the Open Source Software (OSS) landscape, such as Apache and Mozilla, make heavy use of these platforms and document their software development process through online repositories that use version control systems (VCS) like Git. Artifacts gathered from these sources contain natural language data that can be used to answer important questions about the nature of the software produced and the sentiment of the developers. The commit frequency and working time of the developers can be correlated with the sentiment shown in their commit messages. Moreover, the sentiment of issue comments might differ significantly based on the issue type (i.e., bug or non-bug) or severity. To investigate these questions, we fine-tuned seBERT, a BERT model pre-trained on software development data, to classify sentiment. We tested these hypotheses on an existing data set, 20-MAD, and report the results. We found that high committer frequency is associated with a higher proportion of negative sentiments compared to low and medium frequencies, while the part of the day developers work in has minimal effect on measured sentiment. We also observed that the severity of an issue significantly influences the sentiment expressed in issue comments, and that issues classified as bugs have a higher negative sentiment frequency than all other issue types combined.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh	en_US
dc.subject	Sentiment Analysis, Commit, Open Source, VCS, Software Development	en_US
dc.title	Exploratory Analysis of Developer Sentiment On Open Source Projects	en_US
dc.type	Thesis	en_US
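
The abstract compares the proportion of negative sentiment across committer-frequency groups. The sketch below illustrates one way such a comparison could be set up, using only the Python standard library. The record does not say which statistical test the authors applied, so the Pearson chi-square statistic here is an assumption. The function names, the group labels, and the toy counts are hypothetical and are not taken from 20-MAD or the thesis.

```python
from collections import Counter


def negative_share(records):
    """Return {group: fraction of records labelled 'negative'}."""
    totals = Counter(group for group, _ in records)
    negatives = Counter(group for group, label in records if label == "negative")
    return {group: negatives[group] / totals[group] for group in totals}


def chi_square_statistic(records):
    """Pearson chi-square statistic for the group x label contingency table."""
    observed = Counter(records)
    row_totals = Counter(group for group, _ in records)
    col_totals = Counter(label for _, label in records)
    n = len(records)
    stat = 0.0
    for group in row_totals:
        for label in col_totals:
            # Expected count if sentiment were independent of the group.
            expected = row_totals[group] * col_totals[label] / n
            stat += (observed[(group, label)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: (committer-frequency group, predicted sentiment).
records = (
    [("high", "negative")] * 30 + [("high", "non-negative")] * 70
    + [("low", "negative")] * 10 + [("low", "non-negative")] * 90
)

shares = negative_share(records)
stat = chi_square_statistic(records)
print(shares, stat)
```

In practice the sentiment labels would come from the fine-tuned classifier rather than being written by hand, and the statistic would be compared against a chi-square distribution with the appropriate degrees of freedom to judge significance.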