Classification of Stack Overflow Questions Based on Difficulty


dc.contributor.author Raida, Maliha Noushin
dc.contributor.author Sristy, Zannatun Naim
dc.contributor.author Monisha, Sheikh Moonwara Anjum
dc.contributor.author Ulfat, Nawshin
dc.date.accessioned 2023-03-23T09:48:12Z
dc.date.available 2023-03-23T09:48:12Z
dc.date.issued 2022-05-30
dc.identifier.uri http://hdl.handle.net/123456789/1780
dc.description Supervised by Mr. Md. Jubair Ibna Mostafa, Lecturer, and Mr. Md. Nazmul Haque, Lecturer, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022. en_US
dc.description.abstract Technical question answering sites such as Stack Overflow attract enormous attention from learners and practitioners in specialized fields who want to exchange programming knowledge. Question answering across different topics engages programmers of all levels. Not all developers have the same level of expertise, and their questions differ in complexity and context. However, Stack Overflow's existing approach primarily filters questions by tag, which is ineffective for predicting difficulty level. Because of this limitation, a large share of posts fail to attract the attention of appropriate users, so valid questions go unanswered or wait a long time for a response. To address these limitations, we propose three supervised models, based on TF-IDF, topic modeling (LDA), and Doc2Vec, that capture more complex relationships by extracting context-dependent features between the user and the question. Each model builds an informative relationship that helps classify the difficulty of a question. Extensive experiments on different variations of the datasets demonstrate the improved efficacy of our proposed models over contemporary models. The experiments show that, even with limited information, the models' performance scores are satisfactory and that the Doc2Vec model outperforms the other models under consideration. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh en_US
dc.subject Stack Overflow, Difficulty Classification en_US
dc.title Classification of Stack Overflow Questions Based on Difficulty en_US
dc.type Thesis en_US

