Identification of Fraudsters Involved in Phishing by Different Machine Learning Models

Karim, Md. Faiyed Bin; Tazreen, Nushera; Tarannum, Samiha

dc.contributor.author	Karim, Md. Faiyed Bin
dc.contributor.author	Tazreen, Nushera
dc.contributor.author	Tarannum, Samiha
dc.date.accessioned	2023-01-05T10:10:57Z
dc.date.available	2023-01-05T10:10:57Z
dc.date.issued	2022-05-30
dc.identifier.uri	http://hdl.handle.net/123456789/1632
dc.description	Supervised by Mr. Safayat Bin Hakim, Assistant Professor, Department of Electrical and Electronic Engineering, Islamic University of Technology. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Electrical and Electronic Engineering, 2022.	en_US
dc.description.abstract	With the digitization of the current age, the number of fraudsters in the digital realm has increased manifold. Although the internet can be used for the good of the general population, the growing number of unscrupulous people online poses a grave danger to the public. Among the many vices on the internet, one of the most common is phishing. Many approaches have been taken to tackle phishing, and ML-based detection is among the leading ones. In our research, we compared several ML models to determine which is most suitable for phishing detection. Our work is distinctive in that it integrates data preprocessing and reduces the number of features to lower complexity. Among these models, XGBoost achieved the highest accuracy after hyperparameter tuning, at 97.0455%.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Electrical and Electronic Engineering (EEE), Islamic University of Technology (IUT)	en_US
dc.subject	Machine learning, Phishing, XGBoost, SVM, Preprocessing, Complexity reduction	en_US
dc.title	Identification of Fraudsters Involved in Phishing by Different Machine Learning Models	en_US
dc.type	Thesis	en_US