A Comprehensive approach to the detection of Liver disease using Machine Learning Techniques with comparison of different Oversampling Techniques on Imbalanced Liver Disease Dataset

dc.contributor.author Arefin, Mir Samsul
dc.contributor.author Naimah, Chowdhury Sadeeya
dc.contributor.author Rahman, Rayeed
dc.date.accessioned 2024-01-17T06:19:42Z
dc.date.available 2024-01-17T06:19:42Z
dc.date.issued 2023-04-30
dc.identifier.uri http://hdl.handle.net/123456789/2037
dc.description Supervised by Mr. Mirza Muntasir Nishat, Assistant Professor, Department of Electrical and Electronics Engineering (EEE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.description.abstract The liver is one of the most important organs in the body. Among other vital functions, it controls the chemical balance of the bloodstream and removes waste products. Early diagnosis of liver disease is important because symptoms do not begin to show until most of the liver is already damaged. Machine learning could be a crucial tool for predicting liver disease in patients, which could lead to earlier diagnosis and treatment. In this study, a dataset with 583 instances was pre-processed and its class imbalance was handled in five separate ways: Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic sampling (ADASYN), SMOTE combined with Conformal Clustering (CC), SMOTE combined with Tomek links, and SMOTE combined with Edited Nearest Neighbor (SMOTE+ENN). Several machine learning algorithms were then applied, including Decision Tree Classifier, Logistic Regression, Gaussian Naïve Bayes, Random Forest Classifier, K-Nearest Neighbors, and Support Vector Machine (SVM). The best result, an accuracy of 98.37%, was obtained with the SVM when SMOTE+ENN was used as the imbalance-handling technique. The study thus presents a comparative analysis of the different imbalance-handling techniques and identifies SMOTE+ENN as the best performer for this specific dataset. en_US
dc.language.iso en en_US
dc.publisher Department of Electrical and Electronics Engineering (EEE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh en_US
dc.title A Comprehensive approach to the detection of Liver disease using Machine Learning Techniques with comparison of different Oversampling Techniques on Imbalanced Liver Disease Dataset en_US
dc.type Thesis en_US
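
The SMOTE+ENN resampling step named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis authors' implementation: the function names (`nearest`, `smote`, `enn`), the neighbor counts, and the two-feature toy data are all assumptions made for illustration, and the classifier stage (for example an SVM) is left out. In practice a maintained library such as imbalanced-learn would normally be used instead.

```python
import math
import random
from collections import Counter


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx."""
    order = sorted((j for j in range(len(points)) if j != idx),
                   key=lambda j: math.dist(points[idx], points[j]))
    return order[:k]


def smote(X, y, minority, k=5, seed=0):
    """Append interpolated minority samples until both classes are the same size.

    Simplified SMOTE for two classes; needs at least two minority samples.
    """
    rng = random.Random(seed)
    mino = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(X) - len(mino)) - len(mino)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(mino))            # random minority sample
        j = rng.choice(nearest(i, mino, k))     # one of its minority neighbors
        gap = rng.random()                      # position along the segment i -> j
        X_out.append([a + gap * (b - a) for a, b in zip(mino[i], mino[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited Nearest Neighbor: drop samples that disagree with their k-neighbor majority."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 40 majority (label 0) and 12 minority (label 1) samples.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)] + \
    [[data_rng.gauss(2.5, 1), data_rng.gauss(2.5, 1)] for _ in range(12)]
y = [0] * 40 + [1] * 12

X_sm, y_sm = smote(X, y, minority=1)   # oversample the minority class
X_res, y_res = enn(X_sm, y_sm)         # then clean ambiguous samples
print("original:", Counter(y))
print("after SMOTE:", Counter(y_sm))
print("after SMOTE+ENN:", Counter(y_res))
```

Oversampling first and cleaning second is what distinguishes the combined method from plain SMOTE: the ENN pass removes samples, synthetic or original, that sit among neighbors of the other class. To avoid optimistic accuracy estimates, resampling of this kind is usually applied to the training split only, never to the held-out test data.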

