Abstract:
Class imbalance is a common challenge in real-world datasets, particularly in critical applications such as medical diagnosis, intrusion detection, fault detection, and disease identification. In most of these cases, positive examples are very rare. As a result, machine learning models often become biased towards the negative class and classify unseen samples as negative. This bias favors the majority class, resulting in poor prediction performance for the minority class. This thesis evaluates various state-of-the-art methods for addressing class imbalance on more than 100 datasets with different imbalance ratios. A thorough experimental analysis has been carried out to identify patterns in the outcomes. By experimenting with numerous sampling strategies, including under-sampling, over-sampling, and hybrid approaches, this study highlights the strengths and weaknesses of each technique. Additionally, we explore the impact of class overlap, a condition where instances of different classes share similar features, which further complicates predictive modeling. The findings underscore the necessity of combining sampling methods with cost-sensitive learning to improve prediction accuracy and generalization. The research introduces novel hybrid approaches that optimize the balance between majority and minority classes, demonstrating significant improvements in performance. These advancements contribute valuable insights and methodologies for future research and practical applications in handling imbalanced data.
Description:
Supervised by
Mr. Asif Newaz,
Lecturer,
Department of Electrical and Electronic Engineering (EEE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Electrical and Electronic Engineering, 2024.