Abstract:
In contemporary cybersecurity research, the fusion of sampling techniques with state-of-the-art Machine Learning (ML) and Deep Learning (DL) models has emerged as
a pivotal area of exploration, aimed at enhancing the efficacy of Intrusion Detection
Systems (IDS). This thesis delves into the intersection of sampling methodologies and
advanced learning algorithms to address the inherent challenges in class imbalance
prevalent in network intrusion datasets.
Class imbalance, a common issue in IDS datasets, often leads to suboptimal performance as models tend to be biased towards the majority class, compromising their
ability to detect instances of the minority class, which typically represents intrusions.
The proposed research harnesses the power of sampling techniques, encompassing
oversampling, undersampling, and hybrid approaches, to rectify this imbalance and
create a more representative learning environment.
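As a minimal illustration of the two basic resampling strategies named above, the following sketch balances a toy dataset by random oversampling (duplicating minority rows) and random undersampling (discarding majority rows). The function names, the toy data, and the class labels are assumptions made for this example only; they are not taken from the thesis, and practical work would more likely rely on an established resampling library.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect the feature rows belonging to each class label."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


# Hypothetical toy data: 8 "normal" flows and 2 "attack" flows.
X = [[0.1 * i, 1.0] for i in range(8)] + [[5.0, 9.0], [5.5, 9.5]]
y = ["normal"] * 8 + ["attack"] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))
print(Counter(y_under))
```

On this toy data, oversampling leaves 8 rows per class and undersampling leaves 2 rows per class. A hybrid approach would apply both steps, each toward an intermediate target size.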
Sampling methods are strategically combined with ML models like Support Vector
Machines (SVM), Random Forests, and k-Nearest Neighbors, along with DL models such as Convolutional Neural Networks (CNN) and Long Short-Term Memory
(LSTM) networks. This collaboration seeks to leverage the capabilities of these models to uncover complex structures and connections within the data.
Central to our research is the introduction of Feature Relevance and Adaptive Oversampling (FAAO), a novel approach that combines feature relevance assessment with
adaptive oversampling to address class imbalance. FAAO evaluates the importance of
different features to identify those most influential in distinguishing between classes,
ensuring that the oversampling process focuses on the most relevant features and improves the quality of the synthetic samples.
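The general idea of letting feature relevance guide oversampling can be sketched as follows. This is not the FAAO algorithm, whose details are not given in this abstract; it is a hypothetical illustration that assumes a simple relevance score (the gap between class means scaled by the overall spread) and a SMOTE-style interpolation between minority samples, with neighbours chosen by a relevance-weighted distance. All function names and the toy data are assumptions.

```python
import random
from statistics import mean, pstdev


def feature_relevance(X_min, X_maj):
    """Score each feature by the gap between class means, scaled by overall spread."""
    scores = []
    for j in range(len(X_min[0])):
        col_min = [row[j] for row in X_min]
        col_maj = [row[j] for row in X_maj]
        spread = pstdev(col_min + col_maj) or 1.0  # guard against a constant feature
        scores.append(abs(mean(col_min) - mean(col_maj)) / spread)
    total = sum(scores) or 1.0
    return [s / total for s in scores]  # normalise so the weights sum to 1


def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature is scaled by its relevance weight."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


def relevance_guided_oversample(X_min, X_maj, n_new, seed=0):
    """Create n_new synthetic minority rows by interpolating towards the
    nearest minority neighbour under the relevance-weighted distance."""
    rng = random.Random(seed)
    weights = feature_relevance(X_min, X_maj)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        others = [row for row in X_min if row is not base]
        neighbour = min(others, key=lambda row: weighted_distance(base, row, weights))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return weights, synthetic


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_maj = [[0.1, 5.0], [0.2, 4.0], [0.3, 6.0], [0.2, 5.5], [0.1, 4.5], [0.3, 5.0]]
X_min = [[0.9, 5.0], [1.0, 4.8], [1.1, 5.2]]

weights, synthetic = relevance_guided_oversample(X_min, X_maj, n_new=4)
print(weights)
print(synthetic)
```

In this toy setting nearly all of the weight falls on feature 0, so neighbours are chosen mainly by that feature and every synthetic point stays within the minority range on it. An adaptive scheme could additionally vary how many synthetic samples each minority point receives, which this sketch does not attempt.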
Our primary objectives include exploring various sampling techniques, including FAAO,
and their impact on the performance of ML and DL-based IDS models. We will implement and assess the effectiveness of these techniques in mitigating class imbalance,
thereby enhancing the models’ overall detection accuracy, sensitivity, and specificity.
Furthermore, this study seeks to provide insights into the optimal pairing of sampling
techniques with specific ML and DL architectures, while equally paying attention
to feature relevance considering the inherent characteristics of intrusion detection
datasets. The findings are anticipated to provide valuable guidelines for practitioners and researchers seeking to deploy robust and adaptive IDS solutions in real-world
scenarios.
By outlining the collaborative relationship between feature relevance, sampling techniques, and advanced learning models, our work endeavors to pave the way for more
adaptive, resilient, and accurate intrusion detection mechanisms, ultimately fortifying the cybersecurity landscape against evolv
Description:
Supervised by
Mr. Faisal Hussain,
Assistant Professor,
Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)
Board Bazar, Gazipur, Bangladesh
This thesis is submitted in partial fulfillment of the requirement for the degree of Bachelor of Science in Computer Science and Engineering, 2024