Abstract:
Classifying cancer using gene expression can be an important tool for understanding the specific characteristics of a patient’s cancer and for guiding the
most appropriate treatment approach. By identifying the specific genes that are
involved in the development and progression of a particular cancer, it may be
possible to tailor treatment to target those genes and improve outcomes for the
patient. In addition, by understanding the genetic makeup of a patient’s cancer, it
may be possible to identify clinical trials or targeted therapies that may be more
effective for that patient. In this study, we worked with the TCGA Pan-Cancer dataset, using its RNA-seq data to analyze gene expression. The dataset comprises 33 types of cancer. Our study mainly focuses on
implementing an explainable AI-based pan-cancer classification approach using
gene expression analysis. The goal is to accurately detect the type of cancer in individuals within a short time. We employed seven classifier algorithms: Logistic
Regression, SVM, XGBoost, Random Forest, MLP, 1-D CNN, and TabNet. To
enhance the performance of the models, we utilized feature selection techniques
such as Lasso, SelectFromModel, Select-K-Best, and ElasticNet. SelectFromModel with 500 features yielded the best performance. We applied ensemble
methods of probability averaging and max voting, with probability averaging
achieving the highest accuracy of 96.60%. Validation of the selected features’
contribution and comparison with gene sets from DESeq2 analysis confirmed
their significance and relevance. This approach provides insights into cancer-specific molecular mechanisms and pathways. Overall, our study demonstrates
the effectiveness of feature selection in reducing dimensionality while maintaining predictive power and biological relevance.
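
To make the two ensemble strategies named above concrete, the following is a minimal sketch of probability averaging and max voting. It is not the implementation used in the study: the function names, the three hypothetical models, and the toy probabilities are all assumptions made for illustration, and the real classifiers would output probabilities over 33 cancer types rather than 3 classes.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def probability_averaging(prob_sets):
    """Average class probabilities across models, then take the argmax.

    prob_sets holds one entry per model; each entry is a list of
    per-sample rows of class probabilities in a shared class order.
    """
    n_models = len(prob_sets)
    predictions = []
    for rows in zip(*prob_sets):  # rows = one probability row per model
        mean = [sum(col) / n_models for col in zip(*rows)]
        predictions.append(argmax(mean))
    return predictions


def max_voting(prob_sets):
    """Each model votes for its own top class; the most common vote wins.

    Ties between classes go to the class that received a vote first.
    """
    predictions = []
    for rows in zip(*prob_sets):
        votes = [argmax(row) for row in rows]
        predictions.append(Counter(votes).most_common(1)[0][0])
    return predictions


# Made-up probabilities: 3 hypothetical models, 2 samples, 3 classes.
model_a = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25]]
model_b = [[0.10, 0.50, 0.40], [0.30, 0.60, 0.10]]
model_c = [[0.10, 0.50, 0.40], [0.20, 0.50, 0.30]]
probs = [model_a, model_b, model_c]

avg_pred = probability_averaging(probs)
vote_pred = max_voting(probs)
print("probability averaging:", avg_pred)
print("max voting:", vote_pred)
```

On the first toy sample the two strategies disagree: one very confident model outweighs two moderately confident ones under averaging, while max voting counts each model equally. This sensitivity to confidence is one plausible reason the two methods can differ in accuracy, although the sketch itself says nothing about the reported results.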
Description:
Supervised by
Mr. Tareque Mohmud Chowdhury,
Assistant Professor,
Mr. Tasnim Ahmed,
Lecturer,