An Ensemble Method for Cancer Classification and Identification of Cancer-Specific Genes from Genomic Data

Rizwan, Siana; Tabassum, Farzana; Islam, Sabrina

dc.contributor.author	Rizwan, Siana
dc.contributor.author	Tabassum, Farzana
dc.contributor.author	Islam, Sabrina
dc.date.accessioned	2024-09-05T08:18:41Z
dc.date.available	2024-09-05T08:18:41Z
dc.date.issued	2023-05-30
dc.identifier.uri	http://hdl.handle.net/123456789/2159
dc.description	Supervised by Mr. Tareque Mohmud Chowdhury, Assistant Professor, and Mr. Tasnim Ahmed, Lecturer	en_US
dc.description.abstract	Classifying cancer from gene expression data can be an important tool for understanding the specific characteristics of a patient’s cancer and for guiding the most appropriate treatment approach. By identifying the specific genes involved in the development and progression of a particular cancer, it may be possible to tailor treatment to target those genes and improve patient outcomes. In addition, understanding the genetic makeup of a patient’s cancer may make it possible to identify clinical trials or targeted therapies that are more likely to be effective for that patient. In this study, we worked with the TCGA Pan-Cancer dataset, using its RNA-seq data to analyze gene expression. The dataset comprises 33 types of cancer. Our study focuses on implementing an explainable AI-based pan-cancer classification approach using gene expression analysis. The goal is to accurately detect the type of cancer in individuals within a short time. We employed seven classification algorithms: Logistic Regression, SVM, XGBoost, Random Forest, MLP, 1-D CNN, and TabNet. To enhance the performance of the models, we utilized feature selection techniques such as Lasso, SelectFromModel, Select-K-Best, and ElasticNet. SelectFromModel with 500 features yielded the best performance. We applied the ensemble methods of probability averaging and max voting, with probability averaging achieving the highest accuracy of 96.60%. Validation of the selected features’ contribution and comparison with gene sets from DESeq2 analysis confirmed their significance and relevance. This approach provides insights into cancer-specific molecular mechanisms and pathways. Overall, our study demonstrates the effectiveness of feature selection in reducing dimensionality while maintaining predictive power and biological relevance.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh	en_US
dc.title	An Ensemble Method for Cancer Classification and Identification of Cancer-Specific Genes from Genomic Data	en_US
dc.type	Thesis	en_US
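
The abstract names two ensemble rules, probability averaging and max voting, without showing how they differ. The following is a minimal sketch of both rules in plain Python, not the thesis's implementation: the function names `average_probabilities` and `max_voting` are hypothetical, and the three "models" are invented toy probability tables for two samples and three classes, chosen only so that the two rules disagree.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def average_probabilities(prob_sets):
    """Probability averaging (soft voting).

    prob_sets[m][i][c] is model m's probability that sample i has class c.
    The per-class probabilities are averaged over models, then the class
    with the highest mean probability is chosen.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    labels = []
    for i in range(n_samples):
        mean = [sum(p[i][c] for p in prob_sets) / n_models
                for c in range(n_classes)]
        labels.append(argmax(mean))
    return labels


def max_voting(prob_sets):
    """Max voting (hard voting).

    Each model votes for its own most probable class; the class with the
    most votes wins. Ties go to the class that was voted for first.
    """
    n_samples = len(prob_sets[0])
    labels = []
    for i in range(n_samples):
        votes = [argmax(p[i]) for p in prob_sets]
        labels.append(Counter(votes).most_common(1)[0][0])
    return labels


# Toy data (assumed for illustration only): 3 models, 2 samples, 3 classes.
model_a = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25]]
model_b = [[0.30, 0.40, 0.30], [0.10, 0.80, 0.10]]
model_c = [[0.35, 0.45, 0.20], [0.45, 0.30, 0.25]]
models = [model_a, model_b, model_c]

soft_labels = average_probabilities(models)
hard_labels = max_voting(models)
print(soft_labels)  # [0, 1]
print(hard_labels)  # [1, 0]
```

In this toy case a single highly confident model outweighs two weakly confident ones under probability averaging, while max voting counts every model equally regardless of confidence. Whether that behaviour explains the accuracy difference reported in the abstract is not something this sketch can show.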