Detection of Lung Adenocarcinoma Cancer based on RNA-seq gene expression data using LIMMA and TabNet


dc.contributor.author Rahman, Faysal Bin
dc.contributor.author Anjum, Farhan
dc.contributor.author Khan, Musaddiq Hasan Fatin
dc.date.accessioned 2023-04-28T06:39:57Z
dc.date.available 2023-04-28T06:39:57Z
dc.date.issued 2022-05-30
dc.identifier.uri http://hdl.handle.net/123456789/1865
dc.description Supervised by Mr. Tareque Mohmud Chowdhury, Assistant Professor, and co-supervised by Tasnim Ahmed, Lecturer, Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur-1704, Bangladesh. This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2022. en_US
dc.description.abstract Lung cancer remains one of the deadliest diseases in the world, with the highest mortality rate of all forms of cancer. Detecting cancer at an early stage is crucial for successful treatment. Thanks to recent advances in high-throughput sequencing technologies such as RNA-seq, and to the use of various machine learning approaches, cancer detection increasingly relies on gene expression levels, which provide insight for making correct and successful treatment decisions. However, most existing work on cancer detection uses microarray data and machine learning models. This paper presents a new methodology that combines RNA-seq data, which detects transcripts better than microarrays, with a deep neural network (TabNet) to classify human lung cancer. en_US
dc.language.iso en en_US
dc.publisher Department of Computer Science and Engineering (CSE), Islamic University of Technology (IUT), Board Bazar, Gazipur, Bangladesh en_US
dc.subject Gene Expression, RNA-seq, Cancer Detection, TabNet, Limma en_US
dc.title Detection of Lung Adenocarcinoma Cancer based on RNA-seq gene expression data using LIMMA and TabNet en_US
dc.type Thesis en_US

