dc.identifier.citation |
[1] American Cancer Society. Breast Cancer Facts & Figures 2005-2006. Atlanta: American Cancer Society, Inc. (http://www.cancer.org/). [2] Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Public-Use Data (1973-2002), National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2005, based on the November 2004 submission. [3] Ian H. Witten and Eibe Frank. Data Mining: Practical machine learning tools and techniques, 2nd Edition. San Francisco: Morgan Kaufmann; 2005. [4] Jyoti Soni, Ujma Ansari, Dipesh Sharma, Sunita Soni, “Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction”, IJCSE, Vol. 3, No. 6, June 2011. [5] D. Delen, G. Walker and A. Kadam (2005), Predicting breast cancer survivability: a comparison of three data mining methods, Artificial Intelligence in Medicine. [6] A. Bellachia and E. Guvan, “Predicting breast cancer survivability using data mining techniques”, Scientific Data Mining Workshop, in conjunction with the 2006 SIAM Conference on Data Mining, 2006. [7] Ian H. Witten and Eibe Frank. Data Mining: Practical machine learning tools and techniques, 2nd Edition. San Francisco: Morgan Kaufmann; 2005. [8] American Cancer Society. Breast Cancer Facts & Figures 2005-2006. Atlanta: American Cancer Society, Inc. (http://www.cancer.org/). [9] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann; 1993. [10] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Reading, MA: Addison-Wesley, 2005. [11] Razavi, A. R., Gill, H., Ahlfeldt, H., and Shahsavar, N., Predicting metastasis in breast cancer: comparing a decision tree with domain, 2011. [10] V. Chauraisa and S. Pal, “Early Prediction of Heart Diseases Using Data Mining Techniques”, Carib.j.SciTech, Vol. 1, pp. 208-217, 2013. [11] Weka: Data Mining Software in Java, |
en_US |
dc.description.abstract |
Breast cancer is one of the major causes of death in women when compared to all other cancers, and it has become one of the most hazardous types of cancer among women worldwide. In this paper we present an analysis of the prediction of the survivability rate of breast cancer patients using data mining techniques. The collection of large volumes of medical data has given the medical research community an opportunity to develop prediction models for survival. The data used is the SEER Public-Use Data. The preprocessed data set consists of 262,423 records with all 72 available fields from the SEER database. After cleaning, 106,237 records remained for analysis. We investigated five data mining techniques: Naïve Bayes, the back-propagated neural network, logistic regression, the support vector machine, and the J48 decision tree. A comparison of these techniques shows that logistic regression performs best, with an accuracy of 93.07%. Afterwards, three feature reduction methods (attribute correlation, information gain, and factor analysis for mixed data) were used to reduce the data dimension. Feature reduction improved running time for all of the above techniques. Its effect on accuracy fluctuated for the other techniques, but accuracy increased to 94.35% for logistic regression when factor analysis for mixed data was used. |
en_US |