Abstract:
Breast cancer is one of the leading causes of cancer death among women and has become one of the most hazardous types of cancer among women worldwide. In this paper we present an analysis of predicting the survivability of breast cancer patients using data mining techniques. The collection of large volumes of medical data has given the medical research community an opportunity to develop prediction models for survival. The data used is the SEER Public-Use Data. The preprocessed data set consists of 262,423 records, each with all 72 available fields from the SEER database. After cleaning, 106,237 records remained for analysis. We then investigated five data mining techniques: Naïve Bayes, the back-propagation neural network, logistic regression, the support vector machine and the J48 decision tree. A comparison of these techniques shows that logistic regression performs best, with 93.07% accuracy. Afterwards, three feature reduction methods (attribute correlation, information gain and factor analysis for mixed datasets) were used to reduce the dimensionality of the data. These methods improved the time performance of all the above-mentioned data mining techniques. Accuracy fluctuated for the other techniques but increased to 94.35% for logistic regression when factor analysis for mixed datasets was used.
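As an illustration of one of the feature reduction methods named above, the following is a minimal sketch of ranking categorical attributes by information gain. It is not the implementation used in this work: the attribute names ("stage", "grade"), the class labels and the four toy records are made up for the example and are not taken from the SEER data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting the records on one categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted by decreasing gain."""
    names = rows[0].keys()
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in names
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy records with hypothetical attributes (not real SEER fields).
rows = [
    {"stage": "early", "grade": "low"},
    {"stage": "early", "grade": "high"},
    {"stage": "late", "grade": "low"},
    {"stage": "late", "grade": "high"},
]
labels = ["survived", "survived", "not_survived", "not_survived"]

ranking = rank_attributes(rows, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the hypothetical "stage" attribute separates the two classes completely, so it ranks first, while "grade" carries no information about the class. Attributes with low gain would be the candidates for removal before training a classifier.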
Description:
Supervised by
Tareque Mohmud Chowdhury
Assistant Professor,
Department of Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh.