Abstract:
Advances in microarray technology have increased the dimensionality of the data produced by microarray chips many times over in recent years. Pattern recognition and subsequent analysis over thousands of gene expression values are particularly difficult, and the primary role of effective feature selection is to simplify this task. Removing less informative genes alleviates the effects of noise and redundancy, and simplifies disease classification and the prediction of medical conditions such as cancer. This study addresses shortcomings of current PSO-based approaches to feature selection. A boosted filter model and a wrapper model are combined to take advantage of the strengths of each. Because conventional filter methods exhibit some limitations, this study employs a boosted approach to filtering (BFSS). BFSS selects genes iteratively: after each iteration it places more emphasis on the misclassified samples, and in subsequent iterations it tries to find genes that are effective for those samples. This allows BFSS to perform better than traditional filter methods, because it focuses on its own weaknesses. Traditional PSO-based methods and similar approaches suffer primarily from overfitting and from starting with a large, random initial population. Here, the gene subset provided by BFSS is fed to a Particle Swarm Optimizer (PSO), which reduces the feature subset to a smaller number of genes at each iteration. This helps to generate a better, near-optimal subset of genes. The proposed hybrid approach is applied to benchmark leukemia, colon and lung cancer datasets and shows better results than other well-known approaches.
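The boosted filtering idea described above can be illustrated with a minimal sketch. The abstract does not specify how BFSS scores genes or re-weights samples, so everything here is an assumption made for illustration: each gene is scored by the weighted error of a one-gene threshold classifier (a decision stump placed at the midpoint of the weighted class means), and sample weights are updated AdaBoost-style. The function names `stump_predictions` and `boosted_filter_select` and the toy data are hypothetical and not taken from the thesis.

```python
import math

def stump_predictions(values, labels, weights):
    """Classify samples on one gene by thresholding at the midpoint of the
    weighted class means (assumed scoring rule, not the thesis's own)."""
    w0 = sum(w for w, y in zip(weights, labels) if y == 0)
    w1 = sum(w for w, y in zip(weights, labels) if y == 1)
    mean0 = sum(w * v for w, v, y in zip(weights, values, labels) if y == 0) / w0
    mean1 = sum(w * v for w, v, y in zip(weights, values, labels) if y == 1) / w1
    threshold = (mean0 + mean1) / 2.0
    high_class = 1 if mean1 >= mean0 else 0  # class predicted above the threshold
    return [high_class if v > threshold else 1 - high_class for v in values]

def boosted_filter_select(X, y, n_genes):
    """Pick genes one per round; after each round, up-weight the samples the
    chosen gene misclassified (AdaBoost-style update, an assumption)."""
    n_samples, n_features = len(X), len(X[0])
    weights = [1.0 / n_samples] * n_samples
    selected = []
    for _ in range(min(n_genes, n_features)):
        best = None  # (weighted error, gene index, predictions)
        for g in range(n_features):
            if g in selected:
                continue
            column = [row[g] for row in X]
            preds = stump_predictions(column, y, weights)
            err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
            if best is None or err < best[0]:
                best = (err, g, preds)
        err, g, preds = best
        selected.append(g)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        weights = [w * math.exp(alpha if p != t else -alpha)
                   for w, p, t in zip(weights, preds, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return selected

# Toy data: 6 samples x 3 genes, binary labels (both classes must be present).
# Genes 0 and 1 are near-duplicates that both fail on sample 2; gene 2 is
# weaker overall but classifies sample 2 correctly.
X_toy = [
    [1.0, 1.1, 3.0],
    [1.0, 0.9, 8.0],
    [9.0, 9.2, 1.0],
    [9.0, 9.0, 9.0],
    [9.0, 8.9, 2.0],
    [9.0, 9.1, 9.0],
]
y_toy = [0, 0, 0, 1, 1, 1]
selected_toy = boosted_filter_select(X_toy, y_toy, 2)
print(selected_toy)
```

On this toy data, a plain filter that ranks genes once by unweighted error would keep the redundant pair (genes 0 and 1), whereas the boosted sketch picks gene 0 and then gene 2, because the second round concentrates weight on the sample that gene 0 got wrong. In the hybrid approach described above, a subset produced this way would then be handed to the PSO wrapper for further reduction.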
Description:
Supervised by
Prof. Dr. M. A. Mottalib,
Computer Science and Engineering (CSE),
Islamic University of Technology (IUT),
Board Bazar, Gazipur-1704, Bangladesh.