Title: Progressive data mining: An Exploration of using whole-dataset feature selection in building classifiers on three biological problems
Keywords: Progressive data mining, Hill and Greedy-Hill algorithms, Yeast functional studies, Microarray data, Protein binding sites, Micro-environment properties
Issue Date: 24-Jan-2008
Citation: VIJAYARAGHAVA SESHADRI SUNDARARAJAN (2008-01-24). Progressive data mining: An Exploration of using whole-dataset feature selection in building classifiers on three biological problems. ScholarBank@NUS Repository.
Abstract: MOTIVATION: Building an efficient classification model from limited data is a challenging problem. Each microarray experiment provides information about the behavior of a potentially large number of genes, but only within its specific experimental setup. Proteins perform their functions in cells by interacting with other molecules, so determining their binding environments is very important. Previously, structural properties have been used to predict calcium-binding sites, and microarray data to predict gene functions. This implies that generating good classification models may not be feasible with limited biological data. PROBLEM DEFINITION: Previous work used the whole set of these features without investigating the optimal choice of feature combinations or the combination of functional groups of features. In view of this, we address a research problem: developing a specific method of optimized feature selection and illustrating its results on three specific problems. CONTRIBUTION: 1. We propose a "Hill-climbing algorithm" and a "Greedy-Hill climbing algorithm" that select features to enhance the performance of classification models. 2. We demonstrate, by comparing the results of the different methods used, that the conventional methods perform worse than those based on the Hill and Greedy-Hill feature selection methods. 3. We also demonstrate that the progressive data mining concept improves the performance of the generated classifiers. Across three biological problems, the Hill-based methods achieved better classification performance as measured by eight evaluation metrics.
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: annai.pdf (950.67 kB, Adobe PDF)



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.