Tuesday, June 21, 2011

Schaffernicht and Gross: Weighted Mutual Information for Feature Selection


The evening of the first day of the ICANN conference, i.e. the 14th of June, was reserved for the poster presentations. Of the papers accepted for publication in the ICANN'11 proceedings, around 50 were given the opportunity to present their work orally, while the remaining ones (around 60) presented their work as posters. The poster session was held in the T-Building of the Computer Science department at Aalto University School of Science, while the regular conference took place in the Dipoli Congress Center. Participation was enthusiastic, both from the poster presenters and from the other regular attendants of the conference. It was heartening to find that some of the oral presenters were also presenting their work in the form of posters. This made sense, because a poster session provides one of the best opportunities to discuss one's own research with fellow researchers and renowned professors in the field, which gives a new perspective and sometimes a new dimension to the research.

Out of the many posters, I decided to write about this one by Erik Schaffernicht and Horst-Michael Gross, titled Weighted Mutual Information for Feature Selection. There is no denying the importance of feature selection in learning algorithms, so the research area is never saturated and always has openings for new ideas and methods. In this paper the authors provide a simple trick to include only the relevant features while at the same time avoiding redundant ones. Similar to other wrapper methods, they repeatedly train a classifier on the features selected so far. However, they determine the next feature to include by how well it accounts for the misclassified samples rather than for the entire data set: they assign weights to the samples and select the feature that maximizes the weighted mutual information with the class labels.
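To make the selection criterion concrete, here is a minimal sketch of a sample-weighted mutual information estimate for a discrete (or pre-binned) feature and the class labels. This is not the authors' exact estimator, just the usual plug-in estimate computed from weighted frequencies; the function name and the NumPy implementation are my own.

    import numpy as np

    def weighted_mutual_information(x, y, w):
        # Plug-in estimate of I(X; Y) using sample-weighted empirical frequencies.
        # x: discrete feature values, y: class labels, w: non-negative sample weights.
        x, y = np.asarray(x), np.asarray(y)
        w = np.asarray(w, dtype=float)
        w = w / w.sum()                               # normalise weights to sum to 1
        mi = 0.0
        for xv in np.unique(x):
            px = w[x == xv].sum()                     # weighted marginal P(X = xv)
            for yv in np.unique(y):
                py = w[y == yv].sum()                 # weighted marginal P(Y = yv)
                pxy = w[(x == xv) & (y == yv)].sum()  # weighted joint P(xv, yv)
                if pxy > 0:
                    mi += pxy * np.log(pxy / (px * py))
        return mi

With uniform weights this reduces to the ordinary empirical mutual information; the weighting only changes which samples the estimate pays attention to.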

This idea is similar to the well-known AdaBoost algorithm: samples misclassified in one round are given higher weights in the next round. This makes sense because the correctly classified samples are already explained by the currently selected subset of features, and the crux of the problem is to find the features that better classify the misclassified samples. They evaluated the method on several datasets from the UCI machine learning repository and also on some artificial datasets. Although it was not mentioned in the paper or displayed on the poster, I asked them whether they were using it on real-world data in a current project, and they told me they had deployed it in a control system where the data dimension is in the thousands.
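To show how the pieces fit together, here is a hypothetical forward-selection loop in the spirit of the poster, reusing the weighted_mutual_information sketch above. After each round the misclassified samples are up-weighted, so the next feature chosen is the one carrying the most information about exactly those hard samples. The logistic-regression classifier, the weight-doubling rule and the helper names are illustrative assumptions on my part, not the authors' exact setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def select_features(X, y, n_select):
        # Greedy forward selection driven by weighted mutual information.
        # Assumes the columns of X are discrete/binned so the MI sketch above applies.
        n_samples, n_features = X.shape
        w = np.ones(n_samples) / n_samples        # start with uniform sample weights
        remaining = list(range(n_features))
        selected = []
        for _ in range(n_select):
            # Pick the remaining feature with the highest weighted MI with the labels.
            best = max(remaining,
                       key=lambda j: weighted_mutual_information(X[:, j], y, w))
            selected.append(best)
            remaining.remove(best)
            # Train on the features chosen so far and find the misclassified samples.
            clf = LogisticRegression(max_iter=1000).fit(X[:, selected], y)
            wrong = clf.predict(X[:, selected]) != y
            # AdaBoost-like emphasis: up-weight the mistakes, then renormalise.
            w = np.where(wrong, 2.0 * w, w)
            w = w / w.sum()
        return selected

In this form each round costs one classifier fit plus one pass of MI scoring over the remaining features, which is where the saving over exhaustively retraining on every candidate subset comes from.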

Overall, a simple but very effective and intelligent trick for selecting features. It achieves computational efficiency by significantly reducing the number of training cycles, and it also selects a well-discriminating set of features.
