Abstract
Feature selection is an important step in many machine learning problems. Through the selective pressures imposed on models during evolution, genetic programming performs a basic form of feature selection, so analysing the evolved models can provide some insight into the utility of the input features. Previous work has tended towards a presence model of feature selection, in which the frequency with which a feature appears within evolved models is taken as a metric of its utility. In this paper, we identify some drawbacks of this approach and instead propose integrating importance measures, which quantify the influence a feature has on a model's behaviour. Using sensitivity-like analysis methods inspired by the importance measures used in random forest regression, we demonstrate that genetic programming introduces many features into evolved models that have little impact on a given model's behaviour, and that this can mask the true importance of salient features. The paper concludes by exploring bloat control and adaptive terminal selection methods as ways to influence which features the genetic programming search identifies as useful; the results suggest that combining adaptive terminal selection with bloat control may help to improve generalisation performance.
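The distinction between a presence model and an importance measure can be illustrated with a small sketch. It assumes a hypothetical representation of a GP individual as a nested tuple expression tree, and uses permutation importance (shuffle one feature's values, measure the increase in error), the kind of measure commonly used with random forests, as a stand-in for a sensitivity-like analysis. The function names (`presence_counts`, `permutation_importance`) are illustrative and are not taken from the paper; the paper's own measure may differ in detail.

```python
import random

# Hypothetical GP individual: (operator, left, right), a feature name such as "x0",
# or a numeric constant.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def evaluate(tree, row):
    """Evaluate an expression tree on one row (dict: feature name -> value)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):
        return row[tree]
    return tree  # numeric constant

def presence_counts(tree, counts=None):
    """Presence model: count how many times each feature appears as a terminal."""
    if counts is None:
        counts = {}
    if isinstance(tree, tuple):
        presence_counts(tree[1], counts)
        presence_counts(tree[2], counts)
    elif isinstance(tree, str):
        counts[tree] = counts.get(tree, 0) + 1
    return counts

def mse(tree, rows, targets):
    """Mean squared error of the tree over a dataset."""
    return sum((evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(tree, rows, targets, feature, repeats=10, seed=0):
    """Importance model: mean increase in MSE when one feature's column is shuffled."""
    rng = random.Random(seed)
    base = mse(tree, rows, targets)
    increases = []
    for _ in range(repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Replace only the chosen feature; all other features keep their values.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        increases.append(mse(tree, permuted, targets) - base)
    return sum(increases) / repeats

data_rng = random.Random(1)
rows = [{f"x{i}": data_rng.uniform(-1, 1) for i in range(3)} for _ in range(50)]
targets = [3.0 * r["x0"] for r in rows]
# x1 appears three times, but (x1 - x1) is always zero, so it never affects the output.
model = ("add", ("mul", 3.0, "x0"), ("mul", ("sub", "x1", "x1"), ("mul", "x1", "x2")))

print(presence_counts(model))  # {'x0': 1, 'x1': 3, 'x2': 1}
for f in ("x0", "x1", "x2"):
    print(f, permutation_importance(model, rows, targets, f))
```

Under the presence model, `x1` looks like the most useful feature because it occurs most often, yet shuffling it leaves the error unchanged (importance 0.0, as for `x2`), while `x0`, which appears only once, is the only feature with a positive importance. This mirrors, in miniature, how features with little impact on behaviour can mask the importance of salient ones.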