Abstract
We present an adaptive Population-Based Incremental Learning (PBIL) algorithm for feature selection in leukemia gene expression data. The proposed strategy adapts the learning rate within the PBIL framework while reducing the number of selected features. Among the tested methods, APBIL-GP (Adaptive PBIL with Gradient-Proportional learning rate adjustment) performed best, achieving the highest Separability Index (SI) value (0.9244) while reducing the feature count to 3.9% of the original. The selected features improved classification performance, particularly with the Support Vector Machine (SVM) and Random Forest (RF) classifiers. t-SNE visualizations of the APBIL-GP-selected features showed clear boundaries between leukemia subtypes. Further analysis using Jaccard indices and extensive cross-validation indicated that APBIL-GP explored unique features, reduced redundancy, and achieved the highest mean accuracy (0.9078), significantly outperforming the Boruta, RF, and Chi-squared (Chi2) feature selection methods. These findings suggest that APBIL-GP is a robust method for feature selection in gene expression data.
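
For readers unfamiliar with PBIL, the following is a minimal sketch of the standard PBIL probability-vector update applied to binary feature masks. It is not the APBIL-GP method: it uses a fixed learning rate and a toy fitness function, because the gradient-proportional adaptation and the Separability Index are not specified in this abstract. All names (`pbil_select`, `toy_fitness`) and parameter values are illustrative assumptions.

```python
import numpy as np


def pbil_select(fitness, n_features, pop_size=30, n_iter=100, lr=0.1, seed=0):
    """Standard PBIL over binary feature masks (fixed learning rate).

    fitness: callable mapping a boolean mask of shape (n_features,) to a score
             (higher is better). Hypothetical stand-in for a criterion such as SI.
    """
    rng = np.random.default_rng(seed)
    p = np.full(n_features, 0.5)           # probability of selecting each feature
    best_mask, best_score = None, -np.inf

    for _ in range(n_iter):
        # Sample a population of masks from the current probability vector.
        pop = rng.random((pop_size, n_features)) < p
        scores = np.array([fitness(mask) for mask in pop])

        # Track the best mask seen so far.
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best_mask, best_score = pop[i].copy(), float(scores[i])

        # Move the probability vector toward the best mask of this generation.
        p = (1.0 - lr) * p + lr * pop[i]
        # Keep probabilities away from 0 and 1 so sampling stays exploratory.
        p = np.clip(p, 0.02, 0.98)

    return best_mask, best_score, p


def toy_fitness(mask):
    """Toy objective (assumption): reward the first 3 features, penalize mask size."""
    return float(mask[:3].sum() - 0.1 * mask.sum())


if __name__ == "__main__":
    mask, score, probs = pbil_select(toy_fitness, n_features=20)
    print("selected feature indices:", np.flatnonzero(mask))
    print("best score:", score)
```

The size penalty in `toy_fitness` plays the role of the feature-count reduction described above; an adaptive variant would replace the constant `lr` with a value recomputed each generation.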