Abstract
Analyzing spectral data is an effective and efficient way to estimate the nutritional content of food products. However, spectral data are usually high-dimensional and noisy, which makes their analysis challenging. Data augmentation can improve the performance of machine learning models on spectral data, but most existing data augmentation methods are designed manually by human experts, a tedious process that requires extensive domain expertise. To improve the effectiveness of data augmentation and reduce its dependency on domain knowledge, this paper proposes a genetic programming method that designs data augmentation methods automatically. The proposed genetic programming method mimics the distribution of the real data and produces new data instances by adding offsets to the original data. We use a fish spectroscopic dataset as a case study to verify the effectiveness of the proposed method. The empirical results show that the data augmentation method designed by genetic programming is highly competitive with state-of-the-art manually designed methods and provides good interpretability.
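As a rough illustration of what "adding offsets to the original data" can look like, the following minimal sketch applies an offset expression to each band of a spectrum. It is not the evolved method from this work: the names `augment`, `augment_dataset`, and `example_offset`, the 5% noise scale, and the choice to copy each target value unchanged are all assumptions made for illustration. In a genetic programming setting, the offset expression would be the evolved program rather than a hand-written function.

```python
import random


def augment(spectrum, offset_fn, rng):
    """Return a new spectrum: each original value plus an offset."""
    return [x + offset_fn(x, rng) for x in spectrum]


def example_offset(x, rng):
    """Hypothetical offset expression: Gaussian noise scaled by the value.

    Stands in for an expression that genetic programming might evolve.
    """
    return 0.05 * x * rng.gauss(0.0, 1.0)


def augment_dataset(spectra, targets, offset_fn, copies=2, seed=0):
    """Append `copies` augmented versions of every spectrum.

    Assumption: an augmented spectrum keeps the target of its source.
    """
    rng = random.Random(seed)
    new_x, new_y = list(spectra), list(targets)
    for spectrum, target in zip(spectra, targets):
        for _ in range(copies):
            new_x.append(augment(spectrum, offset_fn, rng))
            new_y.append(target)
    return new_x, new_y


if __name__ == "__main__":
    # Toy values only; not taken from any real dataset.
    spectra = [[0.10, 0.20, 0.30], [0.40, 0.50, 0.60]]
    targets = [1.0, 2.0]
    X, y = augment_dataset(spectra, targets, example_offset)
    print(len(X), len(y))
```

Because the offset is an explicit expression of the input, it can be read and inspected directly, which is the kind of interpretability an evolved program may offer over a black-box generator.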
Poster presentation.