Abstract
Accurate assessment of biochemical composition in fish products is essential for quality control in the seafood industry and for nutritional research. While spectroscopic techniques enable non-destructive analysis, each individual method is limited in prediction accuracy and reliability. Multi-modal data fusion offers a promising solution, but developing robust fusion strategies remains challenging due to the complex relationships between spectral features and biochemical properties. This paper presents GPFusion, a genetic programming-based high-level fusion method that integrates multiple spectroscopic modalities. Unlike conventional approaches, GPFusion evolves interpretable fusion functions to optimize predictions from diverse spectroscopy workflows. A key innovation is the replicate variance penalty, which enhances prediction consistency across replicate measurements by capturing within-sample variability and mitigating batch effects. Experimental evaluations on three biochemical targets (Omega-3, Omega-6, and monounsaturated fatty acids) show that GPFusion improves the coefficient of determination by 6.9%, 8.0%, and 2.6%, respectively. Compared with other high-level fusion strategies, GPFusion delivers more stable predictions with lower variance while maintaining competitive accuracy. Additional empirical studies confirm the effectiveness of the replicate variance penalty and reveal critical trade-offs between tree depth and terminal flexibility for evolving compact and interpretable fusion functions.
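To make the idea of a replicate variance penalty concrete, the following is a minimal illustrative sketch, not the formulation used in GPFusion. It assumes that the penalty is the mean within-sample (population) variance of the predictions across replicates, that the base error is the mean squared error, and that the two are combined with a weight `lam`. The function names, the weight `lam`, and the additive form are all hypothetical choices made for illustration.

```python
import numpy as np


def replicate_variance_penalty(preds, sample_ids):
    """Mean within-sample variance of predictions across replicates.

    Assumption: population variance (ddof=0), averaged over samples.
    """
    preds = np.asarray(preds, dtype=float)
    sample_ids = np.asarray(sample_ids)
    # One variance per physical sample, computed over its replicate predictions.
    variances = [preds[sample_ids == s].var() for s in np.unique(sample_ids)]
    return float(np.mean(variances))


def penalized_fitness(y_true, preds, sample_ids, lam=0.1):
    """Hypothetical fitness: MSE + lam * replicate variance (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    preds = np.asarray(preds, dtype=float)
    mse = float(np.mean((y_true - preds) ** 2))
    return mse + lam * replicate_variance_penalty(preds, sample_ids)


# Two samples with two replicates each; the first sample's replicates disagree.
y_true = [2.0, 2.0, 2.0, 2.0]
preds = [1.0, 3.0, 2.0, 2.0]
sample_ids = [0, 0, 1, 1]
penalty = replicate_variance_penalty(preds, sample_ids)
fitness = penalized_fitness(y_true, preds, sample_ids, lam=0.1)
```

Under these assumptions, a candidate fusion function whose predictions disagree across replicates of the same sample receives a worse fitness than one with the same error but consistent replicate predictions, which is the behavior the penalty is intended to encourage.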