Abstract
With global soybean demand having nearly doubled over the past decade, the industry faces pressing issues such as food fraud, deforestation, and climate change. Ensuring the sustainable sourcing of soybeans, as mandated by the European Union Deforestation Regulation (EUDR), requires tracing soybeans back to their origin at high spatial resolution. Meeting this requirement calls for integrating multiple analytical approaches, which makes data fusion a powerful solution. This study evaluates four data fusion strategies: low-level fusion, mid-level fusion combining Principal Component Analysis with Random Forest (Mid-PCA-RF), mid-level fusion combining Uniform Manifold Approximation and Projection with Random Forest (Mid-UMAP-RF), and high-level fusion. The strategies were applied to 60 soybean samples from six Brazilian states, characterised by stable isotope analysis, elemental profiling, and volatile organic compound (VOC) profiling. High-level data fusion achieved 100 % classification accuracy on the test set, closely followed by Mid-UMAP-RF at 99.3 %, demonstrating the role of data fusion in improving traceability and supporting sustainable agricultural practices.
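The fusion levels differ mainly in where the analytical data blocks are combined. The following sketch contrasts low-level fusion (joining the isotope, elemental and VOC blocks into one matrix before training a single model) with high-level fusion (training one model per block and combining their outputs). It is a toy illustration under stated assumptions, not the study's pipeline: the data are invented, the blocks are assumed to be already autoscaled, a nearest-centroid classifier stands in for Random Forest, and the high-level step uses soft voting (averaging per-block class scores), which is only one possible decision-combination rule. All function names are hypothetical.

```python
# Low-level vs high-level fusion, toy sketch.
# Assumptions: invented data, blocks already autoscaled, and a
# nearest-centroid classifier standing in for Random Forest.
from math import dist
from statistics import mean


def concat(blocks):
    """Low-level step: join each sample's rows from all blocks into one row."""
    return [sum(rows, []) for rows in zip(*blocks)]


def centroids(X, y):
    """Mean feature vector per class label."""
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, c in zip(X, y) if c == label]
        cents[label] = [mean(col) for col in zip(*rows)]
    return cents


def class_scores(cents, x):
    """Inverse-distance class scores normalised to sum to 1."""
    inv = {c: 1.0 / (dist(x, m) + 1e-9) for c, m in cents.items()}
    total = sum(inv.values())
    return {c: v / total for c, v in inv.items()}


def fit_score(X_train, y, X_test):
    """Fit the stand-in classifier and score every test row."""
    cents = centroids(X_train, y)
    return [class_scores(cents, x) for x in X_test]


def best(scores):
    """Label with the highest score."""
    return max(scores, key=scores.get)


def low_level_fusion(train_blocks, y, test_blocks):
    """Single model trained on the concatenated blocks."""
    scores = fit_score(concat(train_blocks), y, concat(test_blocks))
    return [best(s) for s in scores]


def high_level_fusion(train_blocks, y, test_blocks):
    """One model per block; per-block scores averaged (soft voting, assumed rule)."""
    per_block = [fit_score(tr, y, te) for tr, te in zip(train_blocks, test_blocks)]
    preds = []
    for sample in zip(*per_block):  # one score dict per block for this sample
        avg = {c: mean(s[c] for s in sample) for c in sample[0]}
        preds.append(best(avg))
    return preds


# Toy data: 4 training samples from two hypothetical origins, "A" and "B".
train = [
    [[0.1, 0.2], [0.0, 0.1], [1.0, 0.9], [1.1, 1.0]],                      # isotopes
    [[0.2, 0.1, 0.0], [0.1, 0.0, 0.1], [0.9, 1.0, 1.1], [1.0, 0.8, 1.0]],  # elements
    [[0.0, 0.3], [0.2, 0.1], [1.2, 0.9], [0.8, 1.1]],                      # VOCs
]
y = ["A", "A", "B", "B"]
test = [
    [[0.05, 0.15], [0.95, 1.05]],
    [[0.15, 0.05, 0.05], [0.95, 0.9, 1.0]],
    [[0.1, 0.2], [1.0, 1.0]],
]
print(low_level_fusion(train, y, test))   # → ['A', 'B']
print(high_level_fusion(train, y, test))  # → ['A', 'B']
```

Replacing the centroid classifier with a Random Forest (for example scikit-learn's `RandomForestClassifier` and its `predict_proba` method) would keep the same structure: one model on the joined matrix for low-level fusion, one model per block plus a combination rule for high-level fusion.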
• This study enables high-resolution soybean traceability in line with the EUDR.
• Sixty soybean samples from six Brazilian states were classified in this study.
• Stable isotope, elemental, and volatile organic compound data were used for data fusion.
• Three fusion levels were tested: low, mid (UMAP-RF and PCA-RF), and high-level fusion.
• High-level fusion achieved 100 % accuracy, outperforming Mid-UMAP-RF at 99.3 %.
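Mid-level fusion sits between the low and high levels: each data block is first compressed into a few latent features, and only those features are concatenated and passed to the classifier (Random Forest in the PCA-RF and UMAP-RF variants). A minimal pure-Python sketch of the PCA variant follows, assuming one principal component per block obtained by power iteration; a real pipeline would usually keep several components, and the UMAP variant would need a dedicated implementation such as the umap-learn package. The data and function names are hypothetical.

```python
# Mid-level (feature-level) fusion sketch, PCA variant.
# Assumptions: toy data, one principal component per block, power iteration
# instead of a full eigendecomposition; not the study's actual pipeline.
from statistics import mean


def center(X):
    """Subtract each column's mean from a list-of-lists matrix."""
    mu = [mean(col) for col in zip(*X)]
    return [[v - m for v, m in zip(row, mu)] for row in X]


def first_pc(X, iters=200):
    """Leading eigenvector of the sample covariance of X via power iteration.

    Assumes the covariance is non-zero and the all-ones start vector is not
    orthogonal to the leading eigenvector (true for this toy example).
    """
    Xc = center(X)
    n, d = len(Xc), len(Xc[0])
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def block_scores(X):
    """Project the centred block onto its first principal component."""
    v = first_pc(X)
    return [sum(a * b for a, b in zip(row, v)) for row in center(X)]


def mid_level_fusion(blocks):
    """One PC score per block, concatenated into a compact fused matrix."""
    columns = [block_scores(X) for X in blocks]
    return [list(row) for row in zip(*columns)]


# Toy example: two hypothetical blocks measured on the same three samples.
isotopes = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
elements = [[0.0], [5.0], [10.0]]
fused = mid_level_fusion([isotopes, elements])
print(fused)  # 3 samples x 2 fused features; this matrix would feed the classifier
```

Compressing each block before fusion keeps one block with many variables (for example a large VOC profile) from dominating the fused matrix; note that principal-component signs are arbitrary, so only the relative positions of samples are meaningful.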