Interpretable machine learning for origin classification of Brazilian soybeans: A random forest and XAI-based approach

Khushboo Soni; Russell Frew; Biniam Kebede

doi:10.1016/j.jfca.2026.108896

Journal article

Open access

Peer reviewed

Journal of food composition and analysis, Vol.150, 108896

12/01/2026

DOI: https://doi.org/10.1016/j.jfca.2026.108896

Handle:

https://hdl.handle.net/10523/49513

Keywords: Brazil; Geographical Origin; Machine Learning; Random Forest; SHAP; Soybean; XAI

Abstract

eXplainable Artificial Intelligence (XAI) offers a powerful framework for enhancing the transparency and trustworthiness of machine learning models in highly regulated fields such as food traceability. In this study, we applied two XAI techniques, SHAP (SHapley Additive exPlanations) and Partial Dependence Plots (PDPs), to interpret a Random Forest (RF) classification model developed to determine the geographical origin of Brazilian soybean samples. A total of 60 samples, representing two biomes and six states, were analysed using stable isotope ratios (δ13C, δ15N, δ2H, δ18O, δ34S) and elemental composition (41 elements). The RF model achieved high classification accuracy at both the biome and state levels using the fused stable isotope and elemental composition datasets. XAI tools revealed δ18O, δ2H, Rb, Cs, and Ca as the most influential features, with δ18O consistently emerging as the dominant predictor. SHAP beeswarm and waterfall plots provided global and local explanations of feature importance, while PDPs and two-way PDPs captured non-linear relationships and synergistic effects between isotopic and elemental variables. These findings confirm the discriminative power of geochemical markers and show the practical value of interpretable models for agroecological traceability and regulatory compliance. This approach advances XAI in food provenance, providing a transparent, region-specific framework that supports sustainability initiatives.

Highlights

• RF model classified soybeans by biome and state of Brazil using geochemical data.
• First XAI study using SI and EC data for soybean traceability in Brazil.
• XAI revealed non-linear interactions and enhanced model interpretability.
• SHAP and PDPs identified δ18O, δ2H, Rb, Cs, and Ca as key origin markers.
• δ18O was the most consistent and influential feature in all classification tasks.
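The two explanation techniques named in the abstract can be illustrated with a minimal, library-free sketch. It is not the authors' pipeline: the scoring function `model` is a hand-written stand-in for a trained Random Forest, the feature names (`d18O`, `d2H`, `Rb`) and the `BACKGROUND` rows are made-up illustrative values, and the Shapley values are computed exactly by enumerating feature orderings rather than with the SHAP library. The sketch only shows what a per-sample Shapley attribution and a one-way partial dependence curve compute.

```python
from itertools import permutations
from statistics import mean

# Hypothetical feature names and made-up values; NOT data from the study.
FEATURES = ["d18O", "d2H", "Rb"]
BACKGROUND = [
    (24.0, -60.0, 5.0),
    (26.0, -50.0, 9.0),
    (28.0, -40.0, 3.0),
    (30.0, -30.0, 7.0),
]


def model(x):
    """Toy stand-in for a classifier's class score (not a Random Forest)."""
    d18o, d2h, rb = x
    score = 0.1
    if d18o > 27.0:
        score += 0.5
        if rb > 6.0:  # interaction: Rb only matters when d18O is high
            score += 0.2
    if d2h > -45.0:
        score += 0.2
    return score


def value(x, subset):
    """Mean model output with features in `subset` fixed to x's values
    and the remaining features taken from each background row."""
    outputs = []
    for row in BACKGROUND:
        mixed = tuple(x[i] if i in subset else row[i] for i in range(len(x)))
        outputs.append(model(mixed))
    return mean(outputs)


def shapley_values(x):
    """Exact Shapley values: average marginal contribution of each
    feature over every ordering in which features can be revealed."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        revealed = set()
        before = value(x, revealed)
        for i in order:
            revealed.add(i)
            after = value(x, revealed)
            phi[i] += after - before
            before = after
    return [p / len(orderings) for p in phi]


def partial_dependence(feature_index, grid):
    """Mean model output over the background rows with one feature
    forced to each grid value in turn."""
    curve = []
    for g in grid:
        outputs = [
            model(tuple(g if i == feature_index else row[i]
                        for i in range(len(row))))
            for row in BACKGROUND
        ]
        curve.append(mean(outputs))
    return curve


sample = (30.0, -30.0, 7.0)          # one hypothetical soybean sample
phi = shapley_values(sample)          # local explanation for this sample
baseline = value(sample, set())       # mean score over the background
pd_curve = partial_dependence(0, [24.0, 26.0, 28.0, 30.0])  # d18O sweep

for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.3f}")
print("partial dependence on d18O:", pd_curve)
```

The per-feature contributions sum to the difference between the sample's score and the background mean, which is the additivity property that waterfall plots visualise. The partial dependence curve steps upward where the toy threshold on `d18O` is crossed, a simple case of the non-linear response that PDPs are used to expose.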

Files and links (2)

pdf: 1-s2.0-S0889157526000396-main (10.65 MB)
Published (Version of record), CC BY 4.0, Open Access

url: https://doi.org/10.1016/j.jfca.2026.108896
Published (Version of record), Open

Details

Record Identifier: 9926833810001891
Title: Interpretable machine learning for origin classification of Brazilian soybeans: A random forest and XAI-based approach
Creators: Khushboo Soni; Russell Frew; Biniam Kebede
Publication Details: Journal of food composition and analysis, Vol.150, 108896
Academic Unit: Food Science
Publisher: Elsevier
Date published ; e-published: 12/01/2026
Copyright: Copyright © The Author(s) 2025. This work was first published in Journal of Food Composition and Analysis (Elsevier). This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://www.creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, provided that the original work is properly attributed to the creator(s) and the source, a link to the Creative Commons license is provided, and any changes made are indicated.
Language: English
Resource Type ; Subtype: Journal article