Evaluating LLM Alignment with Human Trust Models

Anushka Debnath; Stephen Cranefield; Bastin Tony Roy Savarimuthu; Emiliano Lorini

doi:10.5220/0014448300004052

Conference proceeding

Open access

Proceedings of the 18th International Conference on Agents and Artificial Intelligence (ICAART), Vol.1, pp.575-583

International Conference on Agents and Artificial Intelligence (ICAART), 18th (Marbella, Spain, 05/03/2026–08/03/2026)

03/2026

DOI: https://doi.org/10.5220/0014448300004052

Handle:

https://hdl.handle.net/10523/50762

Abstract

Keywords: Agents; Contrastive Prompting; Large Language Models; Trust Representation

Trust plays a pivotal role in enabling effective cooperation, reducing uncertainty, and guiding decision-making in both human interactions and multi-agent systems. Despite its significance, there is limited understanding of how large language models (LLMs) internally conceptualize and reason about trust. This work presents a white-box analysis of trust representation in EleutherAI/gpt-j-6B, using contrastive prompting to generate embedding vectors within the activation space of the LLM for dyadic trust and related interpersonal relationship attributes. We first identified trust-related concepts from five established human trust models. We then determined a threshold for significant conceptual alignment by computing pairwise cosine similarities across 60 general emotional concepts. Finally, we measured the cosine similarities between the LLM’s internal representation of trust and the derived trust-related concepts. Our results show that the internal trust representation of EleutherAI/gpt-j-6B aligns most closely with the Castelfranchi socio-cognitive model, followed by the Marsh model. These findings indicate that LLMs encode socio-cognitive constructs in their activation space in ways that support meaningful comparative analyses, inform theories of social cognition, and support the design of human–AI collaborative systems.
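The comparison pipeline summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: concept vectors are built as a difference of mean activations between two contrasting prompt sets, the threshold is set to the mean plus one standard deviation of the baseline pairwise similarities, and random NumPy arrays stand in for real GPT-J activations. The function names `contrastive_vector` and `similarity_threshold` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def contrastive_vector(pos_acts, neg_acts):
    """Assumed construction: mean activation over prompts that express a
    concept minus mean activation over prompts that express its contrast.
    Both inputs have shape (num_prompts, hidden_dim)."""
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)


def similarity_threshold(vectors, num_std=1.0):
    """Assumed rule: mean + num_std * std of all pairwise cosine
    similarities among the baseline concept vectors."""
    n = len(vectors)
    sims = [
        cosine_similarity(vectors[i], vectors[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(sims) + num_std * np.std(sims))


# Synthetic stand-ins for activations (NOT real model outputs).
rng = np.random.default_rng(0)
hidden_dim = 64

# 60 baseline concept vectors, mirroring the count given in the abstract.
baseline = [
    contrastive_vector(
        rng.normal(size=(8, hidden_dim)), rng.normal(size=(8, hidden_dim))
    )
    for _ in range(60)
]
threshold = similarity_threshold(baseline)

# A hypothetical trust vector and a candidate concept that is nearly
# parallel to it by construction.
trust_vec = rng.normal(size=hidden_dim)
candidate_vec = trust_vec + 0.1 * rng.normal(size=hidden_dim)

score = cosine_similarity(trust_vec, candidate_vec)
is_aligned = score > threshold
```

With real data, the random arrays would be replaced by hidden-state activations extracted from the model for each prompt set, and a concept would count as aligned with trust only when its similarity score exceeds the baseline threshold.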

Files and links (2)

pdf

144483 (627.04 kB)

Published (Version of record) Open Access CC BY-NC-ND V4.0

url

https://doi.org/10.5220/0014448300004052

Published (Version of record) Publisher requires login to access openly licensed work Restricted CC BY-NC-ND V4.0

Details

Record Identifier: 9926863667001891
Title: Evaluating LLM Alignment with Human Trust Models
Creators: Anushka Debnath; Stephen Cranefield; Bastin Tony Roy Savarimuthu; Emiliano Lorini
Academic Unit: School of Computing
Publication Details: Proceedings of the 18th International Conference on Agents and Artificial Intelligence (ICAART), Vol.1, pp.575-583
Publisher: SciTePress
Date published ; e-published: 03/2026
Conference: International Conference on Agents and Artificial Intelligence (ICAART), 18th (Marbella, Spain, 05/03/2026–08/03/2026)
Copyright: Copyright © SciTePress 2026. This work was first published in Proceedings of the 18th International Conference on Agents and Artificial Intelligence (SciTePress). This is an open access work distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial use, distribution and reproduction in any medium, provided the original work is properly attributed to the creator(s) and the source, is not altered, transformed, or built upon in any way, and a link to the Creative Commons license is provided.
Language: English
Resource Type ; Subtype: Conference proceeding; Conference Paper
