Functional Theory of Mind Evaluation in Large Language Models: A Behavioral and Causal Stability Framework
Prashanta Kumar Mohanty, Anupam Prasad, Abhisek Soy, Gaurav Kumar
printfpk@gmail.com
Department of MCA (Batch: 2024–2026), Haridwar University, Roorkee, Haridwar
Internal Guide: Akanksha Shukla, Assistant Professor (CA), akanksha.cse@huroorkee.ac.in
Abstract: Theory of Mind (ToM) — the cognitive capacity to attribute beliefs, intentions, desires, and emotions to oneself and others — is considered a cornerstone of human social intelligence. As Large Language Models (LLMs) such as GPT-4o, LLaMA-3.1-70B, and Qwen2.5-72B are increasingly deployed in social and interactive roles, the question of whether they genuinely possess ToM capabilities has become both scientifically significant and practically urgent. However, the existing landscape of ToM evaluation is fragmented, primarily relying on behavioral benchmarks that test only whether a model produces the correct output, without investigating the underlying computational mechanism or the stability of that reasoning. This paper proposes a Functional Theory of Mind Evaluation Framework that addresses this gap through three integrated layers of analysis: (1) behavioral accuracy evaluation using structured benchmarks (BigToM and ToMValley), (2) causal internal representation analysis using perspective projection and counterfactual interventions grounded in Simulation Theory, and (3) reasoning stability measurement using transformation-based divergence testing. Experimental analysis across five leading LLMs demonstrates significant variation in behavioral accuracy (35–67%), with transformation and belief-tracking questions proving hardest. Counterfactual intervention experiments reveal that later Transformer layers (65–80) encode perspective-taking representations with measurable causal effects on model outputs, providing partial support for Simulation Theory as an explanatory mechanism. Stability testing reveals that all models exhibit significant brittleness under adversarial scenario modifications, with answer consistency dropping 18–34% under minimal transformations. We propose a unified Functional ToM Score that integrates these three dimensions into a single interpretable metric, and discuss implications for AI safety, evaluation methodology, and future benchmark design.

Keywords: Theory of Mind, Large Language Models, Simulation Theory, false-belief evaluation, causal representation analysis, reasoning stability, Functional ToM Score, social reasoning, mechanistic interpretability, BigToM, ToMValley.
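As a minimal illustrative sketch of how the three evaluation dimensions summarized above could be folded into a single composite score, the snippet below combines behavioral accuracy, a normalized causal-effect measure, and a stability score with a weighted sum. The function name, the weights, and the normalization assumptions are hypothetical and do not reflect the exact formulation of the Functional ToM Score defined later in the paper.

```python
# Hypothetical sketch only: the weights and the assumption that every input
# lies in [0, 1] are illustrative, not the paper's actual formulation.

def functional_tom_score(behavioral_acc: float,
                         causal_effect: float,
                         stability: float,
                         weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted combination of the three evaluation dimensions.

    behavioral_acc : benchmark accuracy (e.g., on BigToM / ToMValley items)
    causal_effect  : normalized effect size of counterfactual interventions
    stability      : answer consistency under scenario transformations
    """
    w_b, w_c, w_s = weights
    return w_b * behavioral_acc + w_c * causal_effect + w_s * stability


if __name__ == "__main__":
    # Example with made-up values in the ranges mentioned in the abstract.
    print(round(functional_tom_score(0.55, 0.40, 0.70), 3))
```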