Good Explanation Definition: Why LLM Outputs Are Hard to Explain

Hannah Foster

June 17, 2026

A new paper redefines what makes a good explanation, focusing on counterfactuals and the listener's prior beliefs, revealing unique challenges for LLM interpretability. It argues that explanation quality depends on the user's knowledge, making static AI explanations insufficient. This framework pushes researchers and practitioners to rethink how we communicate model reasoning.

What makes an explanation 'good' when we ask an AI model to justify its output? It sounds like a simple question, but behind it lie decades of philosophical debate. A recent arXiv paper attempts to pin down a precise definition, with a particular eye on the interpretability nightmares posed by large language models.

Counterfactuals and Prior Beliefs

The paper's core idea is refreshingly straightforward: a good explanation should help the listener understand why the output was X instead of Y. This counterfactual approach isn't new in explainable AI, but the authors take it further. They argue that an explanation's effectiveness also depends on what the listener already knows. The same explanation works differently for a domain expert versus a newcomer. For instance, if an LLM answers 'Paris is the capital of France,' a geography buff needs no explanation, but someone unfamiliar with Europe might need to know what 'France' is and why Paris is the capital. The paper formalizes this dependence on prior beliefs, turning explanations from static outputs into dynamic acts of communication.
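
To make the listener-dependence concrete, here is a toy sketch in Python. It is not the paper's formalism. It assumes, purely for illustration, that an explanation is a set of facts, that each fact carries a hypothetical `contrast_weight` for how strongly it favors the actual output X over the alternative Y, and that a listener's prior beliefs are simply the set of facts they already hold. Under those assumptions, the best explanation for a given listener is the strongest contrastive facts they do not yet know.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    # Hypothetical score: how strongly this fact favors output X over alternative Y.
    contrast_weight: float


def explain(candidates: list[Fact], listener_knows: set[str], budget: int = 2) -> list[str]:
    """Return up to `budget` contrastive facts the listener does not already know."""
    # Keep only facts that actually distinguish X from Y.
    contrastive = [f for f in candidates if f.contrast_weight > 0]
    # Drop facts already in the listener's prior beliefs: repeating them adds nothing.
    novel = [f for f in contrastive if f.text not in listener_knows]
    # Prefer the facts that do the most contrastive work.
    novel.sort(key=lambda f: f.contrast_weight, reverse=True)
    return [f.text for f in novel[:budget]]


# X = "Paris is the capital of France", Y = "Lyon is the capital of France" (toy example).
facts = [
    Fact("France is a country in Western Europe", 0.3),
    Fact("Paris is the seat of the French government", 0.9),
    Fact("Both Paris and Lyon are French cities", 0.0),  # true, but doesn't separate X from Y
]

expert = {"France is a country in Western Europe", "Paris is the seat of the French government"}
newcomer: set[str] = set()

print(explain(facts, expert))    # []
print(explain(facts, newcomer))  # ['Paris is the seat of the French government', 'France is a country in Western Europe']
```

The same candidate facts yield an empty explanation for the expert and a two-fact explanation for the newcomer, which is the sense in which explanation quality is a property of the listener as much as of the model.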

Why LLMs Are Unusually Hard to Explain

Under this new definition, LLMs become particularly troublesome, for three reasons.

First, an LLM is a large probabilistic model that generates text one token at a time using billions of parameters, not a clean logical chain. Extracting a clear counterfactual path ('if the input had been different, the output would have changed like this') is extremely difficult because the model's internal representations are highly distributed.

Second, users' prior beliefs vary wildly. A doctor and a middle school student asking the same question need explanations of very different depth. Yet current tools such as attention weights or gradient-based attribution produce static, technical attributions that can't adapt to the user's background.

Third, the authors point out that LLM generation includes stochastic elements (sampling temperature, top-k filtering), which makes counterfactual reasoning even messier. The same question might yield two different answers, so the 'why A instead of B' question loses a stable foundation.
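
The sampling point is easy to see in a toy decoder. The sketch below is a simplified stand-in for what real inference libraries do, not any particular library's implementation; the logits and token names are invented for illustration. It applies top-k filtering and temperature scaling to a fixed set of next-token scores and then samples.

```python
import math
import random


def sample_next(logits: dict[str, float], temperature: float, top_k: int, rng: random.Random) -> str:
    """Pick the next token from toy `logits` using temperature scaling and top-k filtering."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token, so the output is deterministic.
        return max(logits, key=logits.get)
    # Top-k: keep only the k highest-scoring candidates.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    tokens = [tok for tok, _ in top]
    # Temperature: divide scores by T before the softmax (higher T flattens the distribution).
    scaled = [score / temperature for _, score in top]
    peak = max(scaled)
    # Unnormalized softmax weights; subtracting the peak avoids overflow in exp().
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores after a prompt such as "Will it rain tomorrow? Answer:"
logits = {"yes": 2.0, "no": 1.7, "maybe": 1.0, "purple": -4.0}

# Same prompt, same scores, different random seeds.
sampled = [sample_next(logits, temperature=1.0, top_k=3, rng=random.Random(seed)) for seed in range(5)]
greedy = [sample_next(logits, temperature=0, top_k=3, rng=random.Random(seed)) for seed in range(5)]
print(sampled)  # varies with the seeds; "purple" never appears because top_k=3 excludes it
print(greedy)   # ['yes', 'yes', 'yes', 'yes', 'yes']
```

With sampling on, nothing about the input changes between runs, yet the answer can flip between "yes" and "no", so there is no single fixed output A whose counterfactual B we could explain. Greedy decoding restores determinism, but at the cost of describing a different generation process than the one many deployed systems actually use.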

Practical Impact: A Shift in Interpretability Research

This paper isn't just philosophical navel-gazing. For AI development and deployment teams, it suggests that chasing a single 'perfect explanation' may be unrealistic. A better approach is to build interactive explanation systems that adjust content and detail based on user feedback: when a user signals confusion about a conclusion, for example, the system supplies more background facts. That is exactly the listener-dependent view of explanation the paper argues for.

On the regulatory side, if we can't even agree on what a good explanation is, requiring models to produce 'explainable' outputs remains a major technical and legal hurdle.

Of course, the definition itself is still contentious. How do we quantify a listener's prior beliefs? Whose beliefs take precedence when different listeners disagree? The paper doesn't answer all of these questions, but it forces the field to sit down and rethink the fundamentals.

At the end of the day, a good explanation isn't about dumping more information; it's about helping someone see what would have happened if things were different. And for LLMs, finding that stable, trustworthy alternative path is proving far harder than we'd imagined.
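
A minimal sketch of such an interactive loop, under loud assumptions: the explanation has already been split into hypothetical layers of increasing detail, and user confusion arrives as a simple boolean signal. Neither assumption comes from the paper, and the sketch sidesteps the open question raised above of how to estimate a listener's prior beliefs in the first place; it only shows how feedback can drive the level of detail.

```python
from dataclasses import dataclass


@dataclass
class InteractiveExplainer:
    """Toy explainer that adds one layer of background each time the user signals confusion."""

    # Explanation layers, ordered from the bare answer to progressively more background.
    layers: list[str]
    # Number of layers currently shown to the user.
    depth: int = 1

    def current(self) -> str:
        return " ".join(self.layers[: self.depth])

    def feedback(self, confused: bool) -> str:
        # Only confusion triggers more detail, and never beyond the layers that exist.
        if confused and self.depth < len(self.layers):
            self.depth += 1
        return self.current()


explainer = InteractiveExplainer([
    "Paris is the capital of France.",
    "It is where the French government sits.",
    "France is a country in Western Europe.",
])
print(explainer.current())                 # Paris is the capital of France.
print(explainer.feedback(confused=True))   # Paris is the capital of France. It is where the French government sits.
print(explainer.feedback(confused=False))  # unchanged from the previous line
```

A real system would need a far richer signal than a single boolean and a way to choose which background to add, not just how much; the fixed layer order here is the simplest possible stand-in.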
