PathoSage: AI Agent for Pathology Evidence Arbitration

Marcus Chen

June 10, 2026

PathoSage is a three-stage AI agent framework for slide-level reasoning in pathology. It tackles the problem of conflicting evidence from different tools and knowledge sources by combining structured evidence deliberation with a training-free Beta-Bernoulli empirical system. Each piece of contradictory evidence is evaluated independently, with the aim of reducing anchoring bias and making multimodal large language models (MLLMs) more reliable in critical pathology diagnostics.

Pathology diagnosis is inherently a high-stakes game of inference. A single tissue slide can contain hundreds of fields of view, with critical lesions often hidden in inconspicuous corners. While multimodal large language models (MLLMs) have recently entered this domain, they often introduce new problems: models might hallucinate morphological details when specific features aren't present, or get sidetracked by conflicting retrieved information. This is a far cry from the 'closed chain of evidence' that human pathologists meticulously build.

A recent preprint introduces PathoSage, a new framework that aims to address these problems. Its core idea is refreshingly direct: instead of throwing all evidence into one pot, process each clue independently and sequentially, then make a final, informed judgment.

A Three-Stage Design: Isolation and Arbitration

PathoSage breaks down the complex reasoning process into three distinct phases: knowledge retrieval, evidence collection, and evidence arbitration. In the first stage, the system pulls relevant background information from external knowledge bases, like pathology textbooks or past case studies. The second stage involves calling upon multiple specialized tools—think cell counters or tissue segmentation models—to perform quantitative analysis on the slide images. The real heavy lifting, however, happens in the third stage.
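To make the data flow between the three stages concrete, here is a minimal sketch in Python. It is not the preprint's implementation: the function `run_three_stages`, its parameters, and the stub retriever, tools, and arbiter are all hypothetical names chosen for illustration, and the stub outputs are placeholders rather than real results.

```python
from typing import Callable


def run_three_stages(
    slide_id: str,
    question: str,
    retrieve: Callable[[str], list[str]],
    tools: dict[str, Callable[[str], str]],
    arbitrate: Callable[[list[str], dict[str, str]], str],
) -> str:
    """Hypothetical skeleton of the three phases described above."""
    # Stage 1: knowledge retrieval from an external knowledge base.
    knowledge = retrieve(question)
    # Stage 2: evidence collection, one finding per specialized tool.
    findings = {name: tool(slide_id) for name, tool in tools.items()}
    # Stage 3: evidence arbitration over everything gathered so far.
    return arbitrate(knowledge, findings)


# Stubs standing in for real components (assumptions, not real tools).
def stub_retrieve(question: str) -> list[str]:
    return [f"textbook note on: {question}"]


stub_tools = {
    "cell_counter": lambda slide_id: f"placeholder count for {slide_id}",
    "tissue_segmenter": lambda slide_id: f"placeholder mask for {slide_id}",
}


def stub_arbitrate(knowledge: list[str], findings: dict[str, str]) -> str:
    return (
        f"{len(knowledge)} knowledge item(s), "
        f"{len(findings)} tool finding(s) sent to arbitration"
    )


report = run_three_stages(
    "slide-001", "gastric lesion grading", stub_retrieve, stub_tools, stub_arbitrate
)
print(report)
```

Passing the stages in as plain callables keeps the sketch independent of any particular retriever, tool, or model.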

The linchpin here is Structured Evidence Deliberation. This module takes all the evidence generated in the first two stages, which can be contradictory and come from different modalities, and scrutinizes each piece individually to assess its credibility. If Tool A suggests high nuclear density but a retrieved paper indicates low density for that region, the system doesn't just average them out. Instead, it performs a conflict analysis and then generates a final judgment within a fresh, unbiased context. This design deliberately sidesteps anchoring bias, in which the model overemphasizes an early, strong piece of evidence and overlooks other crucial information.
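The deliberation step can be sketched roughly as follows. This is an illustration under several assumptions, not the preprint's code: the `Evidence` record, the functions `find_conflicts` and `build_arbitration_context`, and the credibility values are hypothetical. The sketch treats each item as already scored on its own, flags attributes whose claims disagree, and then assembles a new context containing only the structured evidence. Sorting by source name rather than arrival order is one simple way to keep the first item seen from dominating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str         # hypothetical source label, e.g. "tool_a"
    attribute: str      # what the claim is about, e.g. "nuclear_density"
    claim: str          # asserted value, e.g. "high"
    credibility: float  # 0..1 score from assessing this item in isolation


def find_conflicts(items: list[Evidence]) -> dict[str, list[Evidence]]:
    """Group evidence by attribute; keep attributes with disagreeing claims."""
    by_attribute: dict[str, list[Evidence]] = {}
    for ev in items:
        by_attribute.setdefault(ev.attribute, []).append(ev)
    return {
        attr: evs
        for attr, evs in by_attribute.items()
        if len({ev.claim for ev in evs}) > 1
    }


def build_arbitration_context(items: list[Evidence]) -> str:
    """Assemble a fresh context holding only structured evidence and conflicts."""
    conflicts = find_conflicts(items)
    lines = ["Evidence (each item assessed on its own):"]
    for ev in sorted(items, key=lambda e: e.source):
        lines.append(
            f"- {ev.source}: {ev.attribute} = {ev.claim} "
            f"(credibility {ev.credibility:.2f})"
        )
    lines.append("Conflicts to resolve:")
    for attr, evs in sorted(conflicts.items()):
        claims = " vs ".join(f"{e.source} says {e.claim}" for e in evs)
        lines.append(f"- {attr}: {claims}")
    return "\n".join(lines)


# Illustrative inputs only; the scores are made up for the example.
evidence = [
    Evidence("tool_a", "nuclear_density", "high", 0.80),
    Evidence("retrieved_paper", "nuclear_density", "low", 0.60),
    Evidence("tool_b", "gland_structure", "irregular", 0.70),
]
context = build_arbitration_context(evidence)
print(context)
```

In a real system the returned string would be handed to the model as a new prompt, separate from the conversation in which the evidence was gathered. The sketch stops short of that call.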

Empirical System: Training-Free Credibility Scoring

Another noteworthy technical aspect is the Beta-Bernoulli empirical system. This is akin to assigning a 'historical credit score' to each evidence source: how reliable has a particular tool been in the past? Have similar retrieval results been validated in comparable cases? What's particularly clever is that this credibility model requires no additional training. It adapts online through Bayesian updates. This means a new pathology workstation deployed with PathoSage can quickly optimize evidence weights based on local usage records, without waiting for a cloud-based model update.
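A generic Beta-Bernoulli update shows why no training is needed. Each source's reliability is modeled as a Beta distribution, and every validated or refuted output simply adds one count. The sketch below is a textbook version of that idea, not the preprint's code: the class `SourceCredibility`, the uniform Beta(1, 1) prior, and the use of the posterior mean as the score are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SourceCredibility:
    """Beta(alpha, beta) belief about how often one evidence source is right."""

    alpha: float = 1.0  # pseudo-count of validated outputs (uniform prior assumed)
    beta: float = 1.0   # pseudo-count of refuted outputs (uniform prior assumed)

    def update(self, was_correct: bool) -> None:
        # Conjugate update: one Bernoulli outcome adds one count.
        if was_correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean, used here as the credibility score.
        return self.alpha / (self.alpha + self.beta)


def replay(outcomes: list[bool]) -> SourceCredibility:
    """Apply a local usage history to a fresh prior, one outcome at a time."""
    credibility = SourceCredibility()
    for outcome in outcomes:
        credibility.update(outcome)
    return credibility


# Hypothetical history: a tool validated three times and refuted once.
tool_score = replay([True, True, True, False])
print(round(tool_score.mean, 3))  # (1 + 3) / (1 + 3 + 1 + 1) = 0.667
```

Because the update is just counting, it can run on the workstation itself as new cases are reviewed, which matches the online adaptation described above.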

From a practical deployment perspective, this design is quite friendly to hospital IT environments. The alternative, training a small domain-specific model, often demands extensive annotated data, and in pathology annotations themselves are a scarce resource. A credibility model that needs no training avoids that cost.

Typical Use Cases and Impact

Imagine this workflow: a pathologist uploads a gastric cancer slide to the PathoSage system. The system automatically retrieves relevant literature while simultaneously invoking tools for cellular atypia detection and glandular structure analysis. The results from these two tools appear to conflict—one suggests high risk, the other leans towards benign. At this point, the structured deliberation module presents both sets of evidence side-by-side, highlighting their respective confidence levels and points of contention. Finally, it outputs a comprehensive judgment, complete with its underlying reasoning. The physician can quickly pinpoint the discrepancies and decide whether further slide review or additional immunohistochemistry is needed.

The real impact of this work lies in its ability to shift AI from 'black-box prediction' to 'auditable reasoning.' For regulatory bodies and ethical review boards, being able to clearly trace the source of each step in a model's judgment is far more valuable than a mere accuracy percentage.

Limitations and Future Directions

Of course, PathoSage is so far described only in a preprint and validated on gastrointestinal pathology datasets. It has not been fully tested against the noise and rare pathologies encountered in real clinical environments. Furthermore, tool selection depends heavily on the modules pre-configured by the designer; if a critical tool is missing, overall performance could suffer.

Nevertheless, the direction and underlying philosophy are clear: future pathology AI assistants should function like meticulous colleagues, organizing information from diverse sources into a logical memo, rather than simply presenting a conclusion.
