Accuracy & Disambiguation

In high-stakes environments, a wrong answer is worse than no answer. Squad is built around this principle: every response passes through multiple layers of classification, disambiguation, and confidence evaluation before it reaches the user. The system is designed to ask rather than guess, and to decline rather than hallucinate.

Intent Classification

When a query arrives, AIM classifies it across several dimensions before deciding how to respond. This classification determines the entire downstream route.

| Classification | What It Detects | Route |
| --- | --- | --- |
| Entity ambiguity | Multiple entities match the query (“Smith” could be several people) | Disambiguation |
| Relation ambiguity | An entity has multiple facets (“tell me about Valve X”: its spec, its history, its failures?) | Clarification |
| Quantity ambiguity | Vague quantifiers (“show me some results” vs “show me all results”) | Clarification |
| Underspecified | Query lacks sufficient constraints to produce a reliable answer | Clarification |
| Recommendation | Open-ended request that requires understanding preferences | Preference elicitation |
| Procedural | Task that maps to a stored workflow or requires multi-step execution | Planner |
| Clear | Unambiguous factual query with a confident match | Direct retrieval |

This classification is not binary: the system assesses confidence across all categories simultaneously and routes to the highest-priority concern first.
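The routing logic above can be sketched as a priority-ordered scan over per-category confidence scores. The category names mirror the table; the threshold value and the ordering are illustrative assumptions, not Squad's actual configuration:

```python
# Hypothetical sketch of priority-based intent routing. Each category is
# scored independently; the highest-priority category above threshold wins,
# with direct retrieval as the fallback for clear queries.

ROUTE_PRIORITY = [
    ("entity_ambiguity", "disambiguation"),
    ("relation_ambiguity", "clarification"),
    ("quantity_ambiguity", "clarification"),
    ("underspecified", "clarification"),
    ("recommendation", "preference_elicitation"),
    ("procedural", "planner"),
]

def route(scores: dict[str, float], threshold: float = 0.5) -> str:
    """Route to the highest-priority category whose score clears the
    threshold; fall back to direct retrieval when nothing does."""
    for category, destination in ROUTE_PRIORITY:
        if scores.get(category, 0.0) >= threshold:
            return destination
    return "direct_retrieval"

# Entity ambiguity outranks the procedural concern even at a lower score:
print(route({"entity_ambiguity": 0.8, "procedural": 0.9}))  # disambiguation
```

Scanning in priority order rather than taking the arg-max is what lets a lower-scored but higher-stakes concern (ambiguity) pre-empt an otherwise confident route.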

Disambiguation via Expected Information Gain

When a query is ambiguous, Squad doesn’t pick the most likely interpretation and hope for the best. Instead, it asks the question that will maximally reduce uncertainty: a technique called Expected Information Gain (EIG).

The process works in three steps:

  1. Candidate identification: Squad identifies all plausible interpretations of the query from the knowledge graph. If the user asks about “transformer failures”, the system finds all entities matching that description: specific components, failure categories, document references.

  2. Question selection: The system evaluates candidate clarifying questions and selects the one whose answer would partition the candidate set most evenly. A question that splits 10 candidates into two groups of 5 is more informative than one that splits them 9-to-1.

  3. Iterative narrowing: The user’s answer eliminates candidates, and the process repeats if necessary. In practice, most ambiguities resolve within one or two rounds.
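Step 2 can be made concrete with the standard EIG calculation over a uniform prior: the gain of a question is the prior entropy of the candidate set minus the expected entropy after observing the answer. This is a minimal sketch, assuming uniform candidate probabilities; the partition structure is hypothetical:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def expected_information_gain(candidates, question_partition):
    """EIG of a question = prior entropy minus the expected posterior
    entropy. `question_partition` maps each possible answer to the subset
    of candidates consistent with that answer."""
    n = len(candidates)
    prior = entropy([1 / n] * n)
    expected_posterior = sum(
        (len(subset) / n) * entropy([1 / len(subset)] * len(subset))
        for subset in question_partition.values()
        if subset
    )
    return prior - expected_posterior

candidates = list(range(10))
even_split = {"yes": candidates[:5], "no": candidates[5:]}    # 5 / 5
skewed_split = {"yes": candidates[:9], "no": candidates[9:]}  # 9 / 1

print(expected_information_gain(candidates, even_split))    # ~1.0 bit
print(expected_information_gain(candidates, skewed_split))  # ~0.47 bits
```

This reproduces the claim in step 2: the even 5/5 split yields a full bit of information, while the 9-to-1 split yields less than half a bit.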

Multi-Turn Clarification

Disambiguation can span multiple exchanges. At each turn, Squad:

  • Updates its candidate distribution based on the user’s response
  • Re-evaluates whether confidence is sufficient to answer
  • Asks another question only if the remaining ambiguity is above threshold
  • Proceeds with a best-effort response after a configurable maximum number of rounds
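The turn loop above can be sketched as follows. Every function parameter here is a hypothetical hook (not Squad's API), standing in for the question-selection, answer-filtering, and confidence components described in this section:

```python
def clarify(candidates, select_question, get_answer, confidence,
            threshold=0.8, max_rounds=3):
    """Hypothetical multi-turn clarification loop: each round updates the
    candidate set from the user's answer and re-checks confidence,
    stopping once confident or after max_rounds (best-effort fallback)."""
    rounds = 0
    while confidence(candidates) < threshold and rounds < max_rounds:
        question = select_question(candidates)         # e.g. highest-EIG question
        candidates = get_answer(question, candidates)  # eliminate candidates
        rounds += 1
    return candidates, confidence(candidates) >= threshold

# Toy run: four "Smith" candidates, confidence = 1 / |candidates|,
# and scripted user answers that halve then single out the set.
answers = iter([lambda c: c[:2], lambda c: c[:1]])
result, confident = clarify(
    candidates=["Smith A", "Smith B", "Smith C", "Smith D"],
    select_question=lambda c: f"Which of these {len(c)} people do you mean?",
    get_answer=lambda q, c: next(answers)(c),
    confidence=lambda c: 1 / len(c),
)
print(result, confident)  # ['Smith A'] True
```

Note how the loop checks confidence before asking: if the very first response already resolves the ambiguity, no further question is generated.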

Convergence Detection

Squad tracks a confidence metric throughout the reasoning process. A response is only delivered when the system’s confidence that it has correctly understood and answered the query exceeds a threshold.

| Confidence State | System Action |
| --- | --- |
| High confidence | Answer directly, no clarification needed |
| Medium confidence | Ask a clarifying question to increase certainty |
| Low confidence | Acknowledge uncertainty, provide what is known, suggest next steps |

This means Squad never silently delivers a low-confidence answer. If it isn’t sure, the user knows.
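As a sketch, the three confidence states reduce to a two-threshold mapping; the 0.85 and 0.5 cut-offs here are illustrative assumptions, not Squad's configured values:

```python
def respond(confidence: float, high: float = 0.85, low: float = 0.5) -> str:
    """Map a confidence score to the action in the table above.
    Thresholds are hypothetical; real values would be configurable."""
    if confidence >= high:
        return "answer"                     # high: answer directly
    if confidence >= low:
        return "ask_clarifying_question"    # medium: reduce uncertainty first
    return "acknowledge_uncertainty"        # low: be explicit about the gap

print(respond(0.92))  # answer
```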

Query Approval and Template Reuse

When AIM generates a novel query to answer a question, that query enters a pending state. It can serve the current session, but it is not reused for future requests until a human approves it.

Once approved:

  1. The query is embedded and indexed in the approved query store
  2. Future similar questions are matched against it via vector similarity (HNSW)
  3. A strong match means the system can reuse the proven approach directly: skipping planning entirely
  4. Reuse statistics are tracked so administrators can see which templates are most valuable

This is how Squad progressively shifts from expensive, deliberate reasoning (System 2) to fast, automatic execution (System 1). But the transition only happens through human approval: the system never promotes its own outputs without oversight.
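The matching step can be illustrated with a toy store. This sketch uses a linear scan with cosine similarity in place of a real HNSW index, and the stored Cypher template, the embeddings, and the 0.9 threshold are all hypothetical examples:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

class ApprovedQueryStore:
    """Toy stand-in for the approved query store. A production system
    would use an HNSW index for approximate nearest-neighbour search
    instead of this linear scan."""

    def __init__(self, threshold=0.9):
        self.templates = []  # entries: [embedding, template, reuse_count]
        self.threshold = threshold

    def approve(self, embedding, template):
        """Index a human-approved query template."""
        self.templates.append([embedding, template, 0])

    def match(self, embedding):
        """Return a proven template on a strong match, else None
        (falling back to System 2 planning)."""
        best = max(self.templates,
                   key=lambda t: cosine(embedding, t[0]), default=None)
        if best and cosine(embedding, best[0]) >= self.threshold:
            best[2] += 1  # track reuse statistics for administrators
            return best[1]
        return None

store = ApprovedQueryStore()
# Hypothetical approved Cypher template with a 2-d toy embedding:
store.approve([1.0, 0.0], "MATCH (t:Transformer)-[:HAD]->(f:Failure) RETURN f")
print(store.match([0.99, 0.05]))  # strong match: reuse the template
print(store.match([0.0, 1.0]))    # no strong match: None, plan from scratch
```

The key design point survives the simplification: the store only ever contains human-approved entries, so a match is always a promotion to System 1 that a person has already signed off on.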

Human Control Over Learning

Squad’s learning is powerful precisely because it’s governed. At every level, humans control what the system learns and how it applies that knowledge:

What gets approved

Every query template, workflow, and tool definition requires explicit human approval before it enters the system’s repertoire. Nothing is auto-promoted.

Query Curation →

Who can approve

Role-based access control determines who can approve queries, manage workflows, and administer the system.

Administration & Access →

What can be revoked

Approved workflows and query templates can be revoked at any time by an administrator, immediately removing them from the system’s active repertoire.

Guardrails & Safety →

What’s visible

The full audit trail, approval history, and reuse statistics are available for inspection.

Transparency & Audit →

High-Risk Filtering

For queries flagged as high-risk, Squad applies stricter controls. A high-risk query without a strong match to a previously approved template is declined automatically: it never reaches the execution stage.

AIM Reasoning Pipeline
[Diagram: the AIM reasoning pipeline. The user prompt is embedded and compared against approved queries. A match over 90% similarity takes the System 1 path (fast, proven): retrieve the linked few-shot prompt, produce an LLM summary, and output the answer. Otherwise the System 2 path (AIM + safety controls) applies: a high-risk query is declined and stored; otherwise AIM generates a Cypher query and tests the result, errors lead to a declined and stored response, and human curation and approval of the few-shot prompt feeds back as an approved query template.]
| Risk Level | Similarity Match | Action |
| --- | --- | --- |
| Low / Medium | Any | Proceed normally |
| High | Strong match to approved template | Execute using proven approach |
| High | No strong match | Decline with explanation |

This ensures that in safety-critical contexts, the system only acts when it has a proven, human-approved approach. Novel high-risk queries surface as review items rather than being attempted speculatively.
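The decision table above reduces to a small gate. The 0.9 threshold echoes the "over 90% similar" check in the pipeline; the action labels are illustrative, not Squad's internal names:

```python
def gate(risk: str, similarity: float, match_threshold: float = 0.9) -> str:
    """High-risk filtering gate: low/medium risk proceeds, high risk
    executes only with a strong match to an approved template, and
    everything else is declined and surfaced for human review."""
    if risk in ("low", "medium"):
        return "proceed"
    if similarity >= match_threshold:
        return "execute_approved_template"
    return "decline_and_queue_for_review"

print(gate("high", 0.95))  # execute_approved_template
print(gate("high", 0.40))  # decline_and_queue_for_review
```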

Knowledge Gap Honesty

When Squad cannot find relevant information in the knowledge graph, it does not attempt to fill the gap with general knowledge or hallucinated content. Instead, it:

  1. Acknowledges the gap: Clearly states what it could not find
  2. Reports what it tried: Surfaces the retrieval strategies it attempted
  3. Proposes next steps: Suggests actions the user can take (upload relevant documents, refine the query, check external sources)
  4. Falls back to external search: If configured, performs a web search and clearly labels the results as externally sourced rather than from the organisation’s knowledge base
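The four-part response above can be sketched as a structured payload; the field names and function signature are hypothetical, not Squad's actual response schema:

```python
def gap_response(query, strategies_tried, web_results=None):
    """Sketch of an honest knowledge-gap response: acknowledge the gap,
    report what was attempted, propose next steps, and clearly label any
    external search results as not coming from the knowledge base."""
    response = {
        "answer": None,  # never filled with general or hallucinated content
        "gap": f"No knowledge-graph results found for: {query}",
        "strategies_tried": strategies_tried,
        "next_steps": [
            "upload relevant documents",
            "refine the query",
            "check external sources",
        ],
    }
    if web_results is not None:  # only when external fallback is configured
        response["external_results"] = {
            "source": "web_search",  # labelled as externally sourced
            "items": web_results,
        }
    return response

print(gap_response("pump cavitation history",
                   ["graph traversal", "vector retrieval"]))
```

Keeping the `answer` field explicitly empty, rather than absent, makes the gap machine-readable for downstream audit tooling as well as visible to the user.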

How These Capabilities Connect

Accuracy is not a single feature but a property that emerges from multiple systems working together:

  • Disambiguation reduces the chance of answering the wrong question
  • Convergence detection prevents low-confidence answers from being delivered
  • High-risk filtering ensures safety-critical queries only use proven approaches
  • Knowledge gap honesty eliminates hallucination by making uncertainty visible
  • Human-in-the-loop feeds corrections back into the system, improving accuracy over time

For related governance and review workflows, see Human-in-the-Loop and Query Curation.