Accuracy & Disambiguation

In high-stakes environments, a wrong answer is worse than no answer. Squad is built around this principle: every response passes through multiple layers of classification, disambiguation, and confidence evaluation before it reaches the user. The system is designed to ask rather than guess, and to decline rather than hallucinate.

Intent Classification

When a query arrives, AIM classifies it across several dimensions before deciding how to respond. This classification determines the entire downstream route.

| Classification | What It Detects | Route |
| --- | --- | --- |
| Entity ambiguity | Multiple entities match the query (“Smith” could be several people) | Disambiguation |
| Relation ambiguity | An entity has multiple facets (“tell me about Valve X”: its spec, its history, its failures?) | Clarification |
| Quantity ambiguity | Vague quantifiers (“show me some results” vs “show me all results”) | Clarification |
| Underspecified | Query lacks sufficient constraints to produce a reliable answer | Clarification |
| Recommendation | Open-ended request that requires understanding preferences | Preference elicitation |
| Procedural | Task that maps to a stored workflow or requires multi-step execution | Planner |
| Clear | Unambiguous factual query with a confident match | Direct retrieval |

This classification is not binary: the system assesses confidence across all categories simultaneously and routes to the highest-priority concern first.
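The routing logic above can be sketched as a priority-ordered scan over per-category confidence scores. The category names mirror the table; the threshold value and the ordering are illustrative assumptions, not Squad's actual configuration:

```python
# Hypothetical sketch of priority-based intent routing. Each category is
# scored independently; the highest-priority category above threshold wins,
# with direct retrieval as the fallback for clear queries.

ROUTE_PRIORITY = [
    ("entity_ambiguity", "disambiguation"),
    ("relation_ambiguity", "clarification"),
    ("quantity_ambiguity", "clarification"),
    ("underspecified", "clarification"),
    ("recommendation", "preference_elicitation"),
    ("procedural", "planner"),
]

def route(scores: dict[str, float], threshold: float = 0.5) -> str:
    """Route to the highest-priority category whose score clears the
    threshold; fall back to direct retrieval when nothing does."""
    for category, destination in ROUTE_PRIORITY:
        if scores.get(category, 0.0) >= threshold:
            return destination
    return "direct_retrieval"

# Entity ambiguity outranks the procedural concern even at a lower score:
print(route({"entity_ambiguity": 0.8, "procedural": 0.9}))  # disambiguation
```

Scanning in priority order rather than taking the arg-max is what lets a lower-scored but higher-stakes concern (ambiguity) pre-empt an otherwise confident route.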

Disambiguation via Expected Information Gain

When a query is ambiguous, Squad doesn’t pick the most likely interpretation and hope for the best. Instead, it asks the question that will maximally reduce uncertainty: a technique called Expected Information Gain (EIG).

The process works in three steps:

  1. Candidate identification: Squad identifies all plausible interpretations of the query from the knowledge graph. If the user asks about “transformer failures”, the system finds all entities matching that description: specific components, failure categories, document references.

  2. Question selection: The system evaluates candidate clarifying questions and selects the one whose answer would partition the candidate set most evenly. A question that splits 10 candidates into two groups of 5 is more informative than one that splits them 9-to-1.

  3. Iterative narrowing: The user’s answer eliminates candidates, and the process repeats if necessary. In practice, most ambiguities resolve within one or two rounds.
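Step 2 can be made concrete with the standard EIG calculation over a uniform prior: the gain of a question is the prior entropy of the candidate set minus the expected entropy after observing the answer. This is a minimal sketch, assuming uniform candidate probabilities; the partition structure is hypothetical:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def expected_information_gain(candidates, question_partition):
    """EIG of a question = prior entropy minus the expected posterior
    entropy. `question_partition` maps each possible answer to the subset
    of candidates consistent with that answer."""
    n = len(candidates)
    prior = entropy([1 / n] * n)
    expected_posterior = sum(
        (len(subset) / n) * entropy([1 / len(subset)] * len(subset))
        for subset in question_partition.values()
        if subset
    )
    return prior - expected_posterior

candidates = list(range(10))
even_split = {"yes": candidates[:5], "no": candidates[5:]}    # 5 / 5
skewed_split = {"yes": candidates[:9], "no": candidates[9:]}  # 9 / 1

print(expected_information_gain(candidates, even_split))    # ~1.0 bit
print(expected_information_gain(candidates, skewed_split))  # ~0.47 bits
```

This reproduces the claim in step 2: the even 5/5 split yields a full bit of information, while the 9-to-1 split yields less than half a bit.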

Multi-Turn Clarification

Disambiguation can span multiple exchanges. At each turn, Squad:

  • Updates its candidate distribution based on the user’s response
  • Re-evaluates whether confidence is sufficient to answer
  • Asks another question only if the remaining ambiguity is above threshold
  • Proceeds with a best-effort response after a configurable maximum number of rounds
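The turn loop above can be sketched as follows. Every function parameter here is a hypothetical hook (not Squad's API), standing in for the question-selection, answer-filtering, and confidence components described in this section:

```python
def clarify(candidates, select_question, get_answer, confidence,
            threshold=0.8, max_rounds=3):
    """Hypothetical multi-turn clarification loop: each round updates the
    candidate set from the user's answer and re-checks confidence,
    stopping once confident or after max_rounds (best-effort fallback)."""
    rounds = 0
    while confidence(candidates) < threshold and rounds < max_rounds:
        question = select_question(candidates)         # e.g. highest-EIG question
        candidates = get_answer(question, candidates)  # eliminate candidates
        rounds += 1
    return candidates, confidence(candidates) >= threshold

# Toy run: four "Smith" candidates, confidence = 1 / |candidates|,
# and scripted user answers that halve then single out the set.
answers = iter([lambda c: c[:2], lambda c: c[:1]])
result, confident = clarify(
    candidates=["Smith A", "Smith B", "Smith C", "Smith D"],
    select_question=lambda c: f"Which of these {len(c)} people do you mean?",
    get_answer=lambda q, c: next(answers)(c),
    confidence=lambda c: 1 / len(c),
)
print(result, confident)  # ['Smith A'] True
```

Note how the loop checks confidence before asking: if the very first response already resolves the ambiguity, no further question is generated.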

Convergence Detection

Squad tracks a confidence metric throughout the reasoning process. A response is only delivered when the system’s confidence that it has correctly understood and answered the query exceeds a threshold.

| Confidence State | System Action |
| --- | --- |
| High confidence | Answer directly, no clarification needed |
| Medium confidence | Ask a clarifying question to increase certainty |
| Low confidence | Acknowledge uncertainty, provide what is known, suggest next steps |

This means Squad never silently delivers a low-confidence answer. If it isn’t sure, the user knows.
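As a sketch, the three confidence states reduce to a two-threshold mapping; the 0.85 and 0.5 cut-offs here are illustrative assumptions, not Squad's configured values:

```python
def respond(confidence: float, high: float = 0.85, low: float = 0.5) -> str:
    """Map a confidence score to the action in the table above.
    Thresholds are hypothetical; real values would be configurable."""
    if confidence >= high:
        return "answer"                     # high: answer directly
    if confidence >= low:
        return "ask_clarifying_question"    # medium: reduce uncertainty first
    return "acknowledge_uncertainty"        # low: be explicit about the gap

print(respond(0.92))  # answer
```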

Query Approval and Template Reuse

When AIM generates a novel query to answer a question, that query enters a pending state. It can serve the current session, but it is not reused for future requests until a human approves it.

Once approved:

  1. The query is embedded and indexed in the approved query store
  2. Future similar questions are matched against it via vector similarity (HNSW)
  3. A strong match means the system can reuse the proven approach directly: skipping planning entirely
  4. Reuse statistics are tracked so administrators can see which templates are most valuable

This is how Squad progressively shifts from expensive, deliberate reasoning (System 2) to fast, automatic execution (System 1). But the transition only happens through human approval: the system never promotes its own outputs without oversight.
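The matching step can be illustrated with a toy store. This sketch uses a linear scan with cosine similarity in place of a real HNSW index, and the stored Cypher template, the embeddings, and the 0.9 threshold are all hypothetical examples:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

class ApprovedQueryStore:
    """Toy stand-in for the approved query store. A production system
    would use an HNSW index for approximate nearest-neighbour search
    instead of this linear scan."""

    def __init__(self, threshold=0.9):
        self.templates = []  # entries: [embedding, template, reuse_count]
        self.threshold = threshold

    def approve(self, embedding, template):
        """Index a human-approved query template."""
        self.templates.append([embedding, template, 0])

    def match(self, embedding):
        """Return a proven template on a strong match, else None
        (falling back to System 2 planning)."""
        best = max(self.templates,
                   key=lambda t: cosine(embedding, t[0]), default=None)
        if best and cosine(embedding, best[0]) >= self.threshold:
            best[2] += 1  # track reuse statistics for administrators
            return best[1]
        return None

store = ApprovedQueryStore()
# Hypothetical approved Cypher template with a 2-d toy embedding:
store.approve([1.0, 0.0], "MATCH (t:Transformer)-[:HAD]->(f:Failure) RETURN f")
print(store.match([0.99, 0.05]))  # strong match: reuse the template
print(store.match([0.0, 1.0]))    # no strong match: None, plan from scratch
```

The key design point survives the simplification: the store only ever contains human-approved entries, so a match is always a promotion to System 1 that a person has already signed off on.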

Human Control Over Learning

Squad’s learning is powerful precisely because it’s governed. At every level, humans control what the system learns and how it applies that knowledge:

What gets approved

Every query template, workflow, and tool definition requires explicit human approval before it enters the system’s repertoire. Nothing is auto-promoted.

Query Curation →

Who can approve

Role-based access control determines who can approve queries, manage workflows, and administer the system.

Administration & Access →

What can be revoked

Approved workflows and query templates can be revoked at any time by an administrator, immediately removing them from the system’s active repertoire.

Guardrails & Safety →

What’s visible

The full audit trail, approval history, and reuse statistics are available for inspection.

Transparency & Audit →

High-Risk Filtering

For queries flagged as high-risk, Squad applies stricter controls. A high-risk query without a strong match to a previously approved template is declined automatically: it never reaches the execution stage.

AIM Reasoning Pipeline
[Diagram: the AIM reasoning pipeline. The user prompt is embedded and compared against approved queries. A match over 90% similarity takes the System 1 path (fast, proven): retrieve the linked few-shot prompt, produce an LLM summary, and output the answer. Otherwise the System 2 path (AIM + safety controls) applies: a high-risk query is declined and stored; otherwise AIM generates a Cypher query and tests the result, errors lead to a declined and stored response, and human curation and approval of the few-shot prompt feeds back as an approved query template.]
| Risk Level | Similarity Match | Action |
| --- | --- | --- |
| Low / Medium | Any | Proceed normally |
| High | Strong match to approved template | Execute using proven approach |
| High | No strong match | Decline with explanation |

This ensures that in safety-critical contexts, the system only acts when it has a proven, human-approved approach. Novel high-risk queries surface as review items rather than being attempted speculatively.
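The decision table above reduces to a small gate. The 0.9 threshold echoes the "over 90% similar" check in the pipeline; the action labels are illustrative, not Squad's internal names:

```python
def gate(risk: str, similarity: float, match_threshold: float = 0.9) -> str:
    """High-risk filtering gate: low/medium risk proceeds, high risk
    executes only with a strong match to an approved template, and
    everything else is declined and surfaced for human review."""
    if risk in ("low", "medium"):
        return "proceed"
    if similarity >= match_threshold:
        return "execute_approved_template"
    return "decline_and_queue_for_review"

print(gate("high", 0.95))  # execute_approved_template
print(gate("high", 0.40))  # decline_and_queue_for_review
```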

Knowledge Gap Honesty

When Squad cannot find relevant information in the knowledge graph, it does not attempt to fill the gap with general knowledge or hallucinated content. Instead, it:

  1. Acknowledges the gap: Clearly states what it could not find
  2. Reports what it tried: Surfaces the retrieval strategies it attempted
  3. Proposes next steps: Suggests actions the user can take (upload relevant documents, refine the query, check external sources)
  4. Falls back to external search: If configured, performs a web search and clearly labels the results as externally sourced rather than from the organisation’s knowledge base
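The four-part response above can be sketched as a structured payload; the field names and function signature are hypothetical, not Squad's actual response schema:

```python
def gap_response(query, strategies_tried, web_results=None):
    """Sketch of an honest knowledge-gap response: acknowledge the gap,
    report what was attempted, propose next steps, and clearly label any
    external search results as not coming from the knowledge base."""
    response = {
        "answer": None,  # never filled with general or hallucinated content
        "gap": f"No knowledge-graph results found for: {query}",
        "strategies_tried": strategies_tried,
        "next_steps": [
            "upload relevant documents",
            "refine the query",
            "check external sources",
        ],
    }
    if web_results is not None:  # only when external fallback is configured
        response["external_results"] = {
            "source": "web_search",  # labelled as externally sourced
            "items": web_results,
        }
    return response

print(gap_response("pump cavitation history",
                   ["graph traversal", "vector retrieval"]))
```

Keeping the `answer` field explicitly empty, rather than absent, makes the gap machine-readable for downstream audit tooling as well as visible to the user.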

How These Capabilities Connect

Accuracy is not a single feature but a property that emerges from multiple systems working together:

  • Disambiguation reduces the chance of answering the wrong question
  • Convergence detection prevents low-confidence answers from being delivered
  • High-risk filtering ensures safety-critical queries only use proven approaches
  • Knowledge gap honesty eliminates hallucination by making uncertainty visible
  • Human-in-the-loop feeds corrections back into the system, improving accuracy over time

For related governance and review workflows, see Human-in-the-Loop and Query Curation.