Semantic Layer

The Semantic Layer is Squad’s structured representation of everything it knows about your organisation: the entities, relationships, and domain concepts extracted from your data. Thanks to USEP’s dual ingestion approach — Discovery Mode for fast, schema-free indexing and Ontology-Fed Mode for precise, schema-grounded extraction — Squad can build this layer from any combination of structured and unstructured sources, with or without a predefined ontology.

The result is a unified semantic map across your organisation that provides the foundation for entity resolution, supply chain mapping, digital twins, data lineage, lexical extraction, and any use case that requires understanding the relationships between things in your domain.

Entity Extraction

Entities enter the Semantic Layer through USEP’s staged extraction cascade: a brain-inspired pipeline that starts with fast, cheap pattern recognition and only escalates to expensive AI reasoning when necessary. As the knowledge graph grows, more entities are resolved by fast lookup rather than costly extraction. In production, the fast recognition stages typically resolve 70-80% of entities without any LLM involvement.
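Here is a rough illustration of the cheapest-first idea. A span goes through a knowledge-graph lookup first, then pattern rules, and only then a model call. The model's answer is written back, so the next mention of the same span is a cheap lookup. Everything here is hypothetical: `KNOWN_ENTITIES`, `PATTERNS` and the `llm_extract` stub are placeholders, not USEP's actual stages or API.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical stand-in for entities already resolved in the knowledge graph.
KNOWN_ENTITIES = {"acme ltd": "Organisation", "london": "Location"}

# Hypothetical cheap pattern rules; a real cascade would have more stages.
PATTERNS = [
    (re.compile(r"\b(?:ltd|plc|inc)\.?$", re.IGNORECASE), "Organisation"),
    (re.compile(r"^(?:mr|ms|dr)\.?\s", re.IGNORECASE), "Person"),
]


def graph_lookup(span: str) -> Optional[str]:
    """Cheapest tier: exact match against known entities."""
    return KNOWN_ENTITIES.get(span.lower())


def pattern_match(span: str) -> Optional[str]:
    """Second tier: regex rules, still no model involved."""
    for pattern, entity_type in PATTERNS:
        if pattern.search(span):
            return entity_type
    return None


def llm_extract(span: str) -> str:
    """Placeholder for the expensive model-based stage (hypothetical)."""
    return "Concept"


def classify(span: str, stats: Counter) -> str:
    """Run tiers cheapest-first and stop at the first one that answers."""
    for tier_name, tier in (("lookup", graph_lookup), ("pattern", pattern_match)):
        entity_type = tier(span)
        if entity_type is not None:
            stats[tier_name] += 1
            return entity_type
    stats["llm"] += 1
    entity_type = llm_extract(span)
    # Write the result back so the next mention is a cheap lookup.
    KNOWN_ENTITIES[span.lower()] = entity_type
    return entity_type


stats: Counter = Counter()
for span in ["Acme Ltd", "Dr. Okafor", "zero trust", "Zero Trust"]:
    classify(span, stats)
# The repeated "Zero Trust" is answered by lookup rather than the LLM tier.
```

Counting hits per tier (`stats`) is one simple way to see what share of entities is resolved without an LLM.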

See Data Ingestion for full details on the extraction pipeline, including the tiered cascade stages, content policy checks, and how the system gets faster over time.

Ontology & Entity Classification

Squad classifies extracted entities using the POLE+O framework (Person, Organisation, Location, Event, Object). This is a well-established model for structuring knowledge across diverse domains:

Type	Description	Examples
Person	Individual people	Employees, contacts, public figures
Organisation	Companies, institutions, groups	Suppliers, partners, regulatory bodies
Location	Places and geographic features	Facilities, regions, addresses
Event	Occurrences and incidents	Meetings, incidents, deliveries
Object	Products, artifacts, works	Equipment, documents, assets

Beyond these foundational types, Squad adds four cognitive primitives (Concept, Tool, Procedure, and Fact) that capture abstract knowledge and operational patterns.
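One way to represent this taxonomy in code is an enum. It covers the five POLE+O types and the four cognitive primitives. This is a hypothetical data model for illustration only. The `Entity` class and the `FOUNDATIONAL` set are not Squad's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    # POLE+O foundational types
    PERSON = "Person"
    ORGANISATION = "Organisation"
    LOCATION = "Location"
    EVENT = "Event"
    OBJECT = "Object"
    # Cognitive primitives for abstract and operational knowledge
    CONCEPT = "Concept"
    TOOL = "Tool"
    PROCEDURE = "Procedure"
    FACT = "Fact"


FOUNDATIONAL = frozenset({
    EntityType.PERSON, EntityType.ORGANISATION, EntityType.LOCATION,
    EntityType.EVENT, EntityType.OBJECT,
})


@dataclass
class Entity:
    name: str
    entity_type: EntityType
    aliases: set[str] = field(default_factory=set)

    @property
    def is_foundational(self) -> bool:
        return self.entity_type in FOUNDATIONAL


# Enum lookup by value lets extracted string labels map onto types.
supplier = Entity("Acme Ltd", EntityType("Organisation"), {"ACME"})
runbook = Entity("Cold-start recovery", EntityType.PROCEDURE)
```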

Attractor-Based Schema

Squad doesn’t use a rigid, predefined schema. Instead, domain-specific entity types crystallise organically as data is ingested. As entities accumulate with their extracted types, similar entities group into clusters in the embedding space. The system detects clusters that have stabilised and creates a new type classification for each. Each type develops recognition patterns and confidence calibration derived from its instances. Future extractions are automatically classified against the crystallised types.

This means Squad adapts to your domain without manual schema configuration. A defence logistics deployment, for example, automatically develops different entity types from a financial services one. See Data Ingestion: Three-Layer Storage for how POLE+O classification integrates with the storage model.
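The crystallisation loop can be approximated with a simple clustering sketch. It rests on several assumptions:

- Embeddings are already computed (toy 2-D vectors here).
- Leader clustering with a cosine threshold stands in for whatever clustering Squad actually uses.
- A cluster counts as stable once it has `min_size` members.
- Each type's acceptance floor is calibrated to its least similar member.

All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def crystallise(entities, join_threshold=0.9, min_size=3):
    """Group (label, embedding) pairs and promote stable clusters to types."""
    clusters = []  # each cluster: {"members": [(label, vec), ...], "centre": vec}
    for label, vec in entities:
        for cluster in clusters:
            if cosine(vec, cluster["centre"]) >= join_threshold:
                cluster["members"].append((label, vec))
                cluster["centre"] = centroid([v for _, v in cluster["members"]])
                break
        else:  # no cluster close enough: start a new one
            clusters.append({"members": [(label, vec)], "centre": list(vec)})

    types = {}
    for cluster in clusters:
        members = cluster["members"]
        if len(members) < min_size:
            continue  # too small to count as a stable attractor yet
        # Name the type after its most common extracted label.
        name = Counter(lbl for lbl, _ in members).most_common(1)[0][0]
        # Calibrate: accept new items at least as close as the weakest member.
        floor = min(cosine(v, cluster["centre"]) for _, v in members)
        types[name] = (cluster["centre"], floor)
    return types


def classify_against_types(vec, types):
    """Return the best-matching crystallised type, or None if nothing fits."""
    best, best_sim = None, -1.0
    for name, (centre, floor) in types.items():
        sim = cosine(vec, centre)
        if sim >= floor and sim > best_sim:
            best, best_sim = name, sim
    return best


entities = [
    ("vessel", [1.0, 0.1]), ("ship", [0.95, 0.15]), ("vessel", [0.9, 0.05]),
    ("port", [0.1, 1.0]), ("harbour", [0.05, 0.9]),
]
types = crystallise(entities)
```

Here the vessel/ship cluster crystallises into a type, while the two-member port cluster is not yet stable. A production system would more plausibly calibrate with a margin than with the strict weakest-member floor used here.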

Entity Resolution

When entities are mentioned across multiple documents and data sources, Squad resolves them to a single canonical representation through progressive consolidation: deterministic matching first (handling ~75% of cases), followed by semantic resolution for ambiguous cases. Multiple passes with progressively relaxed thresholds ensure high-confidence merges happen first, building context that improves accuracy in later rounds.
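Here is a minimal sketch of progressive consolidation, assuming mentions are plain name strings. A deterministic pass first merges identical normalised names. Fuzzy passes then run at progressively lower thresholds. Several simplifications apply:

- `difflib.SequenceMatcher` string similarity is a stand-in for semantic resolution.
- The thresholds are arbitrary.
- Candidate blocking is omitted, so every cluster pair is compared.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Deterministic key: lowercase, strip punctuation, collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


class UnionFind:
    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def groups(self):
        out = {}
        for item in self.parent:
            out.setdefault(self.find(item), set()).add(item)
        return list(out.values())


def similarity(a, b):
    # Stand-in for semantic resolution: plain string similarity in [0, 1].
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def resolve(mentions, thresholds=(0.9, 0.75)):
    uf = UnionFind(mentions)
    # Pass 0: deterministic matching on normalised names.
    first_seen = {}
    for mention in mentions:
        key = normalise(mention)
        if key in first_seen:
            uf.union(mention, first_seen[key])
        else:
            first_seen[key] = mention
    # Later passes: fuzzy matching with progressively relaxed thresholds.
    # Comparing whole clusters means aliases merged earlier add context.
    for threshold in thresholds:
        clusters = uf.groups()
        for i, a in enumerate(clusters):
            for b in clusters[i + 1:]:
                score = max(similarity(x, y) for x in a for y in b)
                if score >= threshold:
                    uf.union(next(iter(a)), next(iter(b)))
    return uf.groups()


entities = resolve(["Acme Ltd", "ACME Ltd.", "Acme Limited", "Globex Corp"])
```

"Acme Ltd" and "ACME Ltd." merge deterministically. "Acme Limited" joins only in the relaxed second pass. Later passes compare every member of one cluster with every member of another, so aliases merged earlier give subsequent rounds more to match against.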

All merges are soft: recorded as reversible relationships that can be undone if a merge turns out to be incorrect. Medium-confidence cases are flagged for human review rather than auto-merged.
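Soft merging and confidence bands might be modelled as below. Merges are stored as reversible `SameAs` links rather than by deleting records. Mid-band scores go to a review queue instead of merging. The band boundaries (0.90 and 0.70) and all class names are hypothetical.

```python
from dataclasses import dataclass, field

AUTO_MERGE = 0.90  # hypothetical: at or above this, merge automatically
REVIEW = 0.70      # hypothetical: between REVIEW and AUTO_MERGE, ask a human


@dataclass
class SameAs:
    source: str
    target: str
    confidence: float
    active: bool = True


@dataclass
class MergeLedger:
    links: list[SameAs] = field(default_factory=list)
    review_queue: list[tuple[str, str, float]] = field(default_factory=list)

    def propose(self, source: str, target: str, confidence: float) -> str:
        if confidence >= AUTO_MERGE:
            self.links.append(SameAs(source, target, confidence))
            return "merged"
        if confidence >= REVIEW:
            self.review_queue.append((source, target, confidence))
            return "review"
        return "rejected"

    def undo(self, source: str, target: str) -> bool:
        """Deactivate a merge; both original records remain untouched."""
        for link in self.links:
            if link.source == source and link.target == target and link.active:
                link.active = False
                return True
        return False

    def canonical(self, entity: str) -> str:
        """Follow active SAME_AS links to the canonical entity."""
        seen = set()
        while entity not in seen:
            seen.add(entity)
            nxt = next((link.target for link in self.links
                        if link.active and link.source == entity), None)
            if nxt is None:
                break
            entity = nxt
        return entity


ledger = MergeLedger()
auto = ledger.propose("ACME Ltd.", "Acme Ltd", 0.97)        # "merged"
queued = ledger.propose("Acme Holdings", "Acme Ltd", 0.78)  # "review"
before = ledger.canonical("ACME Ltd.")                      # "Acme Ltd"
ledger.undo("ACME Ltd.", "Acme Ltd")
after = ledger.canonical("ACME Ltd.")                       # "ACME Ltd."
```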

See Data Ingestion: Entity Resolution for the full resolution pipeline, including candidate blocking strategies, confidence bands, and enrichment.

Building the Semantic Layer

The Semantic Layer is built progressively through USEP’s dual ingestion modes, and the two approaches complement each other:

Discovery Mode produces a lightweight concept network from pure linguistic analysis: noun phrases, co-occurrence relationships, and hierarchical topic communities. This is fast, schema-free, and works out of the box with zero configuration.
Ontology-Fed Mode runs the full extraction cascade grounded by a domain schema, producing richly typed entities and relationships that align with your domain vocabulary.
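Discovery Mode's concept network can be illustrated with sentence-level co-occurrence counting. The sketch assumes noun phrases have already been extracted per sentence, for example with spaCy's `noun_chunks`. Connected components over strong edges serve as a crude stand-in for hierarchical community detection. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count how often each pair of noun phrases appears in the same sentence."""
    edges = Counter()
    for phrases in sentences:
        for a, b in combinations(sorted(set(phrases)), 2):
            edges[(a, b)] += 1
    return edges


def communities(edges, min_weight=2):
    """Connected components over edges with weight >= min_weight."""
    adjacency = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


sentences = [
    ["turbine", "gearbox", "vibration"],
    ["gearbox", "vibration"],
    ["turbine", "gearbox"],
    ["invoice", "supplier"],
    ["supplier", "invoice", "payment terms"],
]
edges = cooccurrence_edges(sentences)
topics = communities(edges)
```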

Both modes feed into the same three-layer storage model (Episodes → Mentions → Entities), and their outputs are unified through entity resolution. You can start with Discovery Mode to explore your data, then add an ontology later to deepen the structure — or run both simultaneously on different data sources.
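The three layers can be pictured as a small data model. Each mention records the episode it came from and the mode that produced it, so an entity resolved from both modes keeps both trails. The field names and classes below are illustrative assumptions, not Squad's storage schema.

```python
from dataclasses import dataclass, field
from typing import Literal

Mode = Literal["discovery", "ontology"]


@dataclass(frozen=True)
class Episode:
    """Layer 1 (assumed shape): one ingested unit and where it came from."""
    episode_id: str
    source: str


@dataclass(frozen=True)
class Mention:
    """Layer 2 (assumed shape): a surface form found in an episode by one mode."""
    episode_id: str
    text: str
    mode: Mode


@dataclass
class Entity:
    """Layer 3 (assumed shape): the canonical thing resolved mentions point to."""
    name: str
    mentions: list[Mention] = field(default_factory=list)

    def modes(self) -> set[str]:
        return {m.mode for m in self.mentions}

    def episodes(self) -> set[str]:
        return {m.episode_id for m in self.mentions}


episodes = {
    "ep-1": Episode("ep-1", "maintenance_log.pdf"),
    "ep-2": Episode("ep-2", "asset_register.csv"),
}
pump = Entity("Pump P-101")
pump.mentions.append(Mention("ep-1", "pump p101", "discovery"))
pump.mentions.append(Mention("ep-2", "P-101", "ontology"))
sources = sorted(episodes[e].source for e in pump.episodes())
```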

Use Cases

The Semantic Layer enables a range of analytical capabilities:

Lexical extraction and document analysis: extract structured entities and relationships from unstructured documents. Identify people, organisations, locations, and events mentioned across your document corpus, with full provenance back to source material.

Supply chain and network mapping: map relationships between organisations, locations, and assets. Trace dependencies, identify single points of failure, and understand the structure of complex operational networks.
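As one example of tracing dependencies, the sketch below flags suppliers that are the only source of a part a product needs, either directly or through sub-assemblies. It assumes the relevant slice of the graph has been exported into two plain dictionaries: a bill of materials and a part-to-suppliers map. The company and part names are fictional, and the functions are hypothetical rather than a Squad API.

```python
def required_parts(item, bom):
    """All parts needed, directly or transitively, to build item."""
    needed, stack = set(), list(bom.get(item, ()))
    while stack:
        part = stack.pop()
        if part not in needed:
            needed.add(part)
            stack.extend(bom.get(part, ()))
    return needed


def single_points_of_failure(item, bom, suppliers):
    """Map each sole supplier to the required parts only it can provide."""
    spof = {}
    for part in required_parts(item, bom):
        sources = suppliers.get(part, set())  # empty set: no external supplier recorded
        if len(sources) == 1:
            (only,) = sources
            spof.setdefault(only, set()).add(part)
    return spof


bom = {
    "Drone": {"Airframe", "Flight Controller"},
    "Flight Controller": {"Chip", "Firmware"},
}
suppliers = {
    "Airframe": {"Northwind", "Contoso"},
    "Flight Controller": {"Fabrikam"},
    "Chip": {"Tailspin"},
    "Firmware": {"Fabrikam", "Litware"},
}
spof = single_points_of_failure("Drone", bom, suppliers)
# {"Fabrikam": {"Flight Controller"}, "Tailspin": {"Chip"}}
```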

Digital twin construction: model real-world assets, systems, and processes as a connected graph of entities with typed relationships. The Semantic Layer provides a queryable digital representation of physical infrastructure — equipment hierarchies, site layouts, operational dependencies — that stays current as new data is ingested.

Data lineage and provenance tracking: trace how information flows across your organisation. Every entity in the Semantic Layer carries full source provenance back to its originating documents and data sources, enabling you to answer “where did this fact come from?” and “which sources contributed to this entity?” across the entire knowledge base.
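Provenance questions of this kind reduce to an index from facts and entities to the sources that asserted them. The sketch below is a hypothetical in-memory version. The `Fact` triple shape and the source identifiers are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str


class ProvenanceIndex:
    """Records which sources asserted which facts (hypothetical)."""

    def __init__(self):
        self._by_fact = defaultdict(set)
        self._by_entity = defaultdict(set)

    def record(self, fact: Fact, source: str) -> None:
        self._by_fact[fact].add(source)
        self._by_entity[fact.subject].add(source)
        self._by_entity[fact.obj].add(source)

    def where_from(self, fact: Fact) -> list[str]:
        """Where did this fact come from?"""
        return sorted(self._by_fact.get(fact, ()))

    def contributors(self, entity: str) -> list[str]:
        """Which sources contributed to this entity?"""
        return sorted(self._by_entity.get(entity, ()))


index = ProvenanceIndex()
supplies = Fact("Acme Ltd", "SUPPLIES", "Valve V-7")
index.record(supplies, "contracts/2023-114.pdf")
index.record(supplies, "erp.purchase_orders")
index.record(Fact("Acme Ltd", "LOCATED_IN", "Leeds"), "crm.accounts")
```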

Entity-centric investigation: start from any entity and explore its connections. See which documents mention it, what other entities it co-occurs with, and how it fits into broader community structures.

Cross-source intelligence: resolve the same real-world entity across different data sources and document types. Build a unified view even when different sources use different names, abbreviations, or identifiers.

Next Steps

Memory Architecture: the broader memory system the Semantic Layer sits within
Retrieval Graph: how Squad retrieves relevant context at query time
Data Ingestion: how data enters the Semantic Layer through USEP’s dual ingestion modes