Sources & Connectors

Squad accepts data from a range of sources and formats, normalising everything into a unified knowledge graph through the Ingestion Pipeline. This page covers the ways data can enter the platform and the integration patterns available.

Supported Input Formats

Squad’s ingestion pipeline accepts content in the following formats and normalises each one into structured text, whatever the original format:

Format	Notes
PDF	Parsed via Docling with layout-aware extraction
DOCX / PPTX	Microsoft Office documents and presentations
Markdown	Direct text with structural markers preserved
CSV / Excel	Tabular data with column headers mapped to entity properties
HTML	Web content with structural extraction
Plain text	Raw text content
Images	OCR-based extraction for scanned documents

All formats pass through the same normalisation step before entering the extraction pipeline, ensuring consistent Episode structure regardless of source.
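As an illustration of the format list above, a client could check a file's type before submitting it. This is a minimal sketch, not Squad's own detection logic: the extension-to-format mapping and the classify helper are assumptions made for the example, and the platform may well detect formats differently (for instance by content sniffing).

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to the format names in the table above.
# The extensions listed here are assumptions, not a documented allow-list.
FORMAT_BY_EXTENSION = {
    ".pdf": "PDF",
    ".docx": "DOCX / PPTX",
    ".pptx": "DOCX / PPTX",
    ".md": "Markdown",
    ".csv": "CSV / Excel",
    ".xlsx": "CSV / Excel",
    ".html": "HTML",
    ".htm": "HTML",
    ".txt": "Plain text",
    ".png": "Images",
    ".jpg": "Images",
    ".jpeg": "Images",
}


def classify(path: str) -> Optional[str]:
    """Return the table's format name for a file, or None if the extension is unknown."""
    return FORMAT_BY_EXTENSION.get(Path(path).suffix.lower())


for name in ["Report.PDF", "slides.pptx", "notes.xyz"]:
    print(name, "->", classify(name))
```

A check like this only filters obviously unsupported files on the client side; the pipeline's normalisation step remains the authority on what is accepted.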

Upload Mechanisms

Platform UI

The platform UI is the primary interface for ad-hoc uploads. Users drag and drop files directly into the Squad interface for ingestion.

REST API

Programmatic upload via the platform API:

POST /api/v1/documents/upload     # Upload documents for ingestion
POST /api/upload-files/{run_id}   # Attach files to a specific run
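
A minimal Python sketch of calling the document upload endpoint is shown here, using only the standard library. Several details are assumptions rather than documented behaviour: the base URL is a placeholder, the request is assumed to be multipart/form-data, the form field name "file" is hypothetical, and any authentication headers are omitted. The example builds the request without sending it.

```python
import io
import urllib.request
import uuid

# Placeholder base URL (assumption); replace with your deployment's address.
BASE_URL = "https://squad.example.com"


def build_upload_request(filename: str, content: bytes,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /api/v1/documents/upload.

    The multipart encoding and the field name "file" are assumptions.
    """
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(content)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url=f"{base_url}/api/v1/documents/upload",
        data=body.getvalue(),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


req = build_upload_request("report.pdf", b"%PDF-1.7 example bytes")
print(req.get_method(), req.full_url)
# To send the request: urllib.request.urlopen(req)
```

Separating request construction from sending keeps the sketch easy to inspect and adapt once the actual field names and authentication scheme are known.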

CLI

For bulk ingestion and scripted workflows:

usep ingest document.pdf
usep ingest ./documents/ --ontology schema.json
usep lazygraph ./corpus/
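
For scripted workflows, the ingest commands above can be assembled from Python before being handed to a subprocess. This is a sketch under stated assumptions: the ingest_command helper is hypothetical, and it uses only the usep ingest form and the --ontology option shown above. Whether --ontology can be combined with a single file rather than a directory is not confirmed here.

```python
import shlex
from typing import List, Optional


def ingest_command(path: str, ontology: Optional[str] = None) -> List[str]:
    """Assemble the argv list for `usep ingest`, optionally with an ontology file."""
    argv = ["usep", "ingest", path]
    if ontology is not None:
        argv += ["--ontology", ontology]
    return argv


cmd = ingest_command("./documents/", ontology="schema.json")
print(shlex.join(cmd))  # prints: usep ingest ./documents/ --ontology schema.json
# To execute: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.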

Storage Backend

Uploaded files and workflow artifacts are stored in an object store (MinIO) with per-run isolation. Each workflow execution maintains its own inputs/, working/, and outputs/ directories.
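
The per-run isolation described above can be pictured as a key prefix per run and per directory. The following is a sketch only: the RunLayout class, the "runs" root prefix, and the exact key format are hypothetical and may not match how Squad actually names objects in MinIO.

```python
from dataclasses import dataclass

AREAS = ("inputs", "working", "outputs")  # the three per-run directories named above


@dataclass(frozen=True)
class RunLayout:
    """Hypothetical object-key layout for one workflow run."""

    run_id: str
    root: str = "runs"  # assumed top-level prefix, not a documented value

    def prefix(self, area: str) -> str:
        """Return the key prefix for one of the run's directories."""
        if area not in AREAS:
            raise ValueError(f"unknown area: {area!r}")
        return f"{self.root}/{self.run_id}/{area}/"

    def key(self, area: str, relative_path: str) -> str:
        """Return the full object key for a file inside a run directory."""
        return self.prefix(area) + relative_path.lstrip("/")


layout = RunLayout("run-123")
print(layout.key("inputs", "report.pdf"))
print(layout.key("outputs", "graph/entities.json"))
```

Because every key starts with the run's own prefix, two runs cannot overwrite each other's files, which is the property the isolation is meant to provide.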

Run artifacts can be previewed or downloaded via the API:

GET /api/v1/runs/{run_id}/preview/{path}
GET /api/v1/runs/{run_id}/download/{path}
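
A small helper can build these URLs safely when artifact paths contain spaces or other special characters. This is a sketch: the base URL is a placeholder, the artifact_url function is hypothetical, and the assumption that {path} is relative to the run's root (for example outputs/summary.md) is not confirmed by the endpoint listing.

```python
from urllib.parse import quote

# Placeholder base URL (assumption); replace with your deployment's address.
BASE_URL = "https://squad.example.com"


def artifact_url(run_id: str, path: str, action: str = "download",
                 base_url: str = BASE_URL) -> str:
    """Build the preview or download URL for a run artifact."""
    if action not in ("preview", "download"):
        raise ValueError(f"unknown action: {action!r}")
    # Percent-encode the run ID fully; keep "/" in the artifact path as a separator.
    encoded_run = quote(run_id, safe="")
    encoded_path = quote(path.lstrip("/"))
    return f"{base_url}/api/v1/runs/{encoded_run}/{action}/{encoded_path}"


print(artifact_url("run-123", "outputs/summary report.md"))
print(artifact_url("run-123", "outputs/summary report.md", action="preview"))
```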

See Also

Ingestion Pipeline: the full USEP extraction pipeline
Memory Architecture: how ingested data is stored across memory systems
Semantic Layer: the graph structure built from ingested data