Sources & Connectors

Squad accepts data from a range of sources and formats, normalising everything into a unified knowledge graph through the Ingestion Pipeline. This page covers the ways data can enter the platform and the integration patterns available.

Supported Input Formats

Squad’s ingestion pipeline accepts content in the following formats and normalises each one into structured text:

| Format | Notes |
| --- | --- |
| PDF | Parsed via Docling with layout-aware extraction |
| DOCX / PPTX | Microsoft Office documents and presentations |
| Markdown | Direct text with structural markers preserved |
| CSV / Excel | Tabular data with column headers mapped to entity properties |
| HTML | Web content with structural extraction |
| Plain text | Raw text content |
| Images | OCR-based extraction for scanned documents |

All formats pass through the same normalisation step before entering the extraction pipeline, ensuring consistent Episode structure regardless of source.
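
To make the CSV row in the table above concrete, here is a minimal sketch of what "column headers mapped to entity properties" could look like. It is an illustration only: the function name `rows_to_entities` and the sample columns are hypothetical, and Squad's actual normalisation step and Episode structure are not shown here.

```python
import csv
import io


def rows_to_entities(csv_text: str) -> list[dict[str, str]]:
    """Turn each CSV row into a property dict keyed by column header.

    Hypothetical helper for illustration; not part of Squad's API.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    entities = []
    for row in reader:
        # Keep only non-empty cells, so a missing value does not
        # become an empty property on the entity.
        entities.append(
            {k.strip(): v.strip() for k, v in row.items() if k and v and v.strip()}
        )
    return entities


# Made-up sample data: the second row has no value for "role".
sample = "name,role,team\nAda,Engineer,Platform\nLin,,Research\n"
entities = rows_to_entities(sample)
print(entities)
```

Each header becomes a property name and each row becomes one candidate entity. How empty cells, types, and duplicate rows are treated in practice depends on the pipeline and any ontology supplied.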

Upload Mechanisms

Platform UI

The platform UI is the primary interface for ad-hoc uploads. Users can drag and drop files directly into the Squad interface for ingestion.

REST API

Programmatic upload via the platform API:

POST /api/v1/documents/upload # Upload documents for ingestion
POST /api/upload-files/{run_id} # Attach files to a specific run
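
As a rough sketch of calling the first endpoint from a script, the example below builds a multipart upload request with the Python standard library. Only the path `/api/v1/documents/upload` comes from this page. The base URL, the form field name `file`, the multipart encoding itself, and the absence of authentication headers are all assumptions, so check them against your deployment's API reference.

```python
import urllib.request
import uuid


def build_upload_request(
    base_url: str,
    filename: str,
    content: bytes,
    field_name: str = "file",  # assumed field name; not confirmed by this page
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one document."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/documents/upload",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Placeholder host and file bytes, used only to show the request shape.
request = build_upload_request(
    "https://squad.example.com", "report.pdf", b"%PDF-1.7 ..."
)
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen() would send it; that call
# is left out so the sketch runs without a live server.
```

Keeping request construction separate from sending makes the request easy to inspect in tests before it is pointed at a real instance.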

CLI

For bulk ingestion and scripted workflows:

usep ingest document.pdf
usep ingest ./documents/ --ontology schema.json
usep lazygraph ./corpus/
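
For scripted workflows, the commands above can be assembled programmatically. The sketch below uses only the `usep ingest` subcommand and the `--ontology` flag shown on this page. The helper names and the `./archive/` directory are hypothetical, and the script prints the commands rather than running them.

```python
import shlex
from typing import Optional


def build_ingest_command(target: str, ontology: Optional[str] = None) -> list[str]:
    """Return the argv list for one `usep ingest` call."""
    command = ["usep", "ingest", target]
    if ontology is not None:
        command += ["--ontology", ontology]
    return command


def plan_batch(targets: list[str], ontology: Optional[str] = None) -> list[str]:
    """Render one shell-safe command line per target, without executing it."""
    return [shlex.join(build_ingest_command(t, ontology)) for t in targets]


# "./archive/" is a made-up second directory for illustration.
plan = plan_batch(["./documents/", "./archive/"], ontology="schema.json")
for line in plan:
    print(line)
```

To execute the plan, each argv list from `build_ingest_command` could be passed to `subprocess.run(..., check=True)`, which avoids shell quoting issues with unusual file names.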

Storage Backend

Uploaded files and workflow artifacts are stored in an object store (MinIO) with per-run isolation. Each workflow execution maintains its own inputs/, working/, and outputs/ directories.

Run artifacts can be previewed or downloaded via the API:

GET /api/v1/runs/{run_id}/preview/{path}
GET /api/v1/runs/{run_id}/download/{path}
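
A small helper can build these URLs safely. The sketch below assumes that `{path}` is relative to the run's root (for example, a file under `outputs/`) and that each path segment should be percent-encoded. The base URL, run ID, and file name are placeholders, and the traversal check is a client-side precaution rather than documented platform behaviour.

```python
from urllib.parse import quote


def artifact_url(base_url: str, run_id: str, path: str, action: str = "download") -> str:
    """Build a preview or download URL for one run artifact."""
    if action not in ("preview", "download"):
        raise ValueError(f"unsupported action: {action!r}")
    # Drop empty segments and refuse parent-directory references.
    parts = [p for p in path.split("/") if p]
    if not parts or ".." in parts:
        raise ValueError(f"invalid artifact path: {path!r}")
    # Encode each segment separately so "/" keeps its meaning as a separator.
    safe_path = "/".join(quote(p, safe="") for p in parts)
    return (
        f"{base_url.rstrip('/')}/api/v1/runs/"
        f"{quote(run_id, safe='')}/{action}/{safe_path}"
    )


# Placeholder host, run ID, and file name.
url = artifact_url(
    "https://squad.example.com", "run-123", "outputs/summary report.pdf", "preview"
)
print(url)
```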