# Sources & Connectors
Squad accepts data from a range of sources and formats, normalising everything into a unified knowledge graph through the Ingestion Pipeline. This page covers the ways data can enter the platform and the integration patterns available.
## Supported Input Formats
Squad’s ingestion pipeline accepts content in the following formats, with format-agnostic normalisation into structured text:
| Format | Notes |
|---|---|
| PDF | Parsed via Docling with layout-aware extraction |
| DOCX / PPTX | Microsoft Office documents and presentations |
| Markdown | Direct text with structural markers preserved |
| CSV / Excel | Tabular data with column headers mapped to entity properties |
| HTML | Web content with structural extraction |
| Plain text | Raw text content |
| Images | OCR-based extraction for scanned documents |
All formats pass through the same normalisation step before entering the extraction pipeline, ensuring consistent Episode structure regardless of source.
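To make the normalisation step concrete, the following is an illustrative sketch of what a normalised Episode record could look like. The field names here are assumptions for illustration only, not the platform's actual schema:

```python
# Illustrative sketch only: one possible shape for a normalised Episode
# record after format-agnostic extraction. All field names are assumptions.
from dataclasses import dataclass, field


@dataclass
class Episode:
    source: str                 # original filename or URL
    format: str                 # e.g. "pdf", "markdown", "csv"
    text: str                   # normalised structured text
    metadata: dict = field(default_factory=dict)


# A PDF and a Markdown file normalise to the same structure:
ep = Episode(source="report.pdf", format="pdf", text="# Q3 Report\n...")
```

Because every format converges on one record shape, downstream extraction never needs format-specific branches.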
## Upload Mechanisms

### Platform UI

The primary interface for ad-hoc uploads. Users can drag and drop files directly into the Squad interface for ingestion.

### REST API
Programmatic upload via the platform API:
```
POST /api/v1/documents/upload      # Upload documents for ingestion
POST /api/upload-files/{run_id}    # Attach files to a specific run
```
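As an illustration, the document upload endpoint could be called from Python by building a multipart request with the standard library. The host, auth scheme, and the `file` field name are assumptions; check your deployment's API reference:

```python
# Sketch: building (not sending) a multipart/form-data upload for the
# ingestion endpoint using only the standard library. The host, Bearer
# auth, and the "file" field name are assumptions.
import urllib.request
import uuid


def build_upload_request(filename: str, payload: bytes, token: str) -> urllib.request.Request:
    """Prepare a POST to /api/v1/documents/upload with one attached file."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        "https://squad.example.com/api/v1/documents/upload",  # hypothetical host
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = build_upload_request("report.pdf", b"%PDF-1.7 ...", "TOKEN")
# urllib.request.urlopen(req) would send it; omitted in this sketch.
```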
### CLI

For bulk ingestion and scripted workflows:
```
usep ingest document.pdf
usep ingest ./documents/ --ontology schema.json
usep lazygraph ./corpus/
```
## Storage Backend

Uploaded files and workflow artifacts are stored in an object store (MinIO) with per-run isolation. Each workflow execution maintains its own `inputs/`, `working/`, and `outputs/` directories.
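A minimal sketch of how run-scoped object keys might be derived under this layout; the `runs/{run_id}/...` prefix scheme is an assumption for illustration, and the actual MinIO bucket layout may differ:

```python
# Sketch: per-run key isolation in the object store. The "runs/{run_id}/"
# prefix is an assumption; only the inputs/working/outputs split comes
# from the documentation above.
from pathlib import PurePosixPath

RUN_DIRS = ("inputs", "working", "outputs")


def run_key(run_id: str, area: str, filename: str) -> str:
    """Build an object key under a run's isolated prefix."""
    if area not in RUN_DIRS:
        raise ValueError(f"area must be one of {RUN_DIRS}")
    return str(PurePosixPath("runs") / run_id / area / filename)


print(run_key("run-42", "inputs", "report.pdf"))  # runs/run-42/inputs/report.pdf
```

Keeping every artifact under its run's prefix means one run can be deleted or replayed without touching another's files.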
Run artifacts can be previewed or downloaded via the API:
```
GET /api/v1/runs/{run_id}/preview/{path}
GET /api/v1/runs/{run_id}/download/{path}
```
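For example, a client might construct these URLs as follows; the host and auth header are assumptions, and the actual network call is left commented out:

```python
# Sketch: building preview/download URLs for run artifacts. Host and
# auth header are assumptions for illustration.
import urllib.request


def artifact_url(base: str, run_id: str, path: str, preview: bool = False) -> str:
    """Return the preview or download URL for one artifact in a run."""
    action = "preview" if preview else "download"
    return f"{base}/api/v1/runs/{run_id}/{action}/{path}"


url = artifact_url("https://squad.example.com", "run-42", "outputs/graph.json")
req = urllib.request.Request(url, headers={"Authorization": "Bearer TOKEN"})
# data = urllib.request.urlopen(req).read()  # network call omitted in this sketch
```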
## Related

- Ingestion Pipeline: the full USEP extraction pipeline
- Memory Architecture: how ingested data is stored across memory systems
- Semantic Layer: the graph structure built from ingested data