Distributed AI Processing Platform
Route inference across a global worker network—or keep workloads on private infrastructure you control.
API
API Access On Every Paid Plan
Integrate with standard REST endpoints for requests, conversations, workspaces, and worker management. There is no separate API product tier.
API Reference
Request Lifecycle
Transparent flow from submission to response.
Client submits prompt via REST API with workspace and model parameters.
Request is persisted with metadata and matched to eligible worker groups.
Workers poll for compatible tasks based on model, size class, and scope.
Selected worker runs inference and reports token usage.
Response is stored and returned; conversation turns are tracked.
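The five steps above can be sketched as a small in-memory simulation. This is only an illustration of the flow, not the platform's implementation: the `Platform` class, its method names, the status values, and the `conversation_id` parameter are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    request_id: int
    workspace: str
    model: str
    prompt: str
    conversation_id: str
    status: str = "queued"          # hypothetical states: queued -> running -> completed
    response: Optional[str] = None
    tokens_used: int = 0


class Platform:
    """Hypothetical stand-in for the orchestration layer."""

    def __init__(self) -> None:
        self.requests: dict[int, Request] = {}
        self.conversations: dict[str, list[int]] = {}
        self._next_id = 1

    def submit(self, workspace: str, model: str, prompt: str, conversation_id: str) -> int:
        # Steps 1-2: accept the prompt and persist it with its metadata.
        req = Request(self._next_id, workspace, model, prompt, conversation_id)
        self.requests[req.request_id] = req
        self._next_id += 1
        return req.request_id

    def poll(self, supported_models: set[str]) -> Optional[Request]:
        # Step 3: a worker asks for the first queued task it can serve.
        for req in self.requests.values():
            if req.status == "queued" and req.model in supported_models:
                req.status = "running"
                return req
        return None

    def complete(self, request_id: int, response: str, tokens_used: int) -> Request:
        # Steps 4-5: record the result and token usage, then track the turn.
        req = self.requests[request_id]
        req.status = "completed"
        req.response = response
        req.tokens_used = tokens_used
        self.conversations.setdefault(req.conversation_id, []).append(request_id)
        return req


platform = Platform()
rid = platform.submit("demo-workspace", "example-model", "Hello", conversation_id="c1")
task = platform.poll({"example-model"})
done = platform.complete(task.request_id, "Hi there", tokens_used=5)
print(done.status, done.tokens_used)
```

A real worker would poll over the REST API instead of calling methods directly, and would also be filtered by size class and scope.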
Architecture Overview
Interactive view of how components connect.
Worker Network
Workers register in groups with supported models, compute type (CPU/GPU), size class, and scope (internal, external, or hybrid). The platform routes each request to workers that match its requirements, whether those are public marketplace nodes or your private fleet.
Join As Compute Provider
Worker scopes
- Internal — only your organization's requests
- External — marketplace tasks from other users
- Hybrid — both internal and external workloads
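One way to picture the matching rules is a filter over registered worker groups. The sketch below is hypothetical: the `WorkerGroup` and `Task` fields, the size-class values, and the reading of "external" as "tasks from organizations other than the group's owner" are assumptions, not documented platform behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class WorkerGroup:
    name: str
    owner_org: str
    models: frozenset
    compute: str       # "cpu" or "gpu"
    size_class: str    # hypothetical labels such as "small" or "large"
    scope: Scope


@dataclass(frozen=True)
class Task:
    org: str
    model: str
    compute: str
    size_class: str


def scope_allows(group: WorkerGroup, task: Task) -> bool:
    """Apply the three scope rules to a single group/task pair."""
    same_org = group.owner_org == task.org
    if group.scope is Scope.INTERNAL:
        return same_org          # only the owner's requests
    if group.scope is Scope.EXTERNAL:
        return not same_org      # only marketplace tasks from others
    return True                  # hybrid accepts both


def eligible_groups(groups: list, task: Task) -> list:
    """Return the groups that match the task on model, compute, size, and scope."""
    return [
        g for g in groups
        if task.model in g.models
        and g.compute == task.compute
        and g.size_class == task.size_class
        and scope_allows(g, task)
    ]


groups = [
    WorkerGroup("acme-private", "acme", frozenset({"example-model"}), "gpu", "large", Scope.INTERNAL),
    WorkerGroup("market-gpu", "provider-1", frozenset({"example-model"}), "gpu", "large", Scope.EXTERNAL),
    WorkerGroup("shared-cpu", "provider-2", frozenset({"example-model"}), "cpu", "large", Scope.HYBRID),
]

acme_task = Task("acme", "example-model", "gpu", "large")
other_task = Task("globex", "example-model", "gpu", "large")
acme_names = [g.name for g in eligible_groups(groups, acme_task)]
other_names = [g.name for g in eligible_groups(groups, other_task)]
print(acme_names, other_names)
```

In this example the hybrid group is excluded only because its compute type is CPU while both tasks ask for GPU.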
Storage Architecture
Bring your own storage for prompts, metadata, conversations, and outputs. Enterprises maintain data residency and auditability while using HotlineLLM for orchestration and billing.
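A bring-your-own-storage design is often expressed as a small interface that any backend can satisfy. The following is a minimal sketch under that assumption; `StorageBackend`, `LocalDirectoryStorage`, and `store_exchange` are hypothetical names for illustration and are not part of any documented HotlineLLM API.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical contract a customer-owned store would implement."""

    def put(self, key: str, record: dict) -> None: ...
    def get(self, key: str) -> dict: ...


class LocalDirectoryStorage:
    """Example backend that keeps each record as a JSON file in a directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / f"{key}.json"

    def put(self, key: str, record: dict) -> None:
        self._path(key).write_text(json.dumps(record), encoding="utf-8")

    def get(self, key: str) -> dict:
        return json.loads(self._path(key).read_text(encoding="utf-8"))


def store_exchange(storage: StorageBackend, request_id: str, prompt: str, output: str) -> None:
    # The orchestration layer depends only on the interface, so prompts and
    # outputs stay wherever the customer's backend writes them.
    storage.put(request_id, {"prompt": prompt, "output": output})


with tempfile.TemporaryDirectory() as tmp:
    storage = LocalDirectoryStorage(Path(tmp) / "records")
    store_exchange(storage, "req-1", "Hello", "Hi there")
    loaded = storage.get("req-1")

print(loaded)
```

Swapping the directory-backed class for an object store or database client would leave `store_exchange` unchanged, which is the point of keeping the interface narrow.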
Enterprise Data Control
Metadata Generation
Each request generates structured metadata for analytics, debugging, and cost tracking. Conversation history links turns with request IDs, models, token counts, and worker attribution.
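As a rough illustration, per-request metadata could be modeled as a record that carries the fields listed above, with conversation history built by grouping records. The field names, the split into prompt and completion tokens, and the helper functions below are assumptions made for this sketch, not the platform's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestMetadata:
    request_id: str
    conversation_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    worker_id: str

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def build_history(records: list) -> dict:
    """Group records by conversation, preserving the order of turns."""
    history = defaultdict(list)
    for record in records:
        history[record.conversation_id].append(record)
    return dict(history)


def tokens_by_model(records: list) -> dict:
    """Sum token usage per model, the kind of figure used for cost tracking."""
    totals = defaultdict(int)
    for record in records:
        totals[record.model] += record.total_tokens
    return dict(totals)


records = [
    RequestMetadata("r1", "c1", "model-a", 10, 20, "w1"),
    RequestMetadata("r2", "c1", "model-a", 5, 15, "w2"),
    RequestMetadata("r3", "c2", "model-b", 7, 3, "w1"),
]

history = build_history(records)
usage = tokens_by_model(records)
print([r.request_id for r in history["c1"]], usage)
```

Because each record names its worker, the same list can also be grouped by `worker_id` when debugging a slow or failing node.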