HotlineLLM
API Access Included In Every Paid Plan

Distributed AI Processing Platform

Route inference across a global worker network—or keep workloads on private infrastructure you control.

API

API Access On Every Paid Plan

Integrate through standard REST endpoints for requests, conversations, workspaces, and worker management. API access is part of every paid plan; there is no separate API product tier.

Request Lifecycle

Transparent flow from submission to response.

Request

Client submits prompt via REST API with workspace and model parameters.

Queue

Request is persisted with metadata and matched to eligible worker groups.

Poll

Workers poll for compatible tasks based on model, size class, and scope.

Process

Selected worker runs inference and reports token usage.

Deliver

Response is stored and returned; conversation turns are tracked.
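The five stages above can be sketched as a small in-memory simulation. This is an illustrative model only, not HotlineLLM's implementation or API: the `Platform`, `Worker`, and `Request` classes, their field names, and the word-count "token" measure are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional
import itertools


@dataclass
class Request:
    # Hypothetical request record; field names are assumptions.
    request_id: int
    workspace: str
    model: str
    prompt: str
    size_class: str = "small"
    status: str = "queued"
    response: Optional[str] = None
    tokens_used: int = 0


@dataclass
class Worker:
    name: str
    models: set
    size_classes: set

    def can_run(self, req: Request) -> bool:
        return req.model in self.models and req.size_class in self.size_classes

    def run(self, req: Request) -> tuple:
        # Stand-in for inference: echo the prompt, count words as "tokens".
        return f"[{self.name}] echo: {req.prompt}", len(req.prompt.split())


class Platform:
    def __init__(self):
        self._ids = itertools.count(1)
        self.queue = deque()   # Queue stage: pending requests
        self.store = {}        # persisted requests and responses

    def submit(self, workspace, model, prompt, size_class="small") -> int:
        # Request + Queue: persist the request and enqueue it.
        req = Request(next(self._ids), workspace, model, prompt, size_class)
        self.store[req.request_id] = req
        self.queue.append(req)
        return req.request_id

    def poll(self, worker: Worker) -> Optional[Request]:
        # Poll: hand the worker the first queued task it is compatible with.
        for req in self.queue:
            if worker.can_run(req):
                self.queue.remove(req)
                req.status = "processing"
                return req
        return None

    def complete(self, req: Request, text: str, tokens: int) -> None:
        # Process + Deliver: record the output and reported token usage.
        req.response, req.tokens_used, req.status = text, tokens, "done"


platform = Platform()
rid = platform.submit("demo-workspace", "model-a", "hello distributed world")
worker = Worker("worker-a", {"model-a"}, {"small"})
idle = Worker("worker-b", {"model-b"}, {"small"})

assert platform.poll(idle) is None      # incompatible worker gets nothing
task = platform.poll(worker)
if task is not None:
    text, tokens = worker.run(task)
    platform.complete(task, text, tokens)

print(platform.store[rid].status, platform.store[rid].tokens_used)
```

In this sketch the workers pull work rather than having it pushed to them, which mirrors the Poll stage described above; a real deployment would add authentication, leases, and retries, none of which are modeled here.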

Architecture Overview

Components: Your Application, HotlineLLM API, Request Queue, Worker A, Worker B, Private Worker, Your Storage.

Worker Network

Workers register in groups that declare their supported models, compute type (CPU/GPU), size class, and scope (internal, external, or hybrid). The platform routes each request to workers that match its requirements, whether those are public marketplace nodes or your private fleet.

Worker scopes

  • Internal — only your organization's requests
  • External — marketplace tasks from other users
  • Hybrid — both internal and external workloads
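One way to read the three scopes is as an eligibility check applied before model, compute type, and size class are compared. The sketch below is an interpretation, not the platform's routing code: `WorkerGroup`, `accepts`, and `eligible_groups` are hypothetical names, and treating External as "only requests from outside the owning organization" is an assumption drawn from the wording above.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class WorkerGroup:
    # Hypothetical registration record for a worker group.
    name: str
    owner_org: str
    scope: Scope
    models: frozenset
    compute: str       # "cpu" or "gpu"
    size_class: str


def accepts(group: WorkerGroup, requester_org: str) -> bool:
    """Apply the scope rule: internal, external, or hybrid."""
    same_org = group.owner_org == requester_org
    if group.scope is Scope.INTERNAL:
        return same_org          # only the owner's own requests
    if group.scope is Scope.EXTERNAL:
        return not same_org      # only tasks from other users (assumed)
    return True                  # hybrid takes both


def eligible_groups(groups, requester_org, model, compute, size_class):
    """Names of groups that pass the scope rule and match the request."""
    return [
        g.name
        for g in groups
        if accepts(g, requester_org)
        and model in g.models
        and g.compute == compute
        and g.size_class == size_class
    ]


groups = [
    WorkerGroup("acme-private", "acme", Scope.INTERNAL, frozenset({"model-a"}), "gpu", "small"),
    WorkerGroup("market-node", "nodeco", Scope.EXTERNAL, frozenset({"model-a"}), "gpu", "small"),
    WorkerGroup("acme-mixed", "acme", Scope.HYBRID, frozenset({"model-a"}), "gpu", "small"),
]

acme_view = eligible_groups(groups, "acme", "model-a", "gpu", "small")
globex_view = eligible_groups(groups, "globex", "model-a", "gpu", "small")
print(acme_view)
print(globex_view)
```

Here a request from `acme` can land on all three groups, while a request from `globex` is kept off `acme-private`, which is the isolation an internal-only fleet is meant to provide.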

Storage Architecture

Bring your own storage for prompts, metadata, conversations, and outputs. Enterprises maintain data residency and auditability while using HotlineLLM for orchestration and billing.
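A bring-your-own-storage setup usually comes down to a narrow interface that the orchestration layer writes through. The following is a minimal sketch under that assumption; `StorageBackend`, `LocalDirectoryStorage`, and `archive_turn` are hypothetical names for illustration and do not describe HotlineLLM's actual storage integration.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical interface a customer-owned store would satisfy."""

    def put(self, key: str, record: dict) -> None: ...

    def get(self, key: str) -> dict: ...


class LocalDirectoryStorage:
    """Example backend: one JSON file per record in a directory you control."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / f"{key}.json"

    def put(self, key: str, record: dict) -> None:
        self._path(key).write_text(json.dumps(record), encoding="utf-8")

    def get(self, key: str) -> dict:
        return json.loads(self._path(key).read_text(encoding="utf-8"))


def archive_turn(storage: StorageBackend, request_id: str, prompt: str, output: str) -> None:
    # The caller depends only on the interface, so the data stays wherever
    # the chosen backend puts it.
    storage.put(request_id, {"prompt": prompt, "output": output})


with tempfile.TemporaryDirectory() as tmp:
    storage = LocalDirectoryStorage(tmp)
    archive_turn(storage, "req-001", "What is 2 + 2?", "4")
    restored = storage.get("req-001")

print(restored)
```

Swapping the directory-backed class for an object store or database client would leave `archive_turn` unchanged, which is the point of keeping the interface small.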

Metadata Generation

Each request generates structured metadata for analytics, debugging, and cost tracking. Conversation history links turns with request IDs, models, token counts, and worker attribution.
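To make the shape of such metadata concrete, here is a small sketch of a per-turn record and a cost-tracking rollup. The `TurnMetadata` fields follow the items listed above (request ID, model, token counts, worker attribution), but the exact field names and the split into prompt and completion tokens are assumptions, not the platform's schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TurnMetadata:
    # Hypothetical per-turn record; names are illustrative only.
    conversation_id: str
    request_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    worker: str


def tokens_by_model(turns):
    """Total tokens per model, the kind of rollup cost tracking needs."""
    totals = defaultdict(int)
    for turn in turns:
        totals[turn.model] += turn.prompt_tokens + turn.completion_tokens
    return dict(totals)


def workers_for_conversation(turns, conversation_id):
    """Which workers served a conversation, in turn order."""
    return [t.worker for t in turns if t.conversation_id == conversation_id]


turns = [
    TurnMetadata("conv-1", "req-1", "model-a", 10, 20, "worker-a"),
    TurnMetadata("conv-1", "req-2", "model-a", 5, 15, "worker-b"),
    TurnMetadata("conv-1", "req-3", "model-b", 7, 3, "worker-a"),
]

usage = tokens_by_model(turns)
served_by = workers_for_conversation(turns, "conv-1")
print(usage)
print(served_by)
```

Because every turn carries its own request ID and worker name, the same records support debugging a single slow response as well as aggregate reporting.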
