# Model Providers
A Provider is the seam between OpenArmature's graph engine and any LLM backend — OpenAI's hosted API, an Anthropic Messages endpoint, a local vLLM / LM Studio / llama.cpp server, or an internal gateway. The engine doesn't know about LLMs; nodes call providers, providers do the wire work.
## What ships
`OpenAIProvider` — implements the OpenAI Chat Completions wire format (`POST /v1/chat/completions`). Talks to OpenAI itself plus the local servers that adopt the same format (vLLM, LM Studio, llama.cpp). One Provider class covers most real-world deployments.
For wire formats that aren't OpenAI Chat Completions (Anthropic Messages, Bedrock, gateways with custom shapes), write your own. See Authoring a Provider.
## The contract
A Provider implements two async methods:
```python
from collections.abc import Sequence
from typing import Protocol

from openarmature.llm import Message, Response, RuntimeConfig, Tool


class Provider(Protocol):
    async def ready(self) -> None: ...

    async def complete(
        self,
        messages: Sequence[Message],
        tools: Sequence[Tool] | None = None,
        config: RuntimeConfig | None = None,
    ) -> Response: ...
```
- `ready()` verifies the bound model is reachable. Pre-flight check, typically called once before invoking the graph.
- `complete()` performs a single completion call and returns the full `Response` — message, finish reason, token usage, raw wire payload.
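Because `Provider` is a `typing.Protocol`, conformance is structural: anything exposing these two signatures satisfies it, so callers can be written against the Protocol rather than a concrete class. A minimal sketch, assuming `Provider` itself is importable from `openarmature.llm`:

```python
from openarmature.llm import OpenAIProvider, Provider  # Provider import path assumed


async def preflight(provider: Provider) -> None:
    # Annotating against the Protocol keeps this backend-agnostic: any
    # structurally conforming provider can be passed in, no inheritance needed.
    await provider.ready()  # raises a Provider* error if the model is unreachable


# await preflight(OpenAIProvider(base_url="http://localhost:8000/v1", model="some-model"))
```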
## Behaviour guarantees
- Stateless. Every `complete()` call carries the full message list. There is no implicit conversation memory.
- Reentrant. Safe to call concurrently from many nodes. The underlying HTTP client is shared but task-safe.
- Non-mutating. Inputs (`messages`, `tools`) are never modified.
- No tool-call loops. When the model wants to call a tool, the Provider returns with `finish_reason="tool_calls"`. The caller executes the tool and makes a follow-on `complete()` with the result; the Provider does not re-enter itself (see the sketch after this list).
- No retry on transient errors. That's middleware's job — wrap a node in `RetryMiddleware` or similar.
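What a caller-side tool loop looks like, as a sketch. Here `execute_tool` stands in for your own dispatch, and the tool-call and tool-result shapes (`response.message.tool_calls`, `ToolMessage`) are assumptions; check the `openarmature.llm` reference for the real names.

```python
from collections.abc import Sequence

from openarmature.llm import Message, Response, Tool, ToolMessage, UserMessage


async def run_with_tools(provider, tools: Sequence[Tool]) -> Response:
    messages: list[Message] = [UserMessage(content="What's 2 + 2?")]
    response = await provider.complete(messages=messages, tools=tools)

    while response.finish_reason == "tool_calls":
        messages.append(response.message)  # keep the assistant turn in the history
        for call in response.message.tool_calls:  # attribute name assumed
            result = execute_tool(call)  # your own dispatch, not part of the Provider
            messages.append(ToolMessage(content=result))  # message type assumed
        # The Provider is stateless, so the follow-on call carries the full history.
        response = await provider.complete(messages=messages, tools=tools)

    return response
```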
## Errors
Seven canonical error categories cover every failure mode:
| Error | Trigger |
|---|---|
| `ProviderAuthentication` | 401 / 403 — bad key, expired token |
| `ProviderUnavailable` | 5xx, network failure, timeout |
| `ProviderInvalidModel` | Bound model doesn't exist on the provider |
| `ProviderModelNotLoaded` | Model known but not currently serving |
| `ProviderRateLimit` | 429 (with `Retry-After` exposed) |
| `ProviderInvalidResponse` | 200 OK that fails to parse |
| `ProviderInvalidRequest` | Malformed request (per-message or list-level) |
Three of these (Unavailable, RateLimit, ModelNotLoaded) are exported in `TRANSIENT_CATEGORIES` — the canonical "safe to retry" set used by the default retry-middleware classifier.
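As a sketch of what the default classifier does, here is a hand-rolled retry wrapper. It assumes `TRANSIENT_CATEGORIES` is a tuple of those three exception classes, usable directly as an `except` target; in a real graph, reach for `RetryMiddleware` instead.

```python
import asyncio
from collections.abc import Sequence

from openarmature.llm import TRANSIENT_CATEGORIES, Message, Response


async def complete_with_retry(
    provider, messages: Sequence[Message], attempts: int = 3, base_delay: float = 0.5
) -> Response:
    for attempt in range(attempts):
        try:
            return await provider.complete(messages=messages)
        except TRANSIENT_CATEGORIES:  # assumed: a tuple of exception classes
            if attempt == attempts - 1:
                raise  # retry budget exhausted; let the caller see the error
            await asyncio.sleep(base_delay * 2**attempt)  # exponential backoff
    raise AssertionError("unreachable")
```

A production classifier would also honour the `Retry-After` value that `ProviderRateLimit` exposes rather than backing off blindly.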
## A minimal example

Direct usage of an `OpenAIProvider` against a local server, without the engine in the picture:
```python
import asyncio

from openarmature.llm import OpenAIProvider, UserMessage


async def main() -> None:
    provider = OpenAIProvider(
        base_url="http://localhost:8000/v1",  # any OpenAI-compatible endpoint
        model="some-model",
        api_key="optional-for-local-servers",
    )
    # await provider.ready()  # pre-flight; needs a live endpoint
    # response = await provider.complete(
    #     messages=[UserMessage(content="Hello, world!")],
    # )
    # print(response.message.content)


asyncio.run(main())
```
In a real graph you'd construct one Provider at startup and let nodes call it inside their bodies.
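The reentrancy guarantee is what makes that shared instance safe. A sketch of the pattern, with `ask()` standing in for a node body (needs a live endpoint to actually run):

```python
import asyncio

from openarmature.llm import OpenAIProvider, UserMessage

# One Provider for the whole process, constructed at startup.
provider = OpenAIProvider(base_url="http://localhost:8000/v1", model="some-model")


async def ask(question: str) -> str:
    # Each call is self-contained: full message list in, full Response out.
    response = await provider.complete(messages=[UserMessage(content=question)])
    return response.message.content


async def main() -> None:
    # Concurrent fan-out over the shared, task-safe instance.
    print(await asyncio.gather(ask("First?"), ask("Second?"), ask("Third?")))
```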
## Where to next

- Authoring a Provider — how to implement the Protocol for a non-default wire format. Includes a ~60-line skeleton + contract checklist.
- API reference: `openarmature.llm` — the full public surface (Message types, Response, Usage, RuntimeConfig, error classes).