openarmature.llm

openarmature.llm — LLM provider abstraction.

Public surface: typed Message / Tool / Response, the Provider Protocol, the canonical error categories, and an OpenAI-compatible provider. Users write:

from openarmature.llm import (
    AssistantMessage,
    OpenAIProvider,
    Provider,
    SystemMessage,
    Tool,
    ToolCall,
    UserMessage,
)

All seven error categories and the canonical TRANSIENT_CATEGORIES frozenset are also re-exported here so callers writing custom retry classifiers don't have to reach into openarmature.llm.errors.
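For example, a custom retry classifier can be written entirely against this surface (a minimal sketch; the is_retryable helper is illustrative, not part of the package):

from openarmature.llm import TRANSIENT_CATEGORIES, LlmProviderError

def is_retryable(error: LlmProviderError) -> bool:
    # Every subclass carries a canonical category string on .category;
    # membership in TRANSIENT_CATEGORIES marks it worth retrying.
    return error.category in TRANSIENT_CATEGORIES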

LlmProviderError

Bases: Exception

Base for all llm-provider errors. Each subclass carries a category class attribute matching one of the canonical category strings above.

Provider-originated errors SHOULD preserve the underlying provider exception as __cause__ so callers can reach the wire-level detail when needed.
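A sketch of that pattern inside a provider implementation, assuming httpx is the underlying client as it is for OpenAIProvider (the _post helper is illustrative):

import httpx

from openarmature.llm import ProviderUnavailable

async def _post(client: httpx.AsyncClient, url: str, payload: dict) -> httpx.Response:
    try:
        return await client.post(url, json=payload)
    except httpx.TransportError as exc:
        # Chain the wire-level exception so callers can reach it via __cause__.
        raise ProviderUnavailable("provider unreachable") from exc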

ProviderAuthentication

Bases: LlmProviderError

Auth failed — invalid key, expired token, missing credentials.

ProviderInvalidModel

Bases: LlmProviderError

The bound model does not exist on this provider. Terminal — retry will not succeed without changing the bound model.

ProviderInvalidRequest

Bases: LlmProviderError

The request was malformed before sending (per-role message constraints violated, tool_call_id does not match an earlier assistant tool call, duplicate tool names, etc.). Raised by the implementation's pre-send validation, not by the provider.

ProviderInvalidResponse

Bases: LlmProviderError

Provider returned a malformed response that cannot be parsed into the expected Response shape (missing required fields, invalid tool_calls structure, invalid JSON).

ProviderModelNotLoaded

Bases: LlmProviderError

The bound model is known to the provider but is not currently serving (e.g., a local vLLM/LM Studio/llama.cpp server has the model configured but not loaded). Distinct from provider_invalid_model because retry MAY succeed once loading completes.

ProviderRateLimit

ProviderRateLimit(
    *args: Any, retry_after: float | None = None
)

Bases: LlmProviderError

Provider returned a rate-limit response (HTTP 429 or equivalent).

When the provider supplies a Retry-After header (or its equivalent), the parsed seconds-to-wait surfaces on retry_after. None if the provider didn't include one.
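A caller honoring the hint might look like this (a sketch; the one-second fallback is an arbitrary choice, not a package default):

import asyncio

from openarmature.llm import ProviderRateLimit

async def complete_with_backoff(provider, messages):
    try:
        return await provider.complete(messages)
    except ProviderRateLimit as err:
        # Prefer the provider-supplied wait; fall back to 1s when absent.
        await asyncio.sleep(err.retry_after if err.retry_after is not None else 1.0)
        return await provider.complete(messages)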

ProviderUnavailable

Bases: LlmProviderError

Provider is unreachable — network failure, 5xx error, DNS, timeout.

AssistantMessage

Bases: _MessageBase

Assistant messages MAY carry tool_calls. If tool_calls is present and non-empty, content MAY be empty (the assistant is purely calling tools); otherwise content MUST be a non-empty string. tool_call_id MUST be absent.

SystemMessage

Bases: _MessageBase

System messages have non-empty content; no tool_calls; no tool_call_id.

Tool

Bases: BaseModel

A function the model may request that the user execute.

parameters is a JSON Schema (object schema) describing the argument record. Kept as a plain dict[str, Any] rather than a typed schema class so the "JSON Schema, not language-native types" intent surfaces directly — implementations may offer ergonomic constructors that compile from native types (Pydantic model_json_schema()) but the surface is JSON Schema.
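A minimal sketch showing only the fields documented here (name and parameters); the schema itself is illustrative:

from openarmature.llm import Tool

get_weather = Tool(
    name="get_weather",
    # Plain JSON Schema dict; the surface is JSON Schema, not native types.
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)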

ToolCall

Bases: BaseModel

An assistant's request to invoke a named tool.

id is an opaque correlator within a single message list. Implementations MUST preserve provider-supplied ids verbatim — neither rewriting nor normalizing.

ToolMessage

Bases: _MessageBase

Tool messages carry the textual result of a tool call. tool_call_id MUST be present and match the id of an earlier assistant ToolCall in the same message list. The list-level matching is checked at the complete() boundary by provider.validate_message_list, not at construction.
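A sketch of a valid round trip within one message list, assuming ToolMessage is importable alongside the other message classes (ToolCall field names other than id, and the JSON payloads, are illustrative assumptions):

from openarmature.llm import AssistantMessage, ToolCall, ToolMessage, UserMessage

messages = [
    UserMessage(content="What's the weather in Oslo?"),
    # Assistant turn that only calls a tool, so content may be empty.
    AssistantMessage(
        content="",
        tool_calls=[ToolCall(id="call_1", name="get_weather", arguments='{"city": "Oslo"}')],
    ),
    # tool_call_id matches the assistant ToolCall id above; checked at complete().
    ToolMessage(content='{"temp_c": 4}', tool_call_id="call_1"),
]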

UserMessage

Bases: _MessageBase

User messages have non-empty content; no tool_calls; no tool_call_id.

Provider

Bases: Protocol

The shape of any llm-provider implementation.

Implementations are bound to a single model identifier; switching models means constructing a new provider, not passing a different argument per call.

ready async

ready() -> None

Verify the bound model is reachable and serving.

complete async

complete(
    messages: Sequence[Message],
    tools: Sequence[Tool] | None = None,
    config: RuntimeConfig | None = None,
) -> Response

Perform a single completion call.

messages MUST NOT be mutated. complete() does NOT loop on tool calls — if the response's finish_reason is "tool_calls", the caller is responsible for executing the tools and making a follow-on call with tool messages appended. complete() does NOT retry; transient errors propagate.
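A caller-side driver, sketched under those rules (execute_tool is a hypothetical dispatcher, not part of this package; ToolCall field names beyond id are assumptions):

from openarmature.llm import ToolMessage

async def run_until_stop(provider, messages, tools):
    msgs = list(messages)  # copy so the original sequence is never mutated
    while True:
        response = await provider.complete(msgs, tools=tools)
        if response.finish_reason != "tool_calls":
            return response
        msgs.append(response.message)
        for call in response.message.tool_calls:
            result = execute_tool(call)  # hypothetical: run the tool, return its textual result
            msgs.append(ToolMessage(content=result, tool_call_id=call.id))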

OpenAIProvider

OpenAIProvider(
    *,
    base_url: str,
    model: str,
    api_key: str | None = None,
    transport: AsyncBaseTransport | None = None,
    timeout: float = 60.0
)

OpenAI Chat Completions wire-compatible provider.

Construct with a base URL, model identifier, and optional API key + transport (an httpx.AsyncBaseTransport). The transport parameter is the test seam — httpx.MockTransport drives the conformance fixtures by intercepting HTTP calls and returning canned responses, exercising the same wire-mapping code production traffic would.
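A test-oriented construction sketch (the base URL, model name, and canned body are illustrative; the body is trimmed to a minimal Chat Completions shape):

import httpx

from openarmature.llm import OpenAIProvider

def canned(request: httpx.Request) -> httpx.Response:
    # Never hits the network; returns a minimal Chat Completions-shaped body.
    return httpx.Response(200, json={
        "choices": [
            {"message": {"role": "assistant", "content": "hi"}, "finish_reason": "stop"}
        ],
    })

provider = OpenAIProvider(
    base_url="http://localhost:8000/v1",
    model="my-model",
    transport=httpx.MockTransport(canned),
)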

aclose async

aclose() -> None

Close the underlying HTTP client. Optional — async clients garbage-collect cleanly, but explicit close is RECOMMENDED in long-lived services to release the connection pool promptly.

ready async

ready() -> None

Verify the bound model is reachable and listed by the provider. Hits GET /v1/models and matches self.model against the returned data[].id entries.

complete async

complete(
    messages: Sequence[Message],
    tools: Sequence[Tool] | None = None,
    config: RuntimeConfig | None = None,
) -> Response

Single completion call.

Pre-send validation runs first (per-message Pydantic + list-level invariants). HTTP errors map to canonical provider-error categories. The successful 200 body is parsed into a Response — failure to parse raises provider_invalid_response.

Response

Bases: BaseModel

The result of a Provider.complete() call.

  • message is the assistant message returned by the model. Always role: "assistant". May carry tool_calls.
  • finish_reason is one of the five canonical values ("stop" / "length" / "tool_calls" / "content_filter" / "error").
  • usage is the token record (all None if the provider didn't report usage).
  • raw is the parsed provider response, populated on every successful return. Carries everything the provider returned — the normalized fields above are derived from it.

RuntimeConfig

Bases: BaseModel

Per-call sampling parameters and budget hints.

All four fields are optional. Implementations MAY accept additional provider-specific fields; this is the minimum.

Usage

Bases: BaseModel

Token-accounting record.

Each field is a non-negative integer or None. If the provider does not report usage, all three MUST be None.

validate_message_list

validate_message_list(messages: Sequence[Message]) -> None

Validate list-level invariants.

Per-message constraints (system/user need non-empty content, assistant content-or-tool_calls, etc.) are enforced by Pydantic on the per-role Message classes at construction time. This function adds the list-level invariants Pydantic-on-Message can't see.

Raises ProviderInvalidRequest on the first violation.
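For example, an orphaned tool_call_id is a list-level violation (a sketch; the import path for validate_message_list is assumed from the provider.validate_message_list cross-reference above):

from openarmature.llm import ProviderInvalidRequest, ToolMessage, UserMessage
from openarmature.llm.provider import validate_message_list  # assumed module path

try:
    validate_message_list([
        UserMessage(content="hello"),
        # No earlier assistant ToolCall carries this id, so this fails.
        ToolMessage(content="orphaned result", tool_call_id="call_404"),
    ])
except ProviderInvalidRequest:
    ...  # first violation reported here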

validate_tools

validate_tools(tools: Sequence[Tool] | None) -> None

Validate tool-list invariants. Tool names MUST be unique within a single complete() call.

classify_http_error

classify_http_error(resp: httpx.Response) -> LlmProviderError

Map a non-200 httpx.Response from an OpenAI-shape API to the right canonical error category.

Returns the exception (does not raise) so the caller can raise with consistent traceback context.

Reusable by third-party Provider implementations targeting any OpenAI-compatible endpoint (vLLM, LM Studio, llama.cpp server, etc.) — the wire shape is stable across these and the helper saves implementers from reimplementing the mapping table.
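A third-party implementation might use it like this (a sketch; the import path and the _send helper are assumptions):

import httpx

from openarmature.llm.provider import classify_http_error  # assumed module path

async def _send(client: httpx.AsyncClient, url: str, payload: dict) -> httpx.Response:
    resp = await client.post(url, json=payload)
    if resp.status_code != 200:
        # The helper returns the exception; raising here keeps the traceback
        # rooted in the implementation's own call site.
        raise classify_http_error(resp)
    return resp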

parse_retry_after

parse_retry_after(value: str | None) -> float | None

Parse a Retry-After header value to a float seconds count.

HTTP allows seconds-int OR HTTP-date; this implementation handles the seconds-int form (the OpenAI/vendor norm) and ignores HTTP-date.

Reusable by third-party Provider implementations that need to surface Retry-After to ProviderRateLimit.retry_after.
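For example (a sketch; the import path is assumed as above):

import httpx

from openarmature.llm import ProviderRateLimit
from openarmature.llm.provider import parse_retry_after  # assumed module path

def rate_limit_from(resp: httpx.Response) -> ProviderRateLimit:
    # Returns None for a missing header or an HTTP-date value, a float for seconds.
    wait = parse_retry_after(resp.headers.get("Retry-After"))
    return ProviderRateLimit("rate limited", retry_after=wait)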