openarmature.llm¶
openarmature.llm — LLM provider abstraction.
Public surface: typed Message / Tool / Response, the
Provider Protocol, the canonical error categories, and an
OpenAI-compatible provider. Users write::
from openarmature.llm import (
AssistantMessage,
OpenAIProvider,
Provider,
SystemMessage,
Tool,
ToolCall,
UserMessage,
)
All seven error categories and the canonical TRANSIENT_CATEGORIES
frozenset are also re-exported here so callers writing custom retry
classifiers don't have to reach into openarmature.llm.errors.
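A custom retry classifier built on those re-exports can be very small. The sketch below stubs the exported names locally so it is self-contained; in application code you would import `LlmProviderError` and `TRANSIENT_CATEGORIES` from `openarmature.llm` instead, and the frozenset's membership shown here is inferred from the category descriptions below, not copied from the source.

```python
# Stand-ins for the real openarmature.llm exports -- illustration only.
# Membership of TRANSIENT_CATEGORIES is inferred from the per-category
# docs below (rate limit / model not loaded / unavailable may recover).
TRANSIENT_CATEGORIES = frozenset(
    {"provider_rate_limit", "provider_model_not_loaded", "provider_unavailable"}
)

class LlmProviderError(Exception):
    category = "provider_unavailable"

class ProviderRateLimit(LlmProviderError):
    category = "provider_rate_limit"

class ProviderInvalidModel(LlmProviderError):
    category = "provider_invalid_model"

def should_retry(exc: Exception) -> bool:
    """Retry only llm-provider errors whose category is known-transient."""
    return isinstance(exc, LlmProviderError) and exc.category in TRANSIENT_CATEGORIES
```

Non-provider exceptions and terminal categories (such as an invalid model) fall through to `False`, so the classifier never retries something that cannot succeed.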
LlmProviderError ¶
Bases: Exception
Base for all llm-provider errors. Each subclass carries a
category class attribute matching one of the canonical
category strings above.
Provider-originated errors SHOULD preserve the underlying provider
exception as __cause__ so callers can reach the wire-level
detail when needed.
ProviderAuthentication ¶
Bases: LlmProviderError
The provider rejected the request's credentials (HTTP 401/403 or
equivalent: missing, invalid, or expired API key). Terminal:
retry will not succeed without correcting the credentials.
ProviderInvalidModel ¶
Bases: LlmProviderError
The bound model does not exist on this provider. Terminal — retry will not succeed without changing the bound model.
ProviderInvalidRequest ¶
Bases: LlmProviderError
The request was malformed before sending (per-role message
constraints violated, tool_call_id does not match an earlier
assistant tool call, duplicate tool names, etc.). Raised by the
implementation's pre-send validation, not by the provider.
ProviderInvalidResponse ¶
Bases: LlmProviderError
Provider returned a malformed response that cannot be parsed
into the expected :class:Response shape (missing required
fields, invalid tool_calls structure, invalid JSON).
ProviderModelNotLoaded ¶
Bases: LlmProviderError
The bound model is known to the provider but is not currently
serving (e.g., a local vLLM/LM Studio/llama.cpp server has the
model configured but not loaded). Distinct from
provider_invalid_model because retry MAY succeed once loading
completes.
ProviderRateLimit ¶
Bases: LlmProviderError
Provider returned a rate-limit response (HTTP 429 or equivalent).
When the provider supplies a Retry-After header (or its
equivalent), the parsed seconds-to-wait surfaces on
:attr:retry_after. None if the provider didn't include one.
ProviderUnavailable ¶
Bases: LlmProviderError
The provider could not be reached or returned a server-side
failure (connection errors, timeouts, HTTP 5xx). Transient:
retry MAY succeed once the provider recovers.
AssistantMessage ¶
Bases: _MessageBase
Assistant messages MAY carry tool_calls. If tool_calls
is present and non-empty, content MAY be empty (the assistant
is purely calling tools); otherwise content MUST be a
non-empty string. tool_call_id MUST be absent.
SystemMessage ¶
Bases: _MessageBase
System messages have non-empty content; no tool_calls; no
tool_call_id.
Tool ¶
Bases: BaseModel
A function the model may request the user execute.
parameters is a JSON Schema (object schema) describing the
argument record. Kept as a plain dict[str, Any] rather than a
typed schema class so the "JSON Schema, not language-native
types" intent surfaces directly — implementations may offer
ergonomic constructors that compile from native types (Pydantic
model_json_schema()) but the surface is JSON Schema.
ToolCall ¶
Bases: BaseModel
An assistant's request to invoke a named tool.
id is an opaque correlator within a single message list.
Implementations MUST preserve provider-supplied ids verbatim —
neither rewriting nor normalizing.
ToolMessage ¶
Bases: _MessageBase
Tool messages carry the textual result of a tool call.
tool_call_id MUST be present and match the id of an
earlier assistant ToolCall in the same message list. The
list-level matching is checked at the complete() boundary by
:func:provider.validate_message_list, not at construction.
UserMessage ¶
Bases: _MessageBase
User messages have non-empty content; no tool_calls; no
tool_call_id.
Provider ¶
Bases: Protocol
The shape of any llm-provider implementation.
Implementations are bound to a single model identifier; switching models means constructing a new provider, not passing a different argument per call.
complete
async
¶
complete(
messages: Sequence[Message],
tools: Sequence[Tool] | None = None,
config: RuntimeConfig | None = None,
) -> Response
Perform a single completion call.
messages MUST NOT be mutated. complete() does NOT loop
on tool calls — if the response's finish_reason is
"tool_calls", the caller is responsible for executing the
tools and making a follow-on call with tool messages
appended. complete() does NOT retry; transient errors
propagate.
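Because `complete()` never loops, the tool-execution cycle lives in the caller. The sketch below shows that shape with simplified dataclasses and plain-dict messages standing in for the typed `Message`/`Response` classes, and a stub provider in place of a real one; only the loop structure is the point.

```python
import asyncio
import json
from dataclasses import dataclass, field

# Simplified stand-ins for the real typed classes -- illustration only.
@dataclass
class ToolCall:
    id: str
    name: str
    arguments: str

@dataclass
class Response:
    content: str
    finish_reason: str
    tool_calls: list[ToolCall] = field(default_factory=list)

class StubProvider:
    """Returns one tool_calls turn, then a final answer."""
    def __init__(self):
        self._turn = 0

    async def complete(self, messages):
        self._turn += 1
        if self._turn == 1:
            return Response("", "tool_calls", [ToolCall("c1", "add", '{"a": 2, "b": 3}')])
        return Response("The sum is 5.", "stop")

async def run_tool_loop(provider, messages, tools):
    # complete() never loops on tool calls: the caller executes the tools
    # and re-calls with tool messages appended.
    while True:
        resp = await provider.complete(messages)
        if resp.finish_reason != "tool_calls":
            return resp
        messages = list(messages)  # never mutate the caller's sequence
        messages.append({"role": "assistant", "tool_calls": resp.tool_calls})
        for call in resp.tool_calls:
            result = tools[call.name](call.arguments)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
```

Copying `messages` before appending keeps the documented "messages MUST NOT be mutated" contract at the caller's side as well.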
OpenAIProvider ¶
OpenAIProvider(
*,
base_url: str,
model: str,
api_key: str | None = None,
transport: AsyncBaseTransport | None = None,
timeout: float = 60.0
)
OpenAI Chat Completions wire-compatible provider.
Construct with a base URL, model identifier, and optional API key
+ transport (an :class:httpx.AsyncBaseTransport). The
transport parameter is the test seam — httpx.MockTransport
drives the conformance fixtures by intercepting HTTP calls and
returning canned responses, exercising the same wire-mapping
code production traffic would.
aclose
async
¶
Close the underlying HTTP client. Optional — async clients garbage-collect cleanly, but explicit close is RECOMMENDED in long-lived services to release the connection pool promptly.
ready
async
¶
Verify the bound model is reachable and listed by the
provider. Hits GET /v1/models and matches self.model
against the returned data[].id entries.
complete
async
¶
complete(
messages: Sequence[Message],
tools: Sequence[Tool] | None = None,
config: RuntimeConfig | None = None,
) -> Response
Single completion call.
Pre-send validation runs first (per-message Pydantic +
list-level invariants). HTTP errors map to canonical
provider-error categories. The successful 200 body is parsed
into a :class:Response — failure to parse raises
provider_invalid_response.
Response ¶
Bases: BaseModel
The result of a Provider.complete() call.
`message` is the assistant message returned by the model. Always `role: "assistant"`. May carry `tool_calls`.

`finish_reason` is one of the five canonical values (`"stop"` / `"length"` / `"tool_calls"` / `"content_filter"` / `"error"`).

`usage` is the token record (all `None` if the provider didn't report usage).

`raw` is the parsed provider response, populated on every successful return. Carries everything the provider returned — the normalized fields above are derived from it.
RuntimeConfig ¶
Bases: BaseModel
Per-call sampling parameters and budget hints.
All four fields are optional. Implementations MAY accept additional provider-specific fields; this is the minimum.
Usage ¶
Bases: BaseModel
Token-accounting record.
Each field is a non-negative integer or None. If the provider
does not report usage, all three MUST be None.
validate_message_list ¶
Validate list-level invariants.
Per-message constraints (system/user need non-empty content, assistant content-or-tool_calls, etc.) are enforced by Pydantic on the per-role Message classes at construction time. This function adds the list-level invariants Pydantic-on-Message can't see.
Raises :class:ProviderInvalidRequest on the first violation.
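The central list-level invariant is tool_call_id matching. The sketch below checks it over plain-dict messages with a local error stand-in; the real function operates on the typed `Message` classes and checks further invariants beyond this one.

```python
# Local stand-in for openarmature.llm.ProviderInvalidRequest -- illustration only.
class ProviderInvalidRequest(Exception):
    category = "provider_invalid_request"

def check_tool_call_ids(messages: list[dict]) -> None:
    """Every tool message must cite an id from an EARLIER assistant tool call."""
    seen_ids: set[str] = set()
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            seen_ids.update(call["id"] for call in msg.get("tool_calls", []))
        elif msg["role"] == "tool":
            if msg.get("tool_call_id") not in seen_ids:
                raise ProviderInvalidRequest(
                    f"message {i}: tool_call_id does not match an earlier "
                    "assistant tool call"
                )
```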
validate_tools ¶
validate_tools(tools: Sequence[Tool] | None) -> None
Validate tool-list invariants. Tool names MUST be unique
within a single complete() call.
classify_http_error ¶
classify_http_error(resp: Response) -> LlmProviderError
Map a non-200 httpx.Response from an OpenAI-shape API to
the right canonical error category.
Returns the exception (does not raise) so the caller can
raise with consistent traceback context.
Reusable by third-party Provider implementations targeting any OpenAI-compatible endpoint (vLLM, LM Studio, llama.cpp server, etc.) — the wire shape is stable across these and the helper saves implementers from reimplementing the mapping table.
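The general shape of such a mapping can be sketched as below. The status-to-category table here is an assumption drawn from common OpenAI-compatible behavior (401/403 auth, 429 rate limit, other 4xx bad request, 5xx unavailable); the real helper's table may differ, for instance by inspecting the body to distinguish a 404 model-not-found.

```python
# Local stand-ins for the canonical error classes -- illustration only.
class LlmProviderError(Exception):
    category = "provider_unavailable"

class ProviderAuthentication(LlmProviderError):
    category = "provider_authentication"

class ProviderRateLimit(LlmProviderError):
    category = "provider_rate_limit"

class ProviderInvalidRequest(LlmProviderError):
    category = "provider_invalid_request"

class ProviderUnavailable(LlmProviderError):
    category = "provider_unavailable"

def classify_status(status: int, body: str) -> LlmProviderError:
    """Return (don't raise) so the caller owns the traceback context."""
    if status in (401, 403):
        return ProviderAuthentication(body)
    if status == 429:
        return ProviderRateLimit(body)
    if 400 <= status < 500:
        return ProviderInvalidRequest(body)
    return ProviderUnavailable(body)  # 5xx and anything unexpected
```

Returning the exception lets the caller write `raise classify_status(...) from wire_error`, preserving the underlying exception as `__cause__` per the base-class contract.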
parse_retry_after ¶
Parse a Retry-After header value to a float seconds count.
HTTP allows seconds-int OR HTTP-date; this implementation handles the seconds-int form (the OpenAI/vendor norm) and ignores HTTP-date.
Reusable by third-party Provider implementations that need to
surface Retry-After to ProviderRateLimit.retry_after.