- Schema normalization (Anthropic message blocks → unified content shape, OpenAI tool-call ids → canonical record).
- Reasoning-detail replay (OpenAI Responses reasoning items, Anthropic thinking blocks).
- Cache-marker placement (Anthropic cache_control, OpenAI prompt_cache_key).
- Token-field interpretation (cached-read vs cache-write deltas, reasoning tokens, audio tokens).
- Auth flow (API key, OAuth PKCE, device code).
- Model policy (variant aliases, structured-output capability detection, thinking exposure).
Modes (standard, rlm) and canonical tool definitions never see provider-specific JSON. They operate on LlmRequest, LlmResponse, and ToolDefinition only.
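As a rough sketch of what those provider-neutral shapes can look like (only LlmRequest, LlmResponse, ToolDefinition, and ProviderOptions are named in this document; the fields and the LlmMessage helper below are illustrative, not the crate's actual definitions):

```rust
// Illustrative provider-neutral shapes; field names are assumptions.
pub struct LlmRequest {
    pub model: String,
    pub messages: Vec<LlmMessage>,
    pub tools: Vec<ToolDefinition>,
    pub options: ProviderOptions,
}

pub struct LlmMessage {
    pub role: String,                   // "system" | "user" | "assistant" | "tool"
    pub blocks: Vec<serde_json::Value>, // normalized content blocks
}

pub struct ToolDefinition {
    pub name: String,
    pub description: String,
    pub input_schema: serde_json::Value, // provider-neutral JSON Schema
}

pub struct ProviderOptions {
    pub cache_retention: String, // "none" | "short" | "long"
    pub thinking_exposure: bool,
}
```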
| Provider | Kind | Transport / Auth |
| --- | --- | --- |
| OpenAI API (direct) | openai | Bearer API-key auth against https://api.openai.com/v1/responses. Responses-only path; owns Responses reasoning replay and prompt-cache fields. |
| OpenAI-compatible | openai-compatible | Bearer API-key auth against a caller-supplied base_url; posts Chat Completions to {base_url}/chat/completions. Used for OpenRouter, Together, vLLM, etc. |
| Codex subscription | codex | OAuth device-code flow against ChatGPT Codex Responses backend with Codex-specific headers. |
| Anthropic | anthropic | API-key auth against /v1/messages with Anthropic version header and beta flags. |
| Google Gemini / Code Assist | google_oauth | Google OAuth PKCE / manual-code flow against Code Assist generateContent / streamGenerateContent. |
lash_providers_builtin::register_all() is the one-call aggregator the CLI and app hosts use to register all five factories with the global provider registry at process start.
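Host startup is then a single call, as in this sketch (everything beyond register_all() itself is assumed host code):

```rust
fn main() {
    // Registers the openai, openai-compatible, codex, anthropic, and
    // google_oauth factories with the global provider registry.
    lash_providers_builtin::register_all();

    // ...the host can now resolve providers by kind and build handles.
}
```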
openai
Direct OpenAI Responses. Posts to https://api.openai.com/v1/responses, keeps Responses reasoning replay, and maps shared ProviderOptions.cache_retention to prompt_cache_key derived from the Lash session id. Long retention adds prompt_cache_retention where the API supports it. No base_url accepted.
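A hedged sketch of that mapping; only the wire field names prompt_cache_key and prompt_cache_retention come from the text above, while the helper name, the lash- key prefix, and the retention value are illustrative assumptions:

```rust
use serde_json::json;

enum CacheRetention { None, Short, Long }

// Illustrative: stamps cache fields onto a Responses request body.
fn apply_cache_fields(
    body: &mut serde_json::Value,
    session_id: &str,
    retention: &CacheRetention,
) {
    // The key is stable per Lash session, so repeated requests reuse it.
    body["prompt_cache_key"] = json!(format!("lash-{session_id}"));
    if matches!(retention, CacheRetention::Long) {
        // Long retention adds the retention field where the API supports it;
        // the value here is a placeholder, not a documented setting.
        body["prompt_cache_retention"] = json!("24h");
    }
}
```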
openai-compatible
Generic Chat Completions. Requires base_url. Converts LlmRequest to a messages array, emits Chat Completions tools, maps structured output to response_format, and preserves OpenRouter reasoning effort through the reasoning.effort request field. Used for OpenRouter, vLLM, Together, Groq, etc.
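A minimal sketch of the wire translation under those rules; the function name is illustrative and the body only shows the skeleton of the request:

```rust
use serde_json::json;

// Illustrative: joins the caller-supplied base_url and outlines the body
// the real crate fills in from LlmRequest.
fn chat_completions_request(base_url: &str, model: &str) -> (String, serde_json::Value) {
    let url = format!("{}/chat/completions", base_url.trim_end_matches('/'));
    let body = json!({
        "model": model,
        "messages": [],                                // converted from LlmRequest content
        "tools": [],                                   // Chat Completions tool schema
        "response_format": { "type": "json_schema" },  // structured-output mapping
        "reasoning": { "effort": "medium" },           // OpenRouter reasoning passthrough
    });
    (url, body)
}
```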
none
No markers emitted. Each request is treated as fresh.
short
Emits {"type":"ephemeral"} at the canonical breakpoints. Default 5-minute cache lifetime per Anthropic semantics.
long
Adds "ttl":"1h" to the ephemeral marker for longer-lived caching.
Breakpoints are placed at:
- The first system/developer text message.
- The last tool definition in the request.
- Any explicit LlmContentBlock::Text.cache_breakpoint the runtime asks for.
When no explicit breakpoint is set, the provider falls back to the last user/assistant text content so prompt caching still works for sessions without explicit cache instrumentation.
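A sketch of that selection order, including the fallback, over simplified illustrative types (the real crate walks LlmRequest content blocks):

```rust
// Illustrative types; names here are not the crate's actual API.
#[derive(PartialEq)]
enum Role { System, User, Assistant, ToolDef }

struct Block {
    role: Role,
    is_text: bool,
    explicit_breakpoint: bool,
}

// Sketch of the placement rules above, plus the fallback when no explicit
// breakpoint was requested.
fn pick_breakpoints(blocks: &[Block]) -> Vec<usize> {
    let mut picks = Vec::new();
    // 1. First system/developer text message.
    if let Some(i) = blocks.iter().position(|b| b.role == Role::System && b.is_text) {
        picks.push(i);
    }
    // 2. Last tool definition in the request.
    if let Some(i) = blocks.iter().rposition(|b| b.role == Role::ToolDef) {
        picks.push(i);
    }
    // 3. Explicit breakpoints, else the last user/assistant text content.
    let explicit: Vec<usize> = blocks
        .iter()
        .enumerate()
        .filter(|(_, b)| b.explicit_breakpoint)
        .map(|(i, _)| i)
        .collect();
    if explicit.is_empty() {
        if let Some(i) = blocks
            .iter()
            .rposition(|b| (b.role == Role::User || b.role == Role::Assistant) && b.is_text)
        {
            picks.push(i);
        }
    } else {
        picks.extend(explicit);
    }
    picks
}
```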
The result: long RLM sessions get the same cache-hit rate as native multi-turn chats, even though Lashlang-driven reasoning regenerates a fresh prompt on every iteration.
```rust
pub struct LlmUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cached_input_tokens: u64, // Reads from cache
    pub reasoning_tokens: u64,
    // …provider-specific extras flow through the extended trace
}
```
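One derived metric a host can compute from this struct, assuming the normalized convention that cached_input_tokens is a subset of input_tokens (raw provider counters differ, which is exactly what the token-field interpretation layer irons out):

```rust
// Hedged helper: share of the input side served from cache. Assumes
// cached_input_tokens <= input_tokens after normalization.
fn cache_hit_ratio(u: &LlmUsage) -> f64 {
    if u.input_tokens == 0 {
        return 0.0;
    }
    u.cached_input_tokens as f64 / u.input_tokens as f64
}
```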
State
Per-handle config: API key, base URL, default model, options like thinking exposure.
Auth
Bearer header, OAuth refresh, device-code flow — whatever the vendor requires. Auth state is opaque to the runtime.
Readiness
Optional pre-flight check (token refresh, capability probe) that runs once per session.
Transport
The actual HTTP call. Translates LlmRequest to the wire format, streams the response, normalizes back to LlmResponse chunks.
Model policy
Maps user-facing model + variant names to provider-native ids, declares structured-output / tool-call / thinking capabilities per model.
Factory
Registers with the global provider registry at process start; ProviderHandle::new(components) assembles the five pieces into a handle the runtime can use.
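Sketched as plain Rust, the assembly looks roughly like this (only ProviderHandle::new(components) and the component roles come from this section; every trait and struct shape below is an assumption):

```rust
// Illustrative component traits; real signatures will differ.
trait Auth { fn header(&self) -> (String, String); }
trait Readiness { fn check(&mut self) -> Result<(), String>; }
trait Transport { /* LlmRequest -> stream of LlmResponse chunks */ }
trait ModelPolicy { fn resolve(&self, user_model: &str) -> String; }

struct State {
    api_key: Option<String>,
    base_url: Option<String>,
    default_model: String,
}

struct Components {
    state: State,
    auth: Box<dyn Auth>,
    readiness: Option<Box<dyn Readiness>>,
    transport: Box<dyn Transport>,
    policy: Box<dyn ModelPolicy>,
}

struct ProviderHandle {
    components: Components,
}

impl ProviderHandle {
    // Assembles the five pieces into a handle the runtime can drive.
    fn new(components: Components) -> Self {
        ProviderHandle { components }
    }
}
```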
See lash-provider-openai/src/lib.rs as the most general template — it handles both the direct Responses path and the generic Chat Completions path in one crate.