lash guide

Most agent stacks treat the LLM as the runtime and stitch state around it — a database for memory, a queue for retries, a sandbox for code. lash inverts that. The runtime is the durable end of the pair; the LLM is the variable call. Your app owns the outer boundaries — storage, auth, transport, product state. lash owns the turn — model calls, modes, tools, plugins, semantic stream events, usage, and terminal outcomes.

Runtime Alongside The Model

The unit of work is a turn. A turn is a RuntimeCommit against a durable SessionGraph — either the whole turn lands or none of it does. Model output is one event class flowing through that commit, not the thing holding state between events. Two execution modes — standard for native tool calls and rlm for Lashlang programs in a no-syscall VM — share the same containment, trace shape, and commit unit.

Durable session graph
Conversation records, tool events, mode events, and plugin nodes live in one SessionGraph per session. SessionReadView and ChronologicalProjection read it; nothing else owns persistence.
Per-turn atomic commit
RuntimeCommit writes graph delta, checkpoint blobs, usage deltas, and head revision in one SQLite transaction with optimistic CAS on expected_head_revision. Partial turn = no commit.
Typed plugin capabilities
Tools see a named ToolContext surface — tasks(), sessions(), direct_completion, tool_catalog, snapshot reads — not a general host handle. There is no ToolContext::host() escape hatch.
Sandboxed code execution
RLM mode runs model-emitted Lashlang programs in a VM with no filesystem, process, or network surface. Every effect crosses ToolHost. Use it when the model should compose multiple tool calls per turn instead of one.
Subagent capability boundaries
Capability::resolve(parent_policy) returns a constrained SessionSpec for the child. Interactive-only tools are stripped from every subagent surface regardless of capability.
Semantic event stream
Identity-bearing TurnActivity items: assistant prose deltas, reasoning deltas, tool start/complete pairs with correlation ids, code-block start/complete, terminal SubmittedValue/ToolValue, per-call and rolling usage. Apps fold this directly into product state.
Tracing as a first-class sink
Every provider call across every session emits TraceRecords through TraceSink implementations. JSONL by default, OTel optional.
Snapshot and restore seams
Plugins, tool state, and the Lashlang VM each persist through versioned snapshot writers so a parked session resumes intact across process restarts.
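The per-turn atomic commit above follows a familiar optimistic-concurrency shape: a commit only lands when the store's head revision still matches the revision the turn started from. A minimal sketch of that shape, with illustrative stand-in types rather than the crate's internals:

```rust
// Stand-ins for the durable store and its commit error; not lash types.
struct Store {
    head_revision: u64,
    events: Vec<String>,
}

enum CommitError {
    StaleHead { expected: u64, actual: u64 },
}

impl Store {
    fn commit(&mut self, expected_head_revision: u64, delta: Vec<String>) -> Result<u64, CommitError> {
        if self.head_revision != expected_head_revision {
            // A concurrent writer advanced the head: nothing from this turn lands.
            return Err(CommitError::StaleHead {
                expected: expected_head_revision,
                actual: self.head_revision,
            });
        }
        self.events.extend(delta); // the whole delta is applied together
        self.head_revision += 1;   // the head advances exactly once per turn
        Ok(self.head_revision)
    }
}
```

The real commit also writes checkpoint blobs and usage deltas in the same SQLite transaction; the CAS check on expected_head_revision is what makes "partial turn = no commit" hold under concurrency.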

What sits outside this crate, on purpose: tenancy boundaries, retention and lifecycle for long-lived artifacts, discovery of agent-authored procedures, a shared coherent image across many sessions. lash is the kernel and runtime; the platform-shaped pieces belong to whatever embeds it.

Core Shape

Build one shared LashCore for app-wide configuration, then open one LashSession per app conversation or task.

lash vs lash-core boundary
use std::sync::Arc;

use lash::{tools::*, LashCore, ModeId, ModePreset, PluginStack, TurnEvent, TurnInput};

let store_factory = Arc::new(lash_sqlite_store::SqliteSessionStoreFactory::new("lash-sessions"));

let core = LashCore::builder()
    .install_mode(ModePreset::standard())
    .install_mode(ModePreset::rlm())
    .default_mode(ModeId::rlm())
    .provider(provider)
    .model("anthropic/claude-sonnet-4.6", None)
    .max_context_tokens(200_000)
    .plugins(PluginStack::runtime())
    .tools(Arc::new(AppTools) as Arc<dyn ToolProvider>)
    .store_factory(store_factory)
    .build()?;

let session = core.session("chat-123").rlm().open().await?;
let result = session.turn(TurnInput::text("Use the app tools.")).run().await?;
let assistant_text: String = result.activities.iter().filter_map(|activity| match &activity.event {
    TurnEvent::AssistantProseDelta { text } => Some(text.as_str()),
    _ => None,
}).collect();
println!("{assistant_text}");
LashCore
Cloneable shared configuration: provider, model, installed modes, tool providers, plugin factories, store factory, attachment store, and tracing. Runtime internals such as residency and termination policy live behind .advanced().
ModePreset
Installs execution modes. Use ModePreset::standard() for native provider tool calls, ModePreset::rlm() for Lashlang-driven RLM turns, or install both and choose the default with default_mode.
SessionSpec
Reusable public configuration overlay for provider, model/variant, execution mode, max context tokens, max turns, and prompt layer. Root cores use SessionSpec::new(); child sessions and subagents usually use SessionSpec::inherit().
PluginStack
Ordered plugin factory list. LashCore::standard() and LashCore::rlm() include PluginStack::runtime(); a raw LashCore::builder() stays explicit and needs modes and plugins installed by the host.
LashSession
One app conversation or task. Sessions wrap a parked/resumable runtime, can use a per-session store, and expose turn(TurnInput), run(TurnInput), read_view(), and lower-level control groups through control().
TurnBuilder
Per-turn configuration: cancellation, mode options, typed plugin input, and RLM-projected bindings. Call .stream(&sink) for live events or .run() for a collected ordered activity log.

Session Specs And Plugins

The root builder methods are ergonomic wrappers over one SessionSpec. Use .session_spec(...) when a host already has a complete spec object, and use the same type for parent-relative child session configuration.

Root defaults

use std::sync::Arc;

use lash::{plugins::PluginFactory, LashCore, ModeId, ModePreset, PluginStack, SessionSpec};

let root_spec = SessionSpec::new()
    .provider(provider)
    .model("gpt-5.4", None)
    .max_context_tokens(200_000);

let core = LashCore::rlm()
    .session_spec(root_spec)
    .configure_plugins(|plugins| {
        plugins.push(Arc::new(AppPluginFactory) as Arc<dyn PluginFactory>);
    })
    .build()?;

Explicit stacks

let plugins = PluginStack::runtime().configure(|plugins| {
    plugins.replace(Arc::new(CustomBudgetPlugin) as Arc<dyn PluginFactory>);
    plugins.push(Arc::new(AppPluginFactory) as Arc<dyn PluginFactory>);
});

let core = LashCore::builder()
    .install_mode(ModePreset::rlm())
    .default_mode(ModeId::rlm())
    .session_spec(root_spec)
    .plugins(plugins)
    .build()?;

.plugin(...) appends one factory to the current stack, .plugins(...) replaces the full stack, and .configure_plugins(...) mutates the current stack in place. That gives hosts a default runtime set while still allowing precise removal, replacement, or insertion.
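The append/replace/mutate split can be pictured with an ordered stack keyed by plugin id. This is an illustrative model of the semantics described above, assuming replace matches an existing entry by id and keeps its position; the types are stand-ins, not lash's:

```rust
// (plugin id, implementation name) pairs stand in for Arc<dyn PluginFactory>.
#[derive(Default)]
struct Stack {
    factories: Vec<(String, String)>,
}

impl Stack {
    // Append a new factory at the end of the ordered stack.
    fn push(&mut self, id: &str, imp: &str) {
        self.factories.push((id.to_string(), imp.to_string()));
    }

    // Swap the implementation for an existing id in place, preserving its
    // position; append when the id is new.
    fn replace(&mut self, id: &str, imp: &str) {
        match self.factories.iter().position(|(fid, _)| fid == id) {
            Some(i) => self.factories[i].1 = imp.to_string(),
            None => self.factories.push((id.to_string(), imp.to_string())),
        }
    }
}
```

Order matters because plugins register prompt contributions and tools in stack order, so a replacement that kept appending instead of swapping in place would reorder the runtime set.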

Run A Turn

TurnBuilder::run() is the smallest path. It returns a TurnOutput with the terminal TurnResult and the ordered Vec<TurnActivity> emitted during the turn.

Collected result

let collected = session
    .turn(TurnInput::text("Summarize this task."))
    .run()
    .await?;

let visible_answer: String = collected.activities.iter().filter_map(|activity| match &activity.event {
    TurnEvent::AssistantProseDelta { text } => Some(text.as_str()),
    _ => None,
}).collect();
let parent_usage = collected.result.usage;          // parent's own LLM tokens
let children = &collected.result.children_usage;    // per-(source, model) child entries
let total = collected.result.total_usage();         // parent + children
let outcome = collected.result.outcome;

Live stream

let ui_sink = Arc::new(AppEvents::new(tx));

let turn = session
    .turn(TurnInput::text(user_text))
    .stream(ui_sink.as_ref())
    .await?;

persist(app_turn_state.assistant_text(), turn.total_usage())?;

Apps own their projection policy. Fold TurnActivity directly when persisting assistant prose, terminal values, tool summaries, or timelines. Treat TurnActivity.correlation_id as the stable identity for multi-phase UI rows: start events insert rows, completion events update those same rows.
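The insert-then-update fold can be sketched with a plain map from correlation id to row index. The row and event shapes here are illustrative app types, not lash's:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Row {
    Running(String),
    Done(String, bool),
}

#[derive(Default)]
struct Timeline {
    rows: Vec<Row>,
    open: HashMap<String, usize>, // correlation id -> index into `rows`
}

impl Timeline {
    // Start event: insert a row and remember where it lives.
    fn tool_started(&mut self, correlation_id: &str, name: &str) {
        self.rows.push(Row::Running(name.to_string()));
        self.open.insert(correlation_id.to_string(), self.rows.len() - 1);
    }

    // Completion event: update the row the start event inserted, or insert
    // fresh if the start was never seen (e.g. resumed mid-turn).
    fn tool_completed(&mut self, correlation_id: &str, name: &str, success: bool) {
        match self.open.remove(correlation_id) {
            Some(i) => self.rows[i] = Row::Done(name.to_string(), success),
            None => self.rows.push(Row::Done(name.to_string(), success)),
        }
    }
}
```

Because correlation ids are stable across the start/complete pair, concurrent tool calls fold correctly even when their events interleave.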

Input Boundary

TurnInput is semantic core input: text plus image references backed by the attachment store. Filesystem syntax is a host concern.

Text

Pass ordinary user text with TurnInput::text(...) or explicit InputItem::Text values. If your host supports @path, resolve and validate it in the host, then include whatever marker text you want the model to see.
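A host-side expansion pass might look like the following sketch. The marker format and helper name are app choices, not anything lash prescribes:

```rust
// Rewrite "@path" mentions into marker text before the prompt reaches the
// runtime. `exists` stands in for whatever validation the host performs.
fn expand_path_mentions(input: &str, exists: impl Fn(&str) -> bool) -> String {
    input
        .split_whitespace()
        .map(|word| match word.strip_prefix('@') {
            // Only rewrite mentions that name a real file; leave the rest alone.
            Some(path) if exists(path) => format!("[file: {path}]"),
            _ => word.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

The resolved string then goes into TurnInput::text(...) like any other user text; the runtime never sees @path syntax.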

Images

Use InputItem::ImageRef with a matching entry in TurnInput::image_blobs. Lash stores the bytes as typed attachments and resolves them when building provider requests.

Semantic Events

The app stream is semantic and identity-bearing. Assistant prose arrives as deltas, reasoning arrives as deltas, multi-phase tool/code activity carries a correlation id, and terminal values authored by submit or tool controls arrive as SubmittedValue or ToolValue.

TurnActivity sink
use async_trait::async_trait;
use lash::{TurnActivity, TurnActivitySink, TurnEvent};

struct AppEvents {
    tx: AppUiTx,
    turn_state: std::sync::Mutex<TurnUiState>,
}

#[derive(Default)]
struct TurnUiState {
    reasoning: Option<UiRowId>,
    tools: std::collections::HashMap<String, UiRowId>,
    code: Option<UiRowId>,
}

#[async_trait]
impl TurnActivitySink for AppEvents {
    async fn emit(&self, activity: TurnActivity) {
        let correlation_id = activity.correlation_id.0.clone();
        match activity.event {
            TurnEvent::AssistantProseDelta { text } => {
                append_live_text(text).await;
            }
            TurnEvent::ReasoningDelta { text } => {
                let row = self.turn_state.lock().unwrap().reasoning.clone();
                let row = upsert_reasoning_row(row, text).await;
                self.turn_state.lock().unwrap().reasoning = Some(row);
            }
            TurnEvent::ToolCallStarted { name, args, .. } => {
                let row = insert_tool_row(name, args).await;
                self.turn_state
                    .lock()
                    .unwrap()
                    .tools
                    .insert(correlation_id, row);
            }
            TurnEvent::ToolCallCompleted { name, result, success, .. } => {
                let row = self.turn_state.lock().unwrap().tools.remove(&correlation_id);
                update_or_insert_tool_row(row, name, result, success).await;
            }
            TurnEvent::CodeBlockStarted { language, code } => {
                let row = insert_code_row(language, code).await;
                self.turn_state.lock().unwrap().code = Some(row);
            }
            TurnEvent::CodeBlockCompleted { language, output, error, success, .. } => {
                let row = self.turn_state.lock().unwrap().code.take();
                update_or_insert_code_row(row, language, output, error, success).await;
            }
            TurnEvent::SubmittedValue { value } => {
                append_live_text(render_terminal_value(&value)).await;
            }
            TurnEvent::ToolValue { tool_name, value } => {
                append_live_text(render_terminal_value(&value)).await;
                record_terminal_tool(tool_name).await;
            }
            TurnEvent::Usage { usage, cumulative, .. } => {
                update_usage(usage, cumulative).await;
            }
            TurnEvent::ChildUsage { source, usage, cumulative, .. } => {
                update_child_usage(source, usage, cumulative).await;
            }
            // TurnEvent::Error and any variants not handled above fall through here.
            _ => {}
        }
    }
}

fn render_terminal_value(value: &serde_json::Value) -> String {
    match value {
        serde_json::Value::Null => String::new(),
        serde_json::Value::String(text) => text.clone(),
        other => serde_json::to_string_pretty(other).unwrap_or_else(|_| other.to_string()),
    }
}
Use event identity, not duplicate detection.

ToolCallStarted and ToolCallCompleted describe the same logical row when their correlation_id matches; code-block events work the same way. TurnEvent::SubmittedValue and TurnEvent::ToolValue mean “a new terminal value was authored by a control path.” They are not emitted for a normal assistant prose finish because that prose already streamed as AssistantProseDelta.

Token Usage

Four channels surface token usage, from finest granularity to coarsest. Pick the one that matches what you want to display or persist.

TraceSink
Every provider call across every session in the runtime. Right for billing, audit, offline analysis. Heavier than necessary for plain totals.
TurnEvent::Usage / TurnEvent::ChildUsage
Live during a turn, one event per LLM iteration. Usage is the parent's own model call; ChildUsage carries session_id and source so a UI can group child traffic (subagent, compaction, observer). Right for live counters.
TurnResult.usage / TurnResult.children_usage
Per-turn snapshot at completion. usage is parent-only; children_usage is a per-(source, model) breakdown for any child sessions that ran during the turn. TurnResult::total_usage() sums both.
session.usage_report() → SessionUsageReport
Aggregate across the whole session, broken down by source × model. Right for dashboards and "session so far."

The full re-export and well-known source label constants live in lash::usage.
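The relationship between the per-turn channels is simple arithmetic: total_usage() is the parent's own tokens plus every child entry. A sketch with a stand-in Usage type (not the lash::usage type):

```rust
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

impl Usage {
    fn add(self, other: Usage) -> Usage {
        Usage {
            input_tokens: self.input_tokens + other.input_tokens,
            output_tokens: self.output_tokens + other.output_tokens,
        }
    }
}

struct TurnTotals {
    usage: Usage,                                   // parent's own model calls
    children_usage: Vec<((String, String), Usage)>, // (source, model) -> usage
}

impl TurnTotals {
    // Parent usage plus every per-(source, model) child entry.
    fn total_usage(&self) -> Usage {
        self.children_usage
            .iter()
            .fold(self.usage, |acc, (_, child)| acc.add(*child))
    }
}
```

Persisting usage and children_usage separately keeps the breakdown recoverable; storing only the total discards which source spent the tokens.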

RLM And Submit

RLM turns can require submit or allow direct prose. With SubmitRequired, a submit statement validates against the configured schema when present, finishes the turn as TurnFinish::SubmittedValue, and emits a SubmittedValue semantic event. With ProseOrSubmit, a no-code prose answer finishes as TurnFinish::AssistantMessage and emits no terminal-value event.

Per-turn RLM options

let submitted = session
    .turn(TurnInput::text("Move on the board."))
    .require_submit()?
    .stream(&sink)
    .await?;

let prose_or_submit = session
    .turn(TurnInput::text("Answer directly if no code is needed."))
    .allow_prose_or_submit()?
    .run()
    .await?;

Outcome shape

match result.outcome {
    lash::TurnOutcome::Finished(
        lash::TurnFinish::SubmittedValue { value }
    ) => {
        // Same value already arrived as TurnEvent::SubmittedValue.
        persist_typed_value(value)?;
    }
    lash::TurnOutcome::Finished(
        lash::TurnFinish::AssistantMessage { text }
    ) => persist_text(text)?,
    other => handle_other_outcome(other)?,
}

Prompt Templates And Slots

lash exposes the same prompt model as the runtime: one effective template chooses the layout, while slot contributions supply app, session, and turn-specific content. Core, session, and turn layers inherit by default; a lower layer can replace or clear one slot without rebuilding the whole template.

Template layout is separate from slot content
use lash::{
    PromptBuiltin, PromptContribution, PromptSlot, PromptTemplate,
    PromptTemplateEntry, PromptTemplateSection, TurnInput,
};

let template = PromptTemplate::new(vec![
    PromptTemplateSection::untitled(vec![
        PromptTemplateEntry::builtin(PromptBuiltin::MainAgentIntro),
        PromptTemplateEntry::slot(PromptSlot::Intro),
    ]),
    PromptTemplateSection::titled(
        "Guidance",
        vec![PromptTemplateEntry::slot(PromptSlot::Guidance)],
    ),
]);

let core = lash::LashCore::standard()
    .provider(provider)
    .model("gpt-5.4", None)
    .max_context_tokens(200_000)
    .prompt_template(template)
    .prompt_contribution(PromptContribution::guidance(
        "App",
        "Answer as the host application assistant.",
    ))
    .build()?;

let session = core
    .session("customer-42")
    .replace_prompt_slot(
        PromptSlot::Guidance,
        [PromptContribution::guidance(
            "Tenant",
            "Use the tenant's support policy.",
        )],
    )
    .open()
    .await?;

let result = session
    .turn(TurnInput::text("Draft the response."))
    .prompt_contribution(PromptContribution::guidance(
        "Turn",
        "Keep this reply under 120 words.",
    ))
    .run()
    .await?;

Typed Plugin Input

Plugin bindings let an app provide strongly typed session configuration and per-turn input. The binding activates an ordinary core plugin factory for that session; the resulting session plugin registers prompt contributions, tools, and other capabilities through PluginRegistrar.

Plugin and tool pattern
// BoardPluginFactory, BoardSessionPlugin, and BoardTools are host-defined
// types whose plumbing is elided here.
#[derive(Clone, Debug)]
struct BoardPlugin;

#[derive(Clone, Debug)]
struct BoardConfig;

#[derive(Clone, Debug)]
struct BoardTurnInput {
    board: BoardState,
}

impl lash::PluginBinding for BoardPlugin {
    const ID: &'static str = "board";
    type SessionConfig = BoardConfig;
    type Input = BoardTurnInput;

    fn factory(_: &Self::SessionConfig) -> Arc<dyn lash::plugins::PluginFactory> {
        Arc::new(BoardPluginFactory)
    }

    fn requires_turn_input(_: &Self::SessionConfig) -> bool {
        true
    }
}

impl lash::plugins::SessionPlugin for BoardSessionPlugin {
    fn id(&self) -> &'static str { BoardPlugin::ID }

    fn register(&self, reg: &mut lash::plugins::PluginRegistrar) -> Result<(), lash::plugins::PluginError> {
        reg.prompt().contribute(Arc::new(|ctx| {
            Box::pin(async move {
                let Some(input) = ctx.turn_context.plugin_input::<BoardTurnInput>(BoardPlugin::ID) else {
                    return Ok(Vec::new());
                };
                Ok(vec![lash::PromptContribution::environment(
                    "Board",
                    format_board(&input.board),
                )])
            })
        }));
        reg.tools().provider(Arc::new(BoardTools))
    }
}

#[async_trait::async_trait]
impl lash::tools::ToolProvider for BoardTools {
    fn definitions(&self) -> Vec<lash::tools::ToolDefinition> {
        vec![/* read_board, play_move, app tools */]
    }

    async fn execute(&self, call: lash::tools::ToolCall<'_>) -> lash::tools::ToolResult {
        let Some(input) = call.context.plugin_input::<BoardTurnInput>(BoardPlugin::ID) else {
            return lash::tools::ToolResult::err_fmt("missing board input");
        };
        run_board_tool(call.name, call.args, &input.board)
    }
}

Human input follows the same rule: expose it as a host-owned tool that waits inside its tool implementation and returns a normal tool result. App streams then see ordinary tool start/completion events; there is no separate runtime prompt event to handle.
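The waiting-tool pattern can be sketched with plain channels. The tool name and channel plumbing here are illustrative; a real host would route the question to its UI and implement ToolProvider around this core:

```rust
use std::sync::mpsc;

// A host-owned "ask_human" tool: blocks inside its implementation until the
// app delivers an answer, then returns it as an ordinary tool result.
struct HumanInputTool {
    ask: mpsc::Sender<String>,      // question out to the app UI
    answer: mpsc::Receiver<String>, // answer back from the app UI
}

impl HumanInputTool {
    fn execute(&self, question: &str) -> Result<String, String> {
        self.ask
            .send(question.to_string())
            .map_err(|_| "ui channel closed".to_string())?;
        // Blocking here blocks only this tool call: the app stream simply
        // shows an in-flight tool row until the human responds.
        self.answer.recv().map_err(|_| "no answer".to_string())
    }
}
```

From the stream's perspective nothing special happened: a ToolCallStarted event opens the row, the human's reply completes it.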

Plugin crates should export a domain extension trait that wraps the generic input primitive:

trait BoardTurnExt {
    fn with_board(self, board: BoardState) -> Self;
}

impl BoardTurnExt for lash::TurnBuilder {
    fn with_board(self, board: BoardState) -> Self {
        self.with_plugin_input::<BoardPlugin>(BoardTurnInput { board })
    }
}

Install typed plugins on the session and use the extension method before each run:

let core = LashCore::rlm()
    .provider(provider)
    .model(model, None)
    .max_context_tokens(200_000)
    .build()?;

let session = core
    .session(chat_id)
    .rlm()
    .plugin::<BoardPlugin>(BoardConfig)
    .open()
    .await?;

let result = session
    .turn(TurnInput::text("Play one move."))
    .with_board(board)
    .require_submit()?
    .stream(&sink)
    .await?;

App Storage

Keep product storage separate from runtime storage. Your app database can store chats, users, messages, and UI state; lash session persistence stores the runtime graph and checkpoints through a SessionStoreFactory.

App state

Own chat tables, account ids, frontend board state, request auth, and transport formats. The example app stores chat messages in SQLite and streams newline-delimited JSON to the browser.

Runtime state

Pass an explicit store factory such as lash_sqlite_store::SqliteSessionStoreFactory::new(...) to LashCoreBuilder::store_factory, or pass a concrete store to SessionBuilder::store, when sessions need durable runtime state across process restarts.

Subagents

Subagents are configured with the same SessionSpec shape as root sessions. The factory owns the capability registry and host; child policy resolution always starts from the live parent snapshot.

use std::sync::Arc;
use lash::{plugins::PluginFactory, SessionSpec};
use lash_subagents::{default_registry, SubagentsPluginFactory};

let registry = Arc::new(default_registry(&tier_models));
let host = Arc::new(AppSubagentHost::new(child_store_factory));

let subagents = SubagentsPluginFactory::new(registry, host)
    .with_session_spec(SessionSpec::inherit().max_turns(8));

let core = LashCore::rlm()
    .provider(provider)
    .model(model, None)
    .max_context_tokens(200_000)
    .plugin(Arc::new(subagents) as Arc<dyn PluginFactory>)
    .build()?;

Capability implementations return SessionSpec overlays. StaticCapability is for exact child authority, while TierCapability implements the built-in explore and peer model/mode selection. Tool authors should not construct SessionPolicy for child configuration; it remains the resolved runtime artifact.
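The unconditional stripping of interactive-only tools can be sketched as a filter applied after the capability's own selection. All types here are illustrative stand-ins, not the crate's Capability or SessionSpec:

```rust
#[derive(Clone)]
struct ToolDef {
    name: String,
    interactive_only: bool,
}

struct ChildSpec {
    max_turns: u32,
    tools: Vec<ToolDef>,
}

// Resolve a child surface from the live parent snapshot: keep only the tools
// the capability requested, then strip interactive-only tools no matter what.
fn resolve_child(parent_tools: &[ToolDef], requested: &[&str], max_turns: u32) -> ChildSpec {
    let tools = parent_tools
        .iter()
        .filter(|t| requested.contains(&t.name.as_str()))
        .filter(|t| !t.interactive_only) // stripped unconditionally
        .cloned()
        .collect();
    ChildSpec { max_turns, tools }
}
```

Starting from the parent snapshot rather than a static list is what keeps a child from holding authority the parent has since lost.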

MCP Servers

MCP (Model Context Protocol) servers attach via the lash-plugin-mcp crate, which wraps the official rmcp SDK and supports the three standard transports — stdio (child process), streamable_http, and sse. The plugin owns a shared connection pool: every session built from the same LashCore reuses the same client per server, so stdio servers spawn once per process instead of per session.

use std::collections::BTreeMap;
use lash_plugin_mcp::{McpPluginFactory, McpServerConfig};

let mut servers = BTreeMap::new();
servers.insert(
    "docs".to_string(),
    McpServerConfig::stdio("uvx", vec!["mcp-server-docs".into()]),
);
servers.insert(
    "web".to_string(),
    McpServerConfig::streamable_http("https://mcp.example.com/rpc"),
);

let mcp = McpPluginFactory::new(servers).await?;

let core = LashCore::rlm()
    .provider(provider)
    .model(model, None)
    .max_context_tokens(200_000)
    .plugin(std::sync::Arc::new(mcp))
    .build()?;

Tools are surfaced under mcp__<server>__<tool> names with their original input and output schemas preserved. The factory's attach_server / detach_server methods let hosts add or remove servers at runtime without rebuilding the core.
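Hosts that log or route these surfaced names can split them back apart with plain string handling. A sketch, assuming server labels never themselves contain "__" (the helper name is ours, not part of lash-plugin-mcp):

```rust
// Split a surfaced "mcp__<server>__<tool>" name into (server, tool).
// Returns None for names that are not MCP-surfaced.
fn split_mcp_tool(name: &str) -> Option<(&str, &str)> {
    let rest = name.strip_prefix("mcp__")?;
    // First "__" separates the server label from the original tool id.
    let sep = rest.find("__")?;
    Some((&rest[..sep], &rest[sep + 2..]))
}
```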

Advanced Runtime

The normal builder exposes app-facing setup. Use .advanced() only for runtime-host concerns such as custom plugin hosts, shared task executors, residency policy, imported RuntimeCoreConfig, and termination policy.

let core = LashCore::rlm()
    .provider(provider)
    .model("anthropic/claude-sonnet-4.6", None)
    .max_context_tokens(200_000)
    .store_factory(store_factory)
    .advanced()
    .residency(Residency::ActivePathOnly)
    .build()?;

Turn streaming is semantic by default: TurnBuilder::stream emits TurnActivity items and resolves with a rich TurnResult. Raw runtime telemetry belongs in tracing and lower-level runtime debugging, not the normal lash API surface.

Complete Example

The runnable browser example shows the full pattern: app-owned chat database, RLM mode, typed plugin input, tools that read turn input, semantic stream events, terminal output rendering, and app-owned persistence folded from the stream.

OPENROUTER_API_KEY=... cargo run -p agent-service
# then open http://127.0.0.1:3000

Source: examples/agent-service. The dedicated walkthrough is Agent Service.