Chat completion via Mastio
Scope of this page: how to make an LLM completion through the Mastio AI gateway, with and without tool calls. Streaming is covered in section 3.
Prerequisites:
- An enrolled agent with the three identity files on disk and a working from_identity_dir(...) + login_via_proxy_with_local_key(). If you don’t have that yet, do SDK quickstart first; this page picks up where that one ends.
- A provider key configured on the Mastio side: MCP_PROXY_ANTHROPIC_API_KEY in proxy.env for Anthropic upstream, or a reachable Ollama daemon for local models. Without that, every call returns 503 provider_key_missing.
1. The minimal call
The request follows the OpenAI ChatCompletion shape. Mastio dispatches to the configured upstream through a native adapter (official Anthropic / OpenAI SDKs, raw httpx for Ollama):
    response = client.chat_completion({
        "model": "anthropic/claude-haiku-4-5-20251001",
        "messages": [
            {"role": "user", "content": "Summarise the EU AI Act Art. 12 in two sentences."},
        ],
        "max_tokens": 200,
    })
    print(response["choices"][0]["message"]["content"])
    print("trace:", response.get("cullis_trace_id"))
The response is the upstream provider’s OpenAI-compatible reply plus a cullis_trace_id injected by Mastio. The trace id matches an entry in the Mastio audit chain — useful when debugging or for compliance lookups.
Provider matrix
| Model string | Upstream | Status |
|---|---|---|
| anthropic/claude-opus-4-7 | Anthropic API | live |
| anthropic/claude-sonnet-4-6 | Anthropic API | live |
| anthropic/claude-haiku-4-5-20251001 | Anthropic API | live |
| ollama_chat/<model-name> | Local Ollama (e.g. ollama_chat/qwen2.5:7b) | live |
| openai/gpt-4-*, gemini/* | OpenAI / Google | returns 501 not_implemented (roadmap) |
Use the literal model string. The provider prefix tells Mastio which upstream to route to.
Ollama prefix gotcha: use ollama_chat/<name>, NOT ollama/<name>. The latter routes through the legacy /api/generate endpoint that silently drops the messages array, returning a 200 with empty content. Always use ollama_chat/.
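Because a wrong prefix fails late (a 501, or an empty 200 in the ollama/ case), it can help to validate the model string on the client before sending. The following is a minimal sketch and not part of the SDK: check_model_string and both prefix tables are hypothetical names, and the prefixes are copied from the matrix above, so they would need updating as Mastio wires more providers.

```python
# Prefixes copied from the provider matrix; the helper itself is hypothetical.
LIVE_PREFIXES = {"anthropic": "Anthropic API", "ollama_chat": "Local Ollama"}
ROADMAP_PREFIXES = {"openai": "OpenAI", "gemini": "Google"}


def check_model_string(model: str) -> str:
    """Return the upstream name for a model string, or raise ValueError."""
    prefix, sep, name = model.partition("/")
    if not sep or not name:
        raise ValueError(f"expected '<provider>/<model>', got {model!r}")
    if prefix == "ollama":
        # The legacy prefix returns a 200 with empty content, so reject it early.
        raise ValueError(f"use 'ollama_chat/{name}' instead of {model!r}")
    if prefix in ROADMAP_PREFIXES:
        raise ValueError(f"{ROADMAP_PREFIXES[prefix]} is not wired yet (Mastio returns 501)")
    if prefix not in LIVE_PREFIXES:
        raise ValueError(f"unknown provider prefix {prefix!r}")
    return LIVE_PREFIXES[prefix]
```

Calling check_model_string(payload["model"]) right before client.chat_completion(payload) turns the silent empty-content case into an immediate, readable error.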
2. With tools (the agent loop)
A real agent doesn’t just chat — it loops: chat, model emits a tool_call, agent dispatches the tool, feeds the result back, model responds. The pattern:
    import json

    # List MCP tools the agent is bound to (server-side filtered by capability gate)
    raw_tools = client.list_mcp_tools()

    # Convert MCP shape to OpenAI tool-use shape
    tools = [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("inputSchema", {"type": "object"}),
            },
        }
        for t in raw_tools
    ]

    messages = [
        {"role": "system", "content": "You are a KYC screener. Use tools to verify documents."},
        {"role": "user", "content": "Process case KYC-2026-001 for document PASSPORT-IT-MR-871234."},
    ]

    for iteration in range(8):  # cap iterations to avoid infinite loops on misbehaving models
        response = client.chat_completion({
            "model": "anthropic/claude-haiku-4-5-20251001",
            "messages": messages,
            "tools": tools,
        })
        msg = response["choices"][0]["message"]
        messages.append(msg)
        tool_calls = msg.get("tool_calls") or []
        if not tool_calls:
            # Model produced final answer
            print("Final:", msg.get("content"))
            break
        # Dispatch each tool call and feed the result back to the model
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"])
            result = client.call_mcp_tool(name, args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
Two things Mastio handles for you in this loop:
- Authorization. Every call_mcp_tool(name, args) runs against the capability gate. If the agent’s capabilities don’t include the one the tool requires, Mastio returns 403 before the tool’s MCP server is ever contacted.
- Audit. Every chat completion + every tool call lands in local_audit with the same cullis_trace_id, so the entire loop is reconstructable for compliance.
Reference implementation: agent_kyc_screener/main_stack.py shows the same loop with system prompt loading, decision parsing, and trace-id capture.
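The dispatch step of the loop stops at the first failure: a 403 from the capability gate, or arguments that are not valid JSON, raises and leaves the tool call unanswered. One possible hardening, sketched below, is to report the failure back to the model as the tool result so it can recover or explain. This assumes the upstream expects one role="tool" message per tool_call id, as the OpenAI shape does. run_tool_calls is a hypothetical helper, not an SDK function; call_tool stands in for client.call_mcp_tool.

```python
import json
from typing import Any, Callable


def run_tool_calls(
    tool_calls: list[dict[str, Any]],
    call_tool: Callable[[str, dict[str, Any]], Any],
) -> list[dict[str, str]]:
    """Dispatch each tool call and return one role="tool" message per call."""
    replies = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            # An empty arguments string is treated as "no arguments".
            args = json.loads(call["function"].get("arguments") or "{}")
            result = call_tool(name, args)
            content = json.dumps(result)
        except Exception as exc:  # broad on purpose: every failure is reported to the model
            content = json.dumps({"error": f"{type(exc).__name__}: {exc}"})
        replies.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return replies
```

Inside the loop this would replace the inner for block with messages.extend(run_tool_calls(tool_calls, client.call_mcp_tool)). Whether a denied capability should be surfaced to the model or abort the run is a policy choice; for a compliance workflow, aborting may be the safer default.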
3. Streaming
For long completions you want token-by-token output to the user, not a 30-second blocked call. chat_completion_stream returns an iterator of SSE frames:
    for frame in client.chat_completion_stream({
        "model": "anthropic/claude-haiku-4-5-20251001",
        "messages": [{"role": "user", "content": "Explain DPoP in three paragraphs."}],
    }):
        # Each frame is a raw SSE string, e.g. "data: {...}\n\n"
        print(frame, end="", flush=True)
SSE frames include:
- data: {...delta...}\n\n carries incremental content deltas (OpenAI streaming shape)
- event: tool_call_start\ndata: {...}\n\n is a Mastio-emitted marker when a tool call begins
- event: cullis_audit\ndata: {...}\n\n is the Mastio-emitted Cullis Audit Envelope sidecar (matches the row written to local_audit)
- data: [DONE]\n\n is the terminal frame
Parse the frames according to the Server-Sent Events format (WHATWG HTML Living Standard, section "Server-sent events"). Most agent frameworks (Anthropic SDK, OpenAI SDK, LangChain) have an SSE parser you can plug in: pass the raw frames from chat_completion_stream to it.
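When pulling in a framework parser is more than you need, the frame types listed above can be split by hand. The sketch below assumes each item from chat_completion_stream is one complete frame (as the comment in the example states) and that every data payload other than [DONE] is JSON. parse_sse_frames is a hypothetical helper, not part of the SDK.

```python
import json
from typing import Any, Iterable, Iterator, Tuple


def parse_sse_frames(frames: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, payload) pairs and stop at the [DONE] frame."""
    for frame in frames:
        event = "message"  # SSE default event type when no "event:" line is present
        data_lines = []
        for line in frame.splitlines():
            if not line or line.startswith(":"):
                continue  # blank separator or SSE comment line
            field, _, value = line.partition(":")
            # Per the SSE format, a single leading space after the colon is dropped.
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
        if not data_lines:
            continue
        data = "\n".join(data_lines)
        if data == "[DONE]":
            return
        yield event, json.loads(data)
```

A consumer would then branch on the event name: print payload["choices"][0]["delta"].get("content") for "message" events (the OpenAI streaming shape), and keep the "cullis_audit" payload for correlation with local_audit.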
4. Common errors
| Status | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Cert mismatch, DPoP key not loaded, or token expired | Verify from_identity_dir got a dpop_key_path. The SDK auto-relogins once on 401; if you still hit it, the cert’s thumbprint isn’t pinned in Mastio’s internal_agents.dpop_jkt, so re-enroll. |
| 403 Forbidden (capability denied) | Tool call requires a capability the agent doesn’t have | Update the agent’s capabilities at enrollment, or pick a different tool. |
| 503 provider_key_missing | Upstream provider key not set in proxy.env | Set MCP_PROXY_ANTHROPIC_API_KEY (Anthropic) or ensure Ollama daemon is reachable from the Mastio container. |
| 501 not_implemented | Upstream provider not wired (OpenAI, Gemini, etc.) | Roadmap. Use Anthropic or Ollama for now. |
| 502 Bad Gateway | Upstream provider returned an error or timed out | Check Mastio logs (docker compose -p cullis-mastio logs mcp-proxy) for the upstream status. Usually transient. |
| 504 Gateway Timeout | Long completion exceeded Mastio’s upstream timeout (default 60s) | Bump MCP_PROXY_AI_GATEWAY_TIMEOUT in proxy.env, or switch to streaming. |
The full HTTP response body is preserved on the exception (httpx.HTTPStatusError.response), so you can introspect what Mastio gave you when debugging.
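Since the table marks only 502 as usually transient, a retry wrapper probably should not touch the other statuses, which need a configuration or enrollment fix. The sketch below is one way to do that, assuming failures surface as httpx.HTTPStatusError with the response attached, as described above. It reads the status by attribute lookup instead of importing httpx, so it stays stdlib-only. with_retry, RETRYABLE and the backoff values are illustrative choices, not SDK behaviour.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE = {502}  # "usually transient" per the table; other statuses need a real fix


def with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff on retryable HTTP statuses."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # httpx.HTTPStatusError exposes the response as exc.response
            response = getattr(exc, "response", None)
            status = getattr(response, "status_code", None)
            if status not in RETRYABLE or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Usage would look like response = with_retry(lambda: client.chat_completion(payload)). The sleep parameter exists so tests can inject a fake and avoid real delays.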
5. Observability
Every chat completion writes one row to local_audit with:
- event_type = mcp.llm_completion
- agent_id = your agent
- details.model, details.tokens_in, details.tokens_out, details.upstream_provider
- cullis_trace_id: the same id you got back in the response, useful for joining client-side logs to the server-side audit chain
To inspect the rows, open the dashboard at https://mastio.example.com/proxy/audit or run an offline export (see Audit export).
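Once rows are exported, the token fields make a quick usage summary straightforward. The sketch below assumes, without confirmation from the export format, that each row is available as a dict with event_type at the top level and model, tokens_in and tokens_out nested under a details key, mirroring the dotted names above. tokens_by_model is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Iterable


def tokens_by_model(rows: Iterable[dict[str, Any]]) -> dict[str, dict[str, int]]:
    """Total calls and token counts per model over mcp.llm_completion rows."""
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: {"calls": 0, "tokens_in": 0, "tokens_out": 0}
    )
    for row in rows:
        if row.get("event_type") != "mcp.llm_completion":
            continue  # skip rows for other event types
        details = row.get("details") or {}  # assumed nesting, see lead-in
        bucket = totals[details.get("model", "unknown")]
        bucket["calls"] += 1
        # Missing or null counts are treated as zero.
        bucket["tokens_in"] += int(details.get("tokens_in") or 0)
        bucket["tokens_out"] += int(details.get("tokens_out") or 0)
    return dict(totals)
```

Grouping by cullis_trace_id instead of model would give per-loop totals, which is often the more useful view when one agent run spans several completions.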
What’s next
- SDK quickstart: enrollment + auth, the prerequisites of this page
- MCP tools via Mastio: deeper dive on list_mcp_tools + call_mcp_tool outside the chat loop (deterministic task runners, ETL, ops scripts)
- Audit export: export the chain that records these calls
- Mastio on Docker: stand up a local Mastio with Ollama wired for development