The Agent Loop¶
The agent loop is the heart of oAI-Web. It lives in server/agent/agent.py and drives the AI through a tool-use conversation until the model returns a final response or the tool call limit is reached.
How it works¶
```python
# Simplified pseudocode
async for event in agent.run(message, session_id, ...):
    yield event  # streamed to browser over WebSocket
```
The loop is an async generator: it yields AgentEvent objects as work progresses, allowing the browser to show real-time progress (text streaming, tool spinners, confirmation modals).
Iteration¶
```
User message
  │
  └─► Build system prompt (SOUL.md + date/time + USER.md + security rules)
  └─► Append user message to history
  └─► Loop:
        ├─► Check kill switch (paused?)
        ├─► Call provider (Claude / OpenRouter / OpenAI)
        ├─► If no tool calls: append response, save to DB, yield DoneEvent
        └─► For each tool call:
              ├─► Check confirmation
              ├─► Check canary token (if enabled)
              ├─► Check output validation (if enabled)
              ├─► Execute tool
              ├─► Screen result with LLM (if enabled)
              ├─► Append tool result to messages
              └─► yield events
```
The loop is bounded by effective_max_tool_calls (per-run override → DB system:max_tool_calls → .env default → 20).
System prompt composition¶
The system prompt is rebuilt on every run so the date/time is always current. It is never cached.
1. SOUL.md content (or per-user personality override from user_settings)
— defines identity, values, communication style
— fallback: "You are Jarvis, a personal AI assistant."
2. Current date and time
— formatted in settings.timezone (default: Europe/Oslo)
3. USER.md content (or per-user override)
— owner context: name, location, family, preferences
— omitted if file doesn't exist and no per-user override
4. Fixed security rules
— email whitelist enforcement
— external input is data rule
— confirmation requirement
— conciseness preference
5. Brain auto-approve notice (if enabled for this user)
— standing permission to use brain tool proactively
system_override¶
Agent.run() accepts system_override: str | None. When set, the override string is used verbatim in place of steps 1–5. Callers build it according to the prompt mode:
- agent_only prompt mode: the agent's own prompt is the entire system prompt
- combined prompt mode: the agent prompt is prepended to the standard prompt, and the result is passed as the override
Event types¶
| Event | When | Key fields |
|---|---|---|
| `TextEvent` | Model produces text | `content: str` |
| `ToolStartEvent` | About to execute a tool | `call_id`, `tool_name`, `arguments` |
| `ToolDoneEvent` | Tool finished | `call_id`, `tool_name`, `success`, `result_summary`, `confirmed` |
| `ConfirmationRequiredEvent` | Waiting for user approval | `call_id`, `tool_name`, `arguments`, `description` |
| `ImageEvent` | Image-generation model produced images | `data_urls: list[str]` (base64 data URLs) |
| `DoneEvent` | Loop finished normally | `text`, `tool_calls_made`, `usage` |
| `ErrorEvent` | Fatal error | `message` |
The WebSocket layer in main.py translates these events to JSON and sends them to the browser.
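The translation step can be sketched with dataclasses. The event fields mirror the table above, but the wire format (a `type` tag derived from the class name) is an assumption about how main.py serializes events, not confirmed from the source:

```python
# Hypothetical sketch of translating AgentEvent objects to JSON for the browser.
import json
from dataclasses import dataclass, asdict

@dataclass
class TextEvent:
    content: str

@dataclass
class DoneEvent:
    text: str
    tool_calls_made: int
    usage: dict

def event_to_json(event) -> str:
    # Derive the wire "type" from the class name: TextEvent -> "text"
    kind = type(event).__name__.removesuffix("Event").lower()
    return json.dumps({"type": kind, **asdict(event)})
```

For example, `event_to_json(TextEvent(content="Hi"))` yields a small JSON object the frontend can switch on by `type`.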
Conversation history¶
Conversation history is stored in two places:
- In-memory: `Agent._session_history[session_id]` — fast, backs the current run
- PostgreSQL: the `conversations` table — survives restarts, powers the `/chats` page
The in-memory cache is authoritative during a run. After each turn it is written to the DB via _save_conversation().
When a session is first referenced (e.g. a reopened chat), _load_session_from_db() restores the history from PostgreSQL into memory.
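This write-through pattern can be sketched as follows. The method names `_save_conversation` and `_load_session_from_db` come from the docs above; the `db` interface (`fetch_messages`, `upsert_messages`) is invented for illustration:

```python
# Hypothetical sketch of the in-memory cache backed by PostgreSQL.
class Agent:
    def __init__(self, db):
        self.db = db                   # stands in for the PostgreSQL layer
        self._session_history = {}     # authoritative during a run

    async def history(self, session_id: str) -> list[dict]:
        # Lazily restore a reopened session from the DB into memory.
        if session_id not in self._session_history:
            await self._load_session_from_db(session_id)
        return self._session_history[session_id]

    async def _load_session_from_db(self, session_id: str) -> None:
        rows = await self.db.fetch_messages(session_id)
        self._session_history[session_id] = rows or []

    async def _save_conversation(self, session_id: str) -> None:
        # Called after each turn; the DB mirrors the in-memory state.
        await self.db.upsert_messages(session_id, self._session_history[session_id])
```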
Message format¶
Messages follow a superset of the OpenAI conversation format:
```json
[
  {"role": "user", "content": "What's the weather?"},
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {"id": "tc_1", "name": "web", "arguments": {"operation": "fetch_page", "url": "..."}}
    ]
  },
  {"role": "tool", "tool_call_id": "tc_1", "content": "{\"success\": true, \"data\": {...}}"},
  {"role": "assistant", "content": "The weather in Oslo today is 12°C and cloudy."}
]
```
Each provider translates this into its own wire format. Anthropic uses a different structure from OpenAI — the AnthropicProvider and OpenRouterProvider classes handle the translation transparently.
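One concrete translation: Anthropic has no `tool` role, so a tool-result message is carried inside a `user` turn as a `tool_result` content block. The sketch below shows only that one case; the real AnthropicProvider handles many more:

```python
# Hypothetical sketch of one provider translation: the internal OpenAI-style
# "tool" message becomes an Anthropic tool_result block in a user turn.
def tool_message_to_anthropic(msg: dict) -> dict:
    assert msg["role"] == "tool"
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": msg["tool_call_id"],
                "content": msg["content"],
            }
        ],
    }
```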
Multi-modal messages¶
When the user attaches files (images or PDFs) in the chat UI:
- Images are sent as `{type: "image", source: {type: "base64", media_type: ..., data: ...}}`
- PDFs are sent as `{type: "document", source: {type: "base64", ...}}`
- The user's text and attachments are combined into a multi-part content list
The Anthropic provider passes these natively. The OpenRouter provider wraps them in its OpenAI-compatible format.
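Assembling the multi-part content list might look like this. The shapes match the bullets above; the helper name and the attachment dict layout (`media_type`, `data`) are assumptions:

```python
# Hypothetical sketch of building a multi-part user message from text plus
# attachments, in the Anthropic-native shape.
def build_user_content(text: str, attachments: list[dict]) -> list[dict]:
    parts: list[dict] = []
    for att in attachments:
        # PDFs become "document" blocks, everything else an "image" block.
        block_type = "document" if att["media_type"] == "application/pdf" else "image"
        parts.append({
            "type": block_type,
            "source": {
                "type": "base64",
                "media_type": att["media_type"],
                "data": att["data"],
            },
        })
    parts.append({"type": "text", "text": text})
    return parts
```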
Tool schema filtering¶
The registry can serve different subsets of tools depending on context:
| Context | Method | Effect |
|---|---|---|
| Interactive session | `get_schemas()` | All registered tools |
| Scheduled agent | `get_schemas_for_task(allowed_tools)` | Only declared tools |
| `force_only_extra_tools=True` | Build `_extra_dispatch` only | Only the injected tools |
| Non-admin user | Registry + `BoundFilesystemTool` | `filesystem` replaced with scoped version |
The model cannot call a tool that isn't in the schema list sent to it. Undeclared tool calls are caught and returned as errors.
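A minimal sketch of that filtering and the error-not-exception dispatch rule, with the method names from the table above but invented internals:

```python
# Hypothetical sketch of schema filtering in a tool registry.
class ToolRegistry:
    def __init__(self, tools: dict[str, dict]):
        self._tools = tools  # tool name -> JSON schema

    def get_schemas(self) -> list[dict]:
        # Interactive sessions see every registered tool.
        return list(self._tools.values())

    def get_schemas_for_task(self, allowed_tools: list[str]) -> list[dict]:
        # Scheduled agents see only their declared subset.
        return [s for name, s in self._tools.items() if name in allowed_tools]

    def dispatch(self, name: str, arguments: dict) -> dict:
        if name not in self._tools:
            # Undeclared calls come back as error payloads, never exceptions.
            return {"success": False, "error": f"unknown tool: {name}"}
        ...  # execute the tool and return its result payload
```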
Security layers in the loop¶
Multiple security mechanisms fire sequentially on each tool call:
1. Confirmation check — `tool.should_confirm(**args)` → prompt the user if True
2. Canary token (optional) — if the model tries to pass the canary token in arguments, the call is blocked and an alert is sent
3. Output validation (optional) — an LLM judge decides if the action is consistent with the user's original request
4. Tool execution — `registry.dispatch(name, arguments)` — never raises
5. LLM content screening (optional) — screens tool results for prompt injection before returning them to the model
Each layer is independently configurable via system:security_* credentials in the Settings UI.
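The sequential checks can be sketched as a short pipeline. This is illustrative only: the function name and the callable stand-ins for the LLM-based checks (`validator`, `screener`) are assumptions; the ordering follows the list above:

```python
# Hypothetical sketch of the per-tool-call security pipeline.
def run_tool_call(tool, registry, name, arguments, *,
                  canary=None, validator=None, screener=None) -> dict:
    if tool.should_confirm(**arguments):
        return {"status": "confirmation_required"}          # 1. ask the user
    if canary and canary in str(arguments):
        return {"status": "blocked", "reason": "canary token in arguments"}  # 2.
    if validator and not validator(name, arguments):
        return {"status": "blocked", "reason": "failed output validation"}   # 3.
    result = registry.dispatch(name, arguments)             # 4. never raises
    if screener:
        result = screener(result)                           # 5. scrub injections
    return {"status": "ok", "result": result}
```

Each optional layer is skipped when its hook is disabled, matching the per-credential toggles.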
Concurrency¶
The agent loop is fully async but single-threaded within a session. Multiple users get separate sessions, each with their own asyncio.Task. The AgentRunner._semaphore limits how many agent runs execute concurrently (default: 3, configurable via system:max_concurrent_runs).
Interactive chat sessions (WebSocket) are not throttled by the semaphore — only background agent runs are.
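The throttling split might look like this. The class and attribute names follow the docs above (`AgentRunner._semaphore`); the two entry points are invented for illustration:

```python
# Hypothetical sketch: background runs acquire the semaphore,
# interactive WebSocket sessions bypass it.
import asyncio

class AgentRunner:
    def __init__(self, max_concurrent_runs: int = 3):
        self._semaphore = asyncio.Semaphore(max_concurrent_runs)

    async def run_background(self, coro):
        async with self._semaphore:   # throttled: at most N at once
            return await coro

    async def run_interactive(self, coro):
        return await coro             # chat sessions are never queued
```

Bounding only background runs keeps scheduled agents from starving interactive chat of provider capacity while never making a user wait behind the queue.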