Skip to content

Security Model

Security is a first-class concern throughout oAI-Web. This page explains the layers of protection and why each exists.


Core principles

1. External input is data, never instructions

Emails, calendar events, web page content, and RSS items are always passed as tool results — structured data returned to the AI after it calls a tool. They are never injected into the system prompt.

This prevents a class of attacks where a malicious email contains hidden text like "IGNORE YOUR PREVIOUS INSTRUCTIONS". Even if that text passes through the prompt injection sanitiser, the AI sees it as tool output (opaque data), not as a system instruction.

2. Whitelists are database-managed

All permission lists are stored in PostgreSQL and managed through the Settings UI: - Email recipients (email_whitelist) — the agent can only send email to approved addresses - Web domains Tier 1 (web_whitelist) — always-allowed domains for web fetching - Filesystem sandbox (filesystem_whitelist) — directories the agent can read/write - Browser trusted domains (browser_approved_domains) — per-user, skips confirmation for interactive browser ops

None of these are hardcoded. A prompt injection attack cannot expand them (that would require the agent to call the whitelist tool, which itself can be removed from the agent's tool list).

3. Tiered web access

Tier 1: Domains in the whitelist — always allowed. Default seed includes Wikipedia, DuckDuckGo, yr.no, timeanddate.com, weather.met.no.

Tier 2: Any other domain — only allowed when: - The user explicitly initiates a web research request in the current session (the agent loop sets web_tier2_enabled = True based on message keywords), or - A scheduled task has declared web access in its allowed_tools

Tier 2 is session-scoped. Closing the chat and opening a new one resets it.

4. Confirmation before side effects

In interactive sessions, these operations require explicit user approval before executing: - Sending email - Writing or deleting files - Creating, updating, or deleting calendar events - Interactive browser operations (click, fill, press, select) on non-trusted domains

Scheduled tasks never ask for confirmation — their permissions are declared at creation time. It is structurally impossible for a scheduled agent to call a tool not in its declared allowed_tools.

5. Audit log is append-only

Every tool call is recorded in audit_log with timestamp, arguments, result summary, session ID, and user ID. The log is never auto-deleted (retention is configurable but must be explicitly set). Admins can view and filter the log at /audit.

6. No credentials in agent context

The server holds all API keys, passwords, and secrets. The agent receives only the results of operations. A prompt injection attack cannot cause the agent to print its own API key, because the agent never sees it.


Prompt injection sanitisation

security.py::sanitize_external_content() removes patterns that resemble jailbreak attempts from any text before it reaches the agent.

Always-active patterns:

<tool_use    <system    IGNORE PREVIOUS/ALL/ABOVE
FORGET PREVIOUS/ALL    NEW INSTRUCTIONS    YOU ARE NOW
ACT AS    [SYSTEM]    <<<...>>>

Extended patterns (enabled via system:security_sanitize_enhanced):

DISREGARD YOUR/ALL/PREVIOUS    PRETEND YOU ARE    YOUR NEW DIRECTIVE
STOP ... NEW TASK    [INST]    <|im_start|>    JAILBREAK    DAN MODE

Base64 blobs are detected and logged (but not removed — they may be legitimate email signatures).


Optional security layers (Settings → Security)

These can be enabled independently:

Canary tokens

A unique random token is injected into the system prompt with instructions not to repeat it. The agent loop monitors every tool argument. If the canary appears in an argument, the call is blocked and a Pushover alert is sent.

Why: A prompt injection attack that successfully takes over the agent might try to exfiltrate data through tool arguments (e.g. sending an email with the canary in the subject). This detects that.

Credential: system:security_canary_enabled

LLM content screening

Tool results from web, email, and calendar tools are screened by a secondary LLM call before being returned to the main agent. The screening model looks for prompt injection patterns that the regex filter might miss.

Modes: flag (attach warning but continue) or block (reject the tool result entirely).

Credentials: system:security_llm_screen_enabled, system:security_llm_screen_block

Output validation

Before each tool call, a secondary LLM call validates that the action is consistent with the user's original request in the session. For example, if the user asked to summarise emails but the agent tries to send one, this would be flagged.

Credential: system:security_output_validation_enabled

Content truncation

Limits the character count of tool results before they reach the agent: - Web pages: system:security_max_web_chars (default: 20,000) - Emails: system:security_max_email_chars (default: 6,000)

Prevents long documents from filling the context window and potentially hiding injections in their tail.

Credential: system:security_truncation_enabled

Enhanced sanitisation

Enables the extended prompt injection pattern list.

Credential: system:security_sanitize_enhanced


Credential encryption

All sensitive values are encrypted before being stored in PostgreSQL:

  • Algorithm: AES-256-GCM (provides authenticated encryption — detects tampering)
  • Key derivation: PBKDF2-SHA256, 480,000 iterations, static salt "aide-credential-store-v1"
  • Master key: Derived from DB_MASTER_PASSWORD environment variable
  • Storage format: base64(12-byte nonce + ciphertext + 16-byte GCM tag)

The master password is never stored anywhere. Losing it means losing access to all encrypted credentials.


Authentication

See Multi-User Auth for the full authentication documentation.

Summary: - Passwords: Argon2id (recommended by OWASP for modern password hashing) - Sessions: HMAC-signed cookies (SHA-256, 32-character signature) - MFA: TOTP (RFC 6238, compatible with Authenticator apps) - API keys: X-API-Key header, resolves to a synthetic admin user


Kill switch

POST /api/pause sets the system:paused credential to "1". The agent loop checks this: - At the start of every run - At the start of every iteration within a run

A paused agent mid-run will stop after the current tool call completes. The sidebar shows a red indicator when paused. Resume with POST /api/resume.


Rate limiting

  • Tool calls per run: effective_max_tool_calls (per-agent → DB setting → env default → 20)
  • Concurrent runs: system:max_concurrent_runs (default: 3, enforced by semaphore)
  • Scheduled runs per hour: system:max_autonomous_runs_per_hour (per-agent rate limiting in the runner)

Non-admin user isolation

Non-admin users: - Cannot access bash tool (never injected into their schemas) - Get BoundFilesystemTool scoped to their personal folder ({base}/{username}/) instead of the global whitelist - Can only see their own audit log entries - Cannot access admin settings (credentials, global whitelists, user management) - Agents owned by non-admin users never get the bash tool, even if it's in allowed_tools