Security Model

Nestor is designed with a security-first architecture. Every layer, from the Rust core to the Docker sandbox, is built to prevent AI agents from causing harm.

Security Layers

Nestor applies defense in depth with multiple security layers:

Rust Security Core — Low-level protection via N-API bindings
Docker Sandbox — Process isolation with minimal capabilities
Guardrails — Configurable rules for tool access and approvals
Circuit Breaker — Automatic protection against cascading failures
StuckDetector — Detects and breaks agent loops
Trust Scoring — Behavioral monitoring and grading
Cost Budgets — Hard limits on spending per session/day
Secret Redaction — Automatic detection and masking of sensitive data

Rust Security Core

The nestor-core crate is compiled to a native Node.js addon via N-API. It provides security primitives that are impossible to bypass from JavaScript:

SSRF Protection

All outgoing HTTP requests are validated against SSRF attacks:

DNS pinning to prevent rebinding attacks
Blocking of loopback and private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
Blocking of link-local addresses and cloud metadata endpoints (e.g., 169.254.169.254)
Protocol allowlisting (only http/https)
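
The checks above can be sketched in a few lines. The following Python example is an illustration only, not the nestor-core implementation (which is written in Rust). The function names `is_blocked_ip` and `resolve_and_pin` are hypothetical, and the set of blocked address classes is an assumption that goes slightly beyond the ranges listed. It resolves the host once, rejects the request if any resolved address is internal, and returns the address the caller should connect to, which is the idea behind DNS pinning.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # protocol allowlist from the list above


def is_blocked_ip(ip_text: str) -> bool:
    """Return True for loopback, private, link-local, and other non-public addresses."""
    ip = ipaddress.ip_address(ip_text)
    return (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local      # includes 169.254.169.254
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def resolve_and_pin(url: str, resolver=socket.getaddrinfo) -> str:
    """Validate a URL and return the single IP address the caller should connect to."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)

    # Resolve exactly once; the result is reused for the connection (pinning).
    infos = resolver(parts.hostname, port, type=socket.SOCK_STREAM)
    addresses = [info[4][0] for info in infos]
    if not addresses:
        raise ValueError("host did not resolve")

    # Reject the request if any resolved address points at an internal network.
    for address in addresses:
        if is_blocked_ip(address):
            raise ValueError(f"blocked address: {address}")
    return addresses[0]
```

A caller would then connect to the returned IP (while still sending the original hostname for the Host header and TLS verification) instead of resolving the name a second time, so a rebinding DNS server gets no chance to swap in an internal address between the check and the connection.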

Path Traversal Prevention

File operations are validated to prevent escaping the working directory:

Canonical path resolution before any file I/O
Blocking of .. traversal sequences
Symlink resolution and validation
Configurable allowlist/blocklist for file patterns
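
As a rough illustration of canonical path resolution, the Python sketch below resolves a requested path against a working directory and rejects anything that ends up outside it. The function name `resolve_inside` is hypothetical and the real checks live in the Rust core. The sketch relies on `Path.resolve()`, which collapses `..` segments and follows symlinks, so the containment check sees the real target.

```python
from pathlib import Path


def resolve_inside(workdir: str, requested: str) -> Path:
    """Return the canonical path for `requested`, or raise if it escapes `workdir`."""
    root = Path(workdir).resolve()
    # Joining with an absolute `requested` discards `root`, which the check below catches.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes working directory: {requested}")
    return candidate
```

Checking the resolved path, rather than scanning the input string for `..`, also covers symlinks that point outside the working directory.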

Approval System

Sensitive operations require cryptographic approval tokens that cannot be forged by the agent.
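
The source does not specify how approval tokens are constructed. One common way to make a token unforgeable is an HMAC over the operation, keyed with a secret the agent never sees. The sketch below assumes that scheme; `issue_token`, `verify_token`, and the `operation:nonce` message layout are all hypothetical.

```python
import hashlib
import hmac


def issue_token(secret: bytes, operation: str, nonce: str) -> str:
    """Sign one specific operation; only the holder of `secret` can produce this value."""
    message = f"{operation}:{nonce}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_token(secret: bytes, operation: str, nonce: str, token: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = issue_token(secret, operation, nonce)
    return hmac.compare_digest(expected.encode(), token.encode())
```

Because the token is bound to the exact operation string, an approval granted for one command cannot be replayed for a different one.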

Homoglyph Detection

New in v3.4.0, the Rust core detects Unicode homoglyph attacks, in which visually similar characters (e.g., Cyrillic "а", U+0430, vs. Latin "a", U+0061) are used to disguise malicious URLs, filenames, or commands. All string inputs are normalized and validated before processing.
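
A minimal way to illustrate the idea is to normalize a string and then check whether its letters come from more than one script. The Python sketch below does this using the first word of each character's Unicode name as a rough script label. That labeling trick and the function names are assumptions for illustration, and a real detector would more likely use a confusables table.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Approximate the script of each letter from the first word of its Unicode name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def looks_like_homoglyph_attack(text: str) -> bool:
    """Flag strings that mix scripts after NFKC normalization."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) into their plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    return len(scripts_used(normalized)) > 1
```

For example, `looks_like_homoglyph_attack("p\u0430ypal.com")` returns `True` because the second character is Cyrillic, while the all-Latin `"paypal.com"` is not flagged.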

Skill Scanner

Before installing any skill, the Skill Scanner performs static analysis to detect potentially dangerous patterns:

Shell injection vectors in tool definitions
Unauthorized network access patterns
File system escape attempts
Obfuscated code or encoded payloads

Safe Regex

All user-provided regex patterns (custom redaction, guardrails) are validated against ReDoS (Regular Expression Denial of Service) attacks. The Rust core enforces a maximum execution time and rejects patterns prone to catastrophic backtracking.
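
To give a feel for what rejecting a catastrophic pattern can look like, here is a deliberately crude Python heuristic. It is not Nestor's validator: `is_pattern_allowed` and the length cap are hypothetical, and it only recognizes the classic shape of a quantified group that itself contains `+` or `*`, such as `(a+)+`. It does not enforce an execution time limit.

```python
import re

# A group whose body contains + or * and which is itself followed by + or *.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")


def is_pattern_allowed(pattern: str, max_length: int = 256) -> bool:
    """Reject invalid, oversized, or obviously backtracking-prone patterns."""
    if len(pattern) > max_length:
        return False
    try:
        re.compile(pattern)
    except re.error:
        return False
    # Replace escaped characters and character classes with a placeholder so that
    # literal "+" or "*" inside them is not mistaken for a quantifier.
    without_escapes = re.sub(r"\\.", "x", pattern)
    simplified = re.sub(r"\[[^\]]*\]", "x", without_escapes)
    return NESTED_QUANTIFIER.search(simplified) is None
```

A heuristic like this produces both false positives and false negatives, which is why an execution time limit, as described above, remains the more dependable safeguard.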

Docker Sandbox

When Docker is available, agent commands run inside an isolated container:

# Docker sandbox configuration
sandbox:
  enabled: true
  image: nestor-sandbox:latest
  capabilities:
    drop: [ALL]           # drop all Linux capabilities
  filesystem:
    read_only: true       # read-only root filesystem
    tmpfs: /tmp           # writable temp directory
    bind_mounts:
      - src: ./src
        dst: /workspace/src
        read_only: false  # agent can write to src/
  network: none           # no network access by default
  memory_limit: 512m
  cpu_limit: 1.0
  timeout: 300            # kill after 5 minutes

Important: The Docker sandbox is optional but strongly recommended for production use. Without it, agents execute commands directly on the host system with guardrails as the only protection.
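
To make the configuration above more concrete, the sketch below translates it into `docker run` arguments. How Nestor actually launches the container is not documented here, so the function `docker_run_args` and the dictionary layout are assumptions. The flags themselves (`--cap-drop`, `--read-only`, `--tmpfs`, `--mount`, `--network`, `--memory`, `--cpus`) are standard Docker CLI options.

```python
import os

# Mirrors the YAML configuration shown above.
SANDBOX = {
    "image": "nestor-sandbox:latest",
    "capabilities": {"drop": ["ALL"]},
    "filesystem": {
        "read_only": True,
        "tmpfs": "/tmp",
        "bind_mounts": [{"src": "./src", "dst": "/workspace/src", "read_only": False}],
    },
    "network": "none",
    "memory_limit": "512m",
    "cpu_limit": 1.0,
}


def docker_run_args(sandbox: dict, command: list[str]) -> list[str]:
    """Build a `docker run` argument list from a sandbox configuration."""
    args = ["docker", "run", "--rm"]
    for capability in sandbox["capabilities"]["drop"]:
        args += ["--cap-drop", capability]
    filesystem = sandbox["filesystem"]
    if filesystem.get("read_only"):
        args.append("--read-only")
    if filesystem.get("tmpfs"):
        args += ["--tmpfs", filesystem["tmpfs"]]
    for mount in filesystem.get("bind_mounts", []):
        # Bind mounts need an absolute source path.
        spec = f"type=bind,source={os.path.abspath(mount['src'])},target={mount['dst']}"
        if mount.get("read_only"):
            spec += ",readonly"
        args += ["--mount", spec]
    args += ["--network", sandbox["network"]]
    args += ["--memory", sandbox["memory_limit"]]
    args += ["--cpus", str(sandbox["cpu_limit"])]
    return args + [sandbox["image"], *command]
```

The `timeout` setting has no direct `docker run` flag, so in this sketch a caller would have to enforce it separately, for example by stopping the container once the limit expires.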

Sandbox Modes

Mode	Network	Filesystem	Use Case
`strict`	None	Read-only	Untrusted agents, security review
`standard`	None	Working dir writable	Code editing, file manipulation
`relaxed`	Allowed	Working dir writable	Web search, API calls

Guardrails

Guardrails are configurable rules that constrain agent behavior:

Tool-Level Guardrails

guardrails:
  # Require human approval for these tools
  require_approval:
    - file_write
    - shell_exec

  # Block specific shell commands
  blocked_commands:
    - rm -rf
    - sudo
    - curl | sh
    - chmod 777

  # Restrict file write patterns
  file_restrictions:
    blocked_paths:
      - .env
      - .nestor/config.yaml
      - node_modules/
    blocked_extensions:
      - .exe
      - .sh
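
The following Python sketch shows one way rules like these could be evaluated. The matching semantics are assumptions, since the source does not say how Nestor compares commands or paths: commands are matched by plain substring, directory entries (ending in `/`) match any path component, and other entries match the full path or its tail. The names `command_blocked` and `write_blocked` are hypothetical.

```python
from pathlib import PurePosixPath

# Mirrors the YAML configuration shown above.
RULES = {
    "blocked_commands": ["rm -rf", "sudo", "curl | sh", "chmod 777"],
    "file_restrictions": {
        "blocked_paths": [".env", ".nestor/config.yaml", "node_modules/"],
        "blocked_extensions": [".exe", ".sh"],
    },
}


def command_blocked(command: str, rules: dict) -> bool:
    """Naive check: block if any configured fragment appears in the command."""
    return any(fragment in command for fragment in rules["blocked_commands"])


def write_blocked(path: str, rules: dict) -> bool:
    """Block writes by extension, by directory component, or by exact or trailing path."""
    target = PurePosixPath(path)
    restrictions = rules["file_restrictions"]
    if target.suffix in restrictions["blocked_extensions"]:
        return True
    for blocked in restrictions["blocked_paths"]:
        if blocked.endswith("/"):
            if blocked.rstrip("/") in target.parts:
                return True
        elif target == PurePosixPath(blocked) or target.as_posix().endswith("/" + blocked):
            return True
    return False
```

Substring matching is easy to evade (`curl https://example.com/x | sh` does not contain `curl | sh`), so a blocklist like this is best treated as one layer alongside approvals and the sandbox, not as a complete defense.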

Behavioral Guardrails

guardrails:
  # Limit the agent loop
  max_iterations: 25
  max_tokens_per_turn: 8192

  # Dry-run mode: preview all actions
  dry_run: false

  # Auto-approve safe operations
  auto_approve:
    - file_read
    - web_search

Guardrails CRUD API

Nestor v3.4.0 introduces a full CRUD API for managing guardrails at runtime, without restarting the server:

CLI Commands

# List all guardrails for an agent
npx nestor-sh guardrail list --agent coder

# Add a new guardrail rule
npx nestor-sh guardrail add --agent coder \
  --type blocked_command --value "docker rm"

# Remove a guardrail rule
npx nestor-sh guardrail remove --agent coder \
  --type blocked_command --value "docker rm"

# Update guardrail settings
npx nestor-sh guardrail set --agent coder \
  --key max_iterations --value 50

Studio API

Guardrails can also be managed via the Studio dashboard REST API:

# GET /api/agents/:name/guardrails
# POST /api/agents/:name/guardrails
# PUT /api/agents/:name/guardrails/:id
# DELETE /api/agents/:name/guardrails/:id

Circuit Breaker

The circuit breaker protects against cascading failures when LLM providers are down or rate-limited:

How It Works

Closed — Normal operation. Requests pass through to the LLM provider
Open — Provider is failing. Requests are rejected immediately and the fallback chain activates
Half-Open — Testing recovery. A limited number of requests are allowed through

Configuration

# Circuit breaker settings
circuit_breaker:
  failure_threshold: 5     # open after 5 consecutive failures
  reset_timeout: 60000     # in milliseconds: try again after 60 seconds
  half_open_max: 2         # allow 2 test requests in half-open

When a provider's circuit opens, Nestor automatically routes requests to the next provider in the fallback chain (e.g., if Claude fails, it falls back to GPT-4o, then Gemini, then Ollama).
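
The three states and the settings above map onto a small state machine. The Python class below is a generic circuit breaker sketch, not Nestor's code. The class and method names are hypothetical, and the injected `clock` parameter exists only to make the behavior easy to test.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_ms=60000,
                 half_open_max=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_ms = reset_timeout_ms
        self.half_open_max = half_open_max
        self.clock = clock
        self.state = "closed"
        self.failures = 0          # consecutive failures seen while closed
        self.opened_at = 0.0
        self.trial_requests = 0    # requests let through while half-open

    def allow_request(self) -> bool:
        if self.state == "open":
            elapsed_ms = (self.clock() - self.opened_at) * 1000
            if elapsed_ms < self.reset_timeout_ms:
                return False       # still open: reject immediately
            self.state = "half_open"
            self.trial_requests = 0
        if self.state == "half_open":
            if self.trial_requests >= self.half_open_max:
                return False       # trial quota used up
            self.trial_requests += 1
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        # A failure during a trial, or too many in a row, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

In a fallback chain, each provider would have its own breaker, and the router would skip any provider whose `allow_request()` returns `False`.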

StuckDetector

The StuckDetector monitors agent behavior and intervenes when an agent enters a loop or makes no progress:

Detection Patterns

Repetition loop — Agent repeats the same tool call 3+ times with identical parameters
Error loop — Same error occurs on consecutive iterations
No progress — No new findings, no new tool calls for N iterations
Token spiral — Context grows without producing useful output

Recovery Actions

When stuck behavior is detected, Nestor can:

Inject a system message asking the agent to change strategy
Reset the conversation context to a previous checkpoint
Switch to a different LLM provider
Gracefully terminate with a partial report

# StuckDetector configuration
stuck_detector:
  enabled: true
  max_repeated_calls: 3    # detect after 3 identical calls
  max_error_streak: 5      # detect after 5 consecutive errors
  idle_iterations: 10      # detect after 10 iterations with no progress
  action: inject_hint      # inject_hint | reset | switch_llm | terminate
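
Two of the detection patterns, the repetition loop and the error loop, are simple enough to sketch. The Python class below is an illustrative approximation rather than the actual StuckDetector: the `observe` method, its return labels, and the choice to compare errors by exact string are all assumptions.

```python
import json
from typing import Optional


class StuckDetector:
    def __init__(self, max_repeated_calls: int = 3, max_error_streak: int = 5):
        self.max_repeated_calls = max_repeated_calls
        self.max_error_streak = max_error_streak
        self.last_call = None
        self.repeat_count = 0
        self.last_error = None
        self.error_streak = 0

    def observe(self, tool: str, params: dict, error: Optional[str] = None) -> Optional[str]:
        """Record one iteration and return a detection label, or None if all is well."""
        # Serialize parameters with sorted keys so equal dicts compare equal.
        call = (tool, json.dumps(params, sort_keys=True))
        self.repeat_count = self.repeat_count + 1 if call == self.last_call else 1
        self.last_call = call

        if error is None:
            self.error_streak = 0
        elif error == self.last_error:
            self.error_streak += 1
        else:
            self.error_streak = 1
        self.last_error = error

        if self.repeat_count >= self.max_repeated_calls:
            return "repetition_loop"
        if self.error_streak >= self.max_error_streak:
            return "error_loop"
        return None
```

The returned label would then be mapped to the configured `action`, such as injecting a hint or terminating the run.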

Trust Score System

The trust score is a composite metric (0-100, grade A-F) computed from an agent's execution history:

Score Components

Component	Weight	What It Measures
Accuracy	35%	Correctness of outputs and tool usage
Safety	30%	Guardrail compliance, no blocked actions attempted
Efficiency	20%	Token usage relative to task complexity
Reliability	15%	Consistency across similar tasks
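
With these weights, the composite score is a weighted average of the four component scores. The sketch below assumes each component is itself scored from 0 to 100. The grade cutoffs are purely hypothetical, since the source gives the A-F range but not the thresholds.

```python
# Weights in percent, taken from the table above.
WEIGHTS = {"accuracy": 35, "safety": 30, "efficiency": 20, "reliability": 15}

# Hypothetical cutoffs: the documentation does not state where grades begin.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def trust_score(components: dict) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


def grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the assumed cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"
```

For example, an agent with accuracy 90, safety 100, efficiency 70, and reliability 80 scores (35·90 + 30·100 + 20·70 + 15·80) / 100 = 87.5.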

Trust-Based Permissions

Agents can be granted or restricted permissions based on their trust score:

# Higher trust = more autonomy
trust_policies:
  A:
    auto_approve: [file_write, shell_exec]
    max_budget: 50.00
  B:
    auto_approve: [file_write]
    require_approval: [shell_exec]
    max_budget: 20.00
  C:
    require_approval: [file_write, shell_exec]
    max_budget: 5.00
  D:
    dry_run: true
    max_budget: 1.00

Secret Redaction

Nestor automatically detects and redacts secrets from agent outputs and logs. The Rust core includes 30+ patterns for:

API keys (AWS, GCP, Azure, Anthropic, OpenAI, Stripe, etc.)
OAuth tokens and JWTs
Database connection strings
SSH private keys
Passwords in URLs
Credit card numbers
Custom patterns via regex configuration

# Custom redaction patterns
security:
  redaction:
    enabled: true
    custom_patterns:
      - name: internal_api_key
        pattern: "MYAPP-[A-Za-z0-9]{32}"
      - name: internal_token
        pattern: "tok_[a-f0-9]{40}"
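
As a quick way to check that custom patterns match what you expect, the Python sketch below applies the two patterns above to a sample string. The `redact` function and the `[REDACTED:<name>]` placeholder format are assumptions for illustration. Nestor's actual masking output may look different.

```python
import re

# The custom patterns from the configuration above.
CUSTOM_PATTERNS = {
    "internal_api_key": r"MYAPP-[A-Za-z0-9]{32}",
    "internal_token": r"tok_[a-f0-9]{40}",
}


def redact(text: str, patterns: dict = CUSTOM_PATTERNS) -> str:
    """Replace every match of every pattern with a labeled placeholder."""
    for name, pattern in patterns.items():
        placeholder = f"[REDACTED:{name}]"
        # A function replacement avoids backslash processing in the placeholder.
        text = re.sub(pattern, lambda _match: placeholder, text)
    return text
```

Running candidate patterns against representative log lines before deploying them helps catch both missed secrets and overly broad matches.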

Network Security

The server component includes multiple network security layers:

Rate limiting — Configurable per-endpoint rate limits
CORS — Strict origin validation
Security headers — CSP, HSTS, X-Frame-Options, etc.
Authentication — Token-based auth for the Studio API
Input validation — Schema validation on all API inputs

Security Best Practices

Always use the Docker sandbox in production environments
Set cost budgets to prevent runaway spending
Require approval for file_write and shell_exec on new agents
Monitor trust scores and restrict low-trust agents
Use dry-run mode when testing new skills or workflows
Enable the circuit breaker for production deployments
Configure the StuckDetector to prevent infinite loops
Review agent outputs before deploying to production
Keep Nestor updated to get the latest security patches
Configure custom redaction patterns for your organization's secrets

v3.4.0 Security Additions

Release 3.4.0 (2026-04-17) closes out a full security audit (5 critical and 5 high-severity findings) and hardens several layers of the defense-in-depth model. Key additions in this release:

Approval hardening — smart mode is now the default for chat and messaging bridges; dangerous shell commands are detected via regex before any execution reaches the approval checker.
CLAUDE.md write-lock — writes targeting CLAUDE.md files are always gated through explicit approval, even when security.approvalMode is set to off.
Extended sensitive paths — the protected path list now covers .credentials/, every .env variant, .ssh private keys, .aws, .kube, .docker, and .git/hooks/.
OAuth cookies hardened — the OAuth state cookie for Google and GitHub flows is now emitted with secure: req.secure and sameSite: 'lax'.
Webhook signatures — Telegram secret-token check (constant-time), Slack HMAC-SHA256 over v0:<ts>:<rawBody> with a 5-minute replay window, and Discord Ed25519 via tweetnacl.
Audit framework hardening — fail-closed authentication, regex-based whitelist for audited commands, and argument-injection prevention across every evaluator entrypoint.
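
The Slack check mentioned above follows Slack's published request-signing scheme. The Python sketch below illustrates it under the stated parameters (HMAC-SHA256 over `v0:<ts>:<rawBody>`, 5-minute replay window). The function name and signature are hypothetical and this is not Nestor's code. The `v0=` prefix on the signature is part of Slack's scheme.

```python
import hashlib
import hmac
import time

REPLAY_WINDOW_SECONDS = 5 * 60


def verify_slack_signature(signing_secret: bytes, timestamp: str, raw_body: bytes,
                           signature: str, now: float = None) -> bool:
    """Check a Slack request signature and reject stale timestamps."""
    now = time.time() if now is None else now
    try:
        request_time = int(timestamp)
    except ValueError:
        return False
    # Reject requests outside the replay window, in either direction.
    if abs(now - request_time) > REPLAY_WINDOW_SECONDS:
        return False
    base_string = b"v0:" + timestamp.encode() + b":" + raw_body
    expected = "v0=" + hmac.new(signing_secret, base_string, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The signature must be computed over the raw request body exactly as received; re-serializing parsed JSON can change the bytes and make a valid request fail verification.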

Security Notice: AI agents can behave unpredictably. Never grant an agent access to production systems, financial accounts, or sensitive data without thorough testing and appropriate guardrails. Always apply the principle of least privilege.

Last updated 2026-04-26