The runtime for agents that do real work

Sandboxed execution. Agents that reach you. Durable workflows.

AI agents break the rules of traditional software.

zz! ?

Async by Nature

AI agents run while you sleep. But they still need you. To approve. To confirm. To provide a credential. You're not at your desk. Your agent is stuck.

vsAI</>$ _

Autonomous by Design

Traditional SaaS is predictable. Click a button, one thing happens. AI agents are different. You say "fix the bug" - they write code, run commands, delete files. That power is the point. And the risk.

Most frameworks ignore this. Polos is built for it.

What you get with Polos

config.ts
// Create a sandboxed environment — agents get exec, read, write,
// edit, glob, and grep tools automatically.
const sandbox = sandboxTools({
  env: 'docker',
  docker: {
    image: 'node:20-slim',
    workspaceDir: './workspace',
    memory: '2g',
  },
});

// Give the agent sandbox tools — it can now run commands,
// read/write files, and explore the codebase autonomously.
const codingAgent = defineAgent({
  id: 'coding_agent',
  model: anthropic('claude-sonnet-4-5'),
  systemPrompt: 'You are a coding assistant. The repo is at /workspace.',
  tools: [...sandbox], // exec, read, write, edit, glob, grep
})

Secure Sandbox

Agents run in isolated environments - Docker, E2B, or cloud VMs.

Built-in tools for shell, file system, and web search.

Full power. Zero risk to your systems.

Agents That Reach You

Agents reach you - not the other way around.

Stripe-like approval pages that collect input, not just yes/no.

Slack, SMS, email. You're at dinner. Phone buzzes. One tap. Done.

Notification channels - Slack, SMS, Email

Durable Execution

Built-in observability - every step, every approval, every tool call.

60–80% cost savings via prompt caching.

Automatic retries on failure - no manual intervention.

State persists - agents resume exactly where they left off.

Concurrency control across multiple agents - no API rate limit chaos.

See it in action.

Build real world agents

PR Reviewer

Hooks into GitHub. Clones the repo, checks out the branch, runs tests in a sandbox. Posts a line-by-line review with suggested fixes. Waits for the author to respond before following up. Durable execution means it never double-comments, even if it crashes mid-review.

Data Analyst

Connects to your data warehouse, writes and executes SQL in a sandboxed environment. Builds charts, spots anomalies, drafts a summary. Sends you an approval page before sharing with stakeholders - so nothing goes out without your sign-off.

Research Agent

Crawls dozens of sources, extracts key findings, and builds a structured knowledge base. Checkpoints after every source - so if it hits a rate limit or crashes at source 47, it picks up right where it left off. Pings you on Slack when the report is ready for review.

Build something real.