The open-source runtime for AI agents

You write the agent. Polos handles sandboxes, durability, approvals, triggers, and observability.

$ npx create-polos my-project

Building agents is easy. Running them in production is hard.

| Challenge | Typical Agent Framework | With Polos |
|---|---|---|
| Sandboxing | None: DIY or run unsandboxed | Docker, E2B + built-in tools |
| Durability | Agent crashes; start over | Auto-retry, resume from exact step |
| Approvals | Build it yourself | Slack, UI, terminal: one tap |
| Triggers | Glue code for every webhook | Built-in: HTTP, webhooks, cron, events |
| Observability | Grep through logs | Full tracing, every tool call |
| Cost | Re-run failed LLM calls from scratch | Prompt caching, 60–80% savings |

You write the agent. Polos handles the rest.

What you get with Polos

Sandboxed Execution

Isolated Docker & E2B environments. Built-in tools: exec, read, write, edit, glob, grep, web_search.

Human-in-the-Loop

Approval flows for any tool call. Reach your team via Slack. Paused agents consume zero compute.
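The approval gate itself can be sketched in a few lines of plain Python. This is a minimal sketch, not Polos's API: `require_approval`, `ToolCall`, `ApprovalDenied`, and the `ask` callback are hypothetical names. Here `ask` answers synchronously, whereas a runtime that pauses agents without consuming compute would persist the pending call and resume once a reviewer responds.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolCall:
    """A proposed tool invocation shown to a human reviewer."""
    name: str
    args: Dict[str, Any]


class ApprovalDenied(Exception):
    """Raised when a reviewer rejects a proposed tool call."""


def require_approval(
    tool: Callable[..., Any],
    name: str,
    ask: Callable[[ToolCall], bool],
) -> Callable[..., Any]:
    """Wrap `tool` so each call runs only if `ask` approves it.

    `ask` is a hypothetical stand-in for the approval channel (Slack, a web
    UI, a terminal prompt). Here it is a plain synchronous function.
    """
    def gated(**kwargs: Any) -> Any:
        call = ToolCall(name=name, args=kwargs)
        if not ask(call):
            raise ApprovalDenied(f"{name} rejected: {kwargs}")
        return tool(**kwargs)

    return gated


# Usage: a hypothetical policy that approves writes under /tmp/ only.
def write_file(path: str, content: str) -> str:
    return f"wrote {len(content)} bytes to {path}"


safe_write = require_approval(
    write_file, "write", lambda c: c.args["path"].startswith("/tmp/")
)
print(safe_write(path="/tmp/notes.txt", content="hello"))  # wrote 5 bytes to /tmp/notes.txt
```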

Durable Workflows

Auto-retry, log-replay, and concurrency control. 60–80% cost savings via prompt caching.
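Log-replay with auto-retry can be illustrated with a small executor. This is a hypothetical sketch, not Polos's implementation: `DurableRun` and its in-memory `log` list are assumptions standing in for a persisted event log. Each completed step's result is recorded; on a re-run, recorded steps are replayed from the log and only the remaining steps execute.

```python
from typing import Any, Callable, List


class DurableRun:
    """Minimal log-replay executor (hypothetical sketch).

    `log` holds the results of steps that already completed, in order.
    When a workflow is re-run against the same log, those steps return
    their recorded result instead of executing again, so real work resumes
    at the first step that has no recorded result.
    """

    def __init__(self, log: List[Any], max_attempts: int = 3) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.log = log          # would live in a durable store
        self.cursor = 0         # position of the next step in this run
        self.max_attempts = max_attempts

    def step(self, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.log):
            result = self.log[self.cursor]  # replay: skip the side effect
        else:
            result = self._with_retry(fn)   # first execution of this step
            self.log.append(result)         # record before moving on
        self.cursor += 1
        return result

    def _with_retry(self, fn: Callable[[], Any]) -> Any:
        # Retry immediately on any exception; re-raise after the last attempt.
        for attempt in range(1, self.max_attempts + 1):
            try:
                return fn()
            except Exception:
                if attempt == self.max_attempts:
                    raise


# Usage: the first run crashes between two steps; the second resumes.
calls: List[str] = []


def plan() -> str:
    calls.append("plan")
    return "plan-v1"


def execute(p: str) -> str:
    calls.append("execute")
    return f"ran {p}"


def workflow(run: DurableRun, crash: bool) -> str:
    p = run.step(plan)
    if crash:
        raise RuntimeError("worker crashed")
    return run.step(lambda: execute(p))


log: List[Any] = []
try:
    workflow(DurableRun(log), crash=True)
except RuntimeError:
    pass
print(workflow(DurableRun(log), crash=False))  # ran plan-v1
print(calls)  # ['plan', 'execute']: plan() did not run twice
```

Replay matches results to steps by position, so this scheme assumes the workflow calls its steps in the same order on every run, a determinism requirement typical of log-replay engines.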

Triggers

Webhook URLs, HTTP API, cron schedules, and event-driven triggers. GitHub and Slack integrations with no glue code.

Observability

OpenTelemetry tracing, full execution history, visual dashboard for debugging and replay.

Bring Your Stack

Any LLM via the Vercel AI SDK or LiteLLM. Compatible with CrewAI, LangGraph, and Mastra. Python or TypeScript.


Get started in seconds

Terminal
```
$ npx create-polos my-project
```
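agent.py
```python
from polos import define_agent, sandbox_tools
from polos.models import anthropic

# Built-in sandbox tools (exec, read, write, ...) running in Docker
sandbox = sandbox_tools(env="docker")

agent = define_agent(
    id="coding_agent",
    model=anthropic("claude-sonnet-4-5"),
    tools=[*sandbox],  # give the agent every sandbox tool
)
```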
agent.ts
```typescript
import { defineAgent, sandboxTools } from "polos";
import { anthropic } from "polos/models";

// Built-in sandbox tools (exec, read, write, ...) running in Docker
const sandbox = sandboxTools({ env: "docker" });

const agent = defineAgent({
  id: "coding_agent",
  model: anthropic("claude-sonnet-4-5"),
  tools: [...sandbox], // give the agent every sandbox tool
});
```

Build real-world agents

PR Reviewer

Triggered by GitHub webhooks. Clones the repo, checks out the branch, and runs tests in a sandbox. Posts a line-by-line review with suggested fixes, then waits for the author to respond before following up. Durable execution means it never double-comments, even if it crashes mid-review.

Data Analyst

Connects to your data warehouse, then writes and executes SQL in a sandboxed environment. Builds charts, spots anomalies, and drafts a summary. Sends you an approval page before sharing with stakeholders, so nothing goes out without your sign-off.

Research Agent

Crawls dozens of sources, extracts key findings, and builds a structured knowledge base. Checkpoints after every source, so if it hits a rate limit or crashes at source 47, it picks up right where it left off. Pings you on Slack or Discord when the report is ready for review.
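Checkpointing after every source can be sketched with a JSON file as the checkpoint store. The function `process_sources`, the `extract` callback, and the file format are hypothetical, and the local file stands in for whatever durable store a runtime would use. Sources already present in the checkpoint are skipped, so a rerun after a failure resumes at the first unfinished source.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def process_sources(
    sources: List[str],
    extract: Callable[[str], str],
    checkpoint_path: str,
) -> Dict[str, str]:
    """Process sources in order, saving findings after each one (sketch)."""
    findings: Dict[str, str] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            findings = json.load(f)  # progress from an earlier run

    for source in sources:
        if source in findings:
            continue  # already checkpointed: skip it
        findings[source] = extract(source)
        # Write to a temp file, then swap it in, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(findings, f)
        os.replace(tmp_path, checkpoint_path)
    return findings


# Usage: the third source is rate-limited on its first attempt only.
seen: List[str] = []


def flaky_extract(source: str) -> str:
    seen.append(source)
    if source == "src-3" and seen.count("src-3") == 1:
        raise RuntimeError("rate limited")
    return f"findings from {source}"


sources = [f"src-{i}" for i in range(1, 6)]
path = os.path.join(tempfile.mkdtemp(), "research.json")
try:
    process_sources(sources, flaky_extract, path)
except RuntimeError:
    pass  # src-1 and src-2 were already checkpointed
result = process_sources(sources, flaky_extract, path)
print(seen)  # ['src-1', 'src-2', 'src-3', 'src-3', 'src-4', 'src-5']
```

`os.replace` is atomic on POSIX systems when both paths are on the same filesystem, which is why the sketch writes to a sibling temp file first.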

Ship agents to production.