You write the agent. Polos handles sandboxes, durability, approvals, triggers, and observability.
Building agents is easy. Running them in production is hard.
| Challenge | Typical Agent Framework | With Polos |
|---|---|---|
| Sandboxing | None; DIY or run unsandboxed | Docker, E2B + built-in tools |
| Durability | A crash means starting over | Auto-retry; resume from the exact step |
| Approvals | Build it yourself | One-tap approval via Slack, UI, or terminal |
| Triggers | Glue code for every webhook | Built-in: HTTP, webhooks, cron, events |
| Observability | Grep through logs | Full tracing, every tool call |
| Cost | Re-run failed LLM calls from scratch | Prompt caching, 60–80% savings |
You write the agent. Polos handles the rest.
What you get with Polos
Isolated Docker & E2B environments. Built-in tools: exec, read, write, edit, glob, grep, web_search.
Approval flows for any tool call. Reach your team via Slack. Paused agents consume zero compute.
60–80% cost savings via prompt caching. Auto-retry, log-replay, concurrency control.
Webhook URLs, an HTTP API, cron schedules, and event-driven triggers. GitHub and Slack integrations with no glue code.
OpenTelemetry tracing, full execution history, visual dashboard for debugging and replay.
Any LLM via the Vercel AI SDK or LiteLLM. Compatible with CrewAI, LangGraph, and Mastra. Python or TypeScript.
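This page doesn't say how Polos schedules its auto-retries. For readers new to the pattern, a common approach is capped exponential backoff with jitter, sketched here in plain Python. `call_with_retry`, `flaky_step`, and their parameters are hypothetical illustrations, not part of the Polos API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper (not a Polos API): retry fn with capped
    exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Backoff ceiling doubles per attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # sleeping a random fraction of it keeps many callers from retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Usage: a hypothetical step that fails twice with a transient error, then succeeds.
failures_left = 2


def flaky_step() -> str:
    global failures_left
    if failures_left > 0:
        failures_left -= 1
        raise TimeoutError("transient failure")
    return "ok"


print(call_with_retry(flaky_step, sleep=lambda _: None))  # prints "ok"
```

Injecting `sleep` keeps the helper testable; in normal use you'd keep the default `time.sleep`. Blind retries are only safe for steps with no side effects, or with idempotent ones.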
Get started in seconds
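```shell
$ npx create-polos my-project
```
```python
from polos import define_agent, sandbox_tools
from polos.models import anthropic
# Sandbox tools running in a Docker environment
sandbox = sandbox_tools(env="docker")
# A coding agent backed by Claude that can use the sandbox tools
agent = define_agent(
    id="coding_agent",
    model=anthropic("claude-sonnet-4-5"),
    tools=[*sandbox],
)
```
```typescript
import { defineAgent, sandboxTools } from "polos";
import { anthropic } from "polos/models";
// Sandbox tools running in a Docker environment
const sandbox = sandboxTools({ env: "docker" });
// A coding agent backed by Claude that can use the sandbox tools
const agent = defineAgent({
  id: "coding_agent",
  model: anthropic("claude-sonnet-4-5"),
  tools: [...sandbox],
});
```
Build real-world agents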
Triggered by GitHub webhooks. Clones the repo, checks out the branch, runs tests in a sandbox. Posts a line-by-line review with suggested fixes. Waits for the author to respond before following up. Durable execution means it never double-comments, even if it crashes mid-review.
Connects to your data warehouse, writes and executes SQL in a sandboxed environment. Builds charts, spots anomalies, drafts a summary. Sends you an approval page before sharing with stakeholders, so nothing goes out without your sign-off.
Crawls dozens of sources, extracts key findings, and builds a structured knowledge base. Checkpoints after every source, so if it hits a rate limit or crashes at source 47, it picks up right where it left off. Pings you on Slack or Discord when the report is ready for review.
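Checkpoint-and-resume can be illustrated with a minimal plain-Python sketch, assuming each source's result is small enough to store in a local JSON file. `process_sources`, `extract`, and the checkpoint file are hypothetical; this section doesn't describe where or how Polos stores its own checkpoints.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def process_sources(
    sources: list[str],
    extract: Callable[[str], str],
    checkpoint_path: Path,
) -> dict[str, str]:
    """Hypothetical helper (not a Polos API): process each source once,
    checkpointing results to disk after every source."""
    done: dict[str, str] = {}
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text())  # resume from the last run
    for source in sources:
        if source in done:
            continue  # finished before the crash: reuse the saved result
        done[source] = extract(source)
        # Write to a temp file, then rename over the checkpoint, so a crash
        # mid-write can't leave a half-written checkpoint behind.
        tmp = checkpoint_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(done))
        tmp.replace(checkpoint_path)
    return done


# Usage: the first run "hits a rate limit" at source s3; the second run resumes there.
calls: list[str] = []


def extract(source: str) -> str:
    calls.append(source)
    if source == "s3" and calls.count("s3") == 1:
        raise RuntimeError("rate limited")
    return f"findings from {source}"


with tempfile.TemporaryDirectory() as tmpdir:
    ckpt = Path(tmpdir) / "research.json"
    srcs = ["s1", "s2", "s3", "s4"]
    try:
        process_sources(srcs, extract, ckpt)
    except RuntimeError:
        pass  # simulated crash
    results = process_sources(srcs, extract, ckpt)

print(calls)  # ['s1', 's2', 's3', 's3', 's4']: s1 and s2 are not re-fetched
```

Keying the checkpoint by source means a restart skips completed work instead of repeating it; only the source that failed is retried.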