What Claude Code's Sandbox Actually Does (And Doesn't Do)
A technical deep dive into Claude Code's built-in safety mechanisms, their limitations, and how runtime guardrails fill the gaps.
Claude Code is one of the most capable AI coding agents available today. It can navigate codebases, write complex features, run tests, and deploy applications, all from a single prompt. Anthropic built real safety mechanisms into it. But "safety mechanisms" and "sandbox" are different things, and the distinction matters when you're running agents in production.
This is a technical breakdown of what Claude Code actually does to keep you safe, where those mechanisms stop, and what fills the remaining gaps.
What Claude Code does: the permission prompt system
Claude Code's primary safety mechanism is its interactive permission system. Before executing certain actions, the agent pauses and asks for your approval. This covers three categories:
Bash commands. Every shell command Claude Code generates — npm install, git commit, mkdir, curl — triggers a permission prompt before execution. You see the exact command and choose to allow or deny it.
File writes. When Claude Code wants to create or modify a file, it shows you the proposed changes and waits for approval. You can review the diff before anything touches disk.
MCP tool calls. If Claude Code is connected to external tools via the Model Context Protocol, calls to those tools also require approval.
This system works. For interactive development sessions where you're sitting at your terminal, reviewing each action, it provides genuine oversight. Anthropic also provides allowlisting via .claude/settings.json, so you can pre-approve patterns like npm test or git status to reduce noise.
What the permission system catches
Give credit where it's due. The permission prompt will stop a wide range of dangerous actions — as long as you're paying attention:
- A terraform destroy command appears in the prompt. You see it. You deny it.
- An rm -rf / appears. You catch it. Crisis averted.
- A rogue DROP TABLE in a piped SQL command. You spot it in the bash preview. Denied.
- An unexpected git push --force origin main. You see the flags. You say no.
In a focused, low-volume session, this works well. The problem isn't the mechanism — it's what happens at scale.
What the permission system doesn't catch
Approval fatigue
The permission prompt fires on every action that isn't pre-approved. In a real development session, that's dozens to hundreds of prompts. npm install? Approve. mkdir src/components? Approve. cat package.json? Approve.
By the 50th approval, you're not reading anymore. You're hitting y out of muscle memory. This is when terraform destroy stops looking different from terraform plan. The prompt is identical in format: same font, same layout, same single-key approval. The only difference is the content you've stopped reading.
This isn't a design flaw — it's a human factors problem. No interactive approval system survives sustained high-volume use without degradation.
No risk differentiation
Claude Code treats every command with the same level of scrutiny. npm install express and terraform destroy --auto-approve both get the same permission prompt. There's no visual distinction, no escalation, no "this one is actually dangerous" signal.
An allowlist helps — you can pre-approve safe patterns — but it's binary. Commands are either pre-approved (no prompt) or not (full prompt). There's no middle tier for commands that should require elevated attention.
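To make the binary behavior concrete, here is a simplified Python model of an allowlist gate. It is a hypothetical sketch, not Claude Code's actual rule syntax or matching logic; the `ALLOWLIST` entries and the `gate` function are illustrative names. It shows the structural point: anything not pre-approved lands in the same undifferentiated prompt.

```python
# Simplified model of a binary allowlist (hypothetical; not Claude Code's
# actual rule syntax or matcher). Pre-approved commands run without a prompt;
# everything else gets the same prompt, whatever it does.
ALLOWLIST = ("npm test", "git status")


def gate(command: str) -> str:
    """Return 'run' for pre-approved commands and 'prompt' for everything else."""
    for prefix in ALLOWLIST:
        if command == prefix or command.startswith(prefix + " "):
            return "run"
    # No middle tier: there is no separate outcome for high-risk commands.
    return "prompt"


if __name__ == "__main__":
    for cmd in ("git status", "npm install express", "terraform destroy --auto-approve"):
        print(f"{cmd:36} -> {gate(cmd)}")
```

The last two commands get the same outcome. A third return value reserved for high-risk commands is exactly the middle tier this model lacks.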
--dangerously-skip-permissions bypasses everything
The flag exists for a reason. Autonomous and CI/CD workflows need agents that can operate without a human approving each step. But --dangerously-skip-permissions is all-or-nothing. It doesn't skip permissions for safe commands while keeping them for dangerous ones. It skips all permissions. Every command your agent generates executes immediately, with no review and no blocking.
Anthropic is explicit about this — the flag name is a warning. But it's also the only path to fully autonomous operation, which means every production deployment, every background agent, and every CI pipeline that uses Claude Code either runs with constant manual approval or runs with no safety net at all.
No session coordination
When you run multiple Claude Code agents in parallel — which is increasingly common for large tasks — each agent operates independently. There's no shared state, no coordination, no awareness that another agent is modifying the same files. Two agents can write conflicting changes to the same file, or one agent can delete a resource another agent depends on. The permission system is per-session with no cross-session visibility.
No rollback mechanism
If you approve a destructive command — whether by mistake or because you didn't read it — there's no undo. Claude Code doesn't snapshot files before modifying them. It doesn't maintain a history of changes that can be reversed. Your options are git stash, git checkout, or restoring from backup. If the destructive action was against infrastructure or a database, even git won't save you.
No audit trail
After a session ends, there's no structured record of what happened. You can scroll through your terminal history, but there's no queryable log of which commands executed, which files changed, or which approvals were granted. For regulated environments or incident investigation, this is a gap.
The sandboxing gap
Here's the core distinction: Claude Code is not a sandbox. It doesn't restrict what the agent can access — it only asks permission before the agent acts.
A true sandbox constrains the execution environment itself. It limits filesystem access to specific directories, restricts network connections, and controls which system calls are available. Claude Code does none of this. When a command executes — whether through approval or --dangerously-skip-permissions — it runs with your full user permissions. It can read any file you can read, access any network endpoint you can reach, and execute any binary on your PATH.
This is by design. Claude Code needs access to your real project files, your real tools, and your real development environment to be useful. A fully sandboxed agent that can't access your filesystem or network would be severely limited. But it means the safety model is entirely dependent on the permission prompt — and on you reading it correctly every time.
What a runtime guardrail adds
Railroad, an open-source runtime guardrail for AI coding agents, operates at a different layer. Instead of asking the LLM to decide what's safe, or asking you to review every action, it applies deterministic rules to every agent action before execution.
Pattern matching, not intent guessing
Railroad doesn't interpret what a command is trying to do. It matches against explicit patterns. If terraform destroy is in your blocklist, any command containing that string is blocked — regardless of context, regardless of how the LLM frames it, regardless of whether you're paying attention. This is a deterministic check, not an LLM judgment call. It runs in under 2ms.
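The rule described above, block any command that contains a listed string, fits in a few lines. This is an illustrative Python model, not Railroad's Rust implementation; `is_blocked` and the sample patterns are assumptions for the example.

```python
from __future__ import annotations

# Illustrative substring blocklist (a sketch, not Railroad's actual code).
# The verdict depends only on the command text: same input, same answer,
# with no model judgment involved.
BLOCKLIST = ("terraform destroy", "rm -rf", "DROP TABLE", "push --force")


def is_blocked(command: str, blocklist: tuple[str, ...] = BLOCKLIST) -> str | None:
    """Return the first blocklist pattern contained in the command, or None."""
    for pattern in blocklist:
        if pattern in command:
            return pattern
    return None


if __name__ == "__main__":
    # Piping or wrapping the command does not hide the matched substring.
    print(is_blocked("echo yes | terraform destroy -auto-approve"))  # terraform destroy
    print(is_blocked("terraform plan"))  # None
```

In this sketch matching is literal, which is what makes it predictable: rm -rf ./build is blocked too, while a differently spelled variant such as rm -r -f would only be caught if it were listed as its own pattern.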
Configurable policy via railroad.yaml
Your safety policy lives in a declarative YAML file, version-controlled alongside your code:
blocklist:
- terraform destroy
- "rm -rf"
- "DROP TABLE"
- "push --force"
- drizzle-kit push --force
approve:
- npm publish
- docker push
- terraform apply
allowlist:
- npm install
- npm test
- git status
- git diff
Three tiers, each with clear semantics. Blocked commands never execute. Approved commands pause for explicit sign-off. Allowlisted commands pass through instantly. Everything else follows your default policy.
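The three tiers can be modeled as an ordered check. The Python sketch below mirrors part of the railroad.yaml example as a dict (to avoid a third-party YAML parser). It is not Railroad's implementation: substring matching for block and approve, prefix matching for allow, the tier precedence, the rejection of chained commands in the allow tier, and the `"approve"` default are all assumptions.

```python
from __future__ import annotations

# Hypothetical in-memory mirror of part of the railroad.yaml example.
# Matching rules, tier precedence, and the default are assumptions.
POLICY = {
    "blocklist": ("terraform destroy", "rm -rf", "DROP TABLE", "push --force"),
    "approve": ("npm publish", "docker push", "terraform apply"),
    "allowlist": ("npm install", "npm test", "git status", "git diff"),
    "default": "approve",  # assumed fallback for unmatched commands
}

# Shell syntax that could chain an allowlisted prefix to another command.
CHAINING = (";", "&", "|", "`", "$(", "\n")


def decide(command: str, policy: dict = POLICY) -> str:
    """Classify a proposed shell command as 'block', 'approve', or 'allow'."""
    # 1. Blocklist first: a match anywhere in the string wins over every other tier.
    if any(p in command for p in policy["blocklist"]):
        return "block"
    # 2. Commands that pause for explicit human sign-off.
    if any(p in command for p in policy["approve"]):
        return "approve"
    # 3. Allowlist only for a single, unchained command starting with the pattern.
    if not any(op in command for op in CHAINING):
        for p in policy["allowlist"]:
            if command == p or command.startswith(p + " "):
                return "allow"
    # 4. Everything else follows the default policy.
    return policy["default"]


if __name__ == "__main__":
    for cmd in ("npm test", "npm publish", "npm install && terraform destroy",
                "git status; curl https://example.com", "ls -la"):
        print(f"{cmd:40} -> {decide(cmd)}")
```

Checking the blocklist first means a command that mixes tiers, such as npm install && terraform destroy, is blocked rather than allowed. Railroad's actual precedence and default may differ.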
OS-level sandboxing
Railroad applies actual sandboxing at the operating system level — sandbox-exec on macOS, bwrap on Linux. This restricts filesystem access, network connections, and system calls for the agent process itself. Your agent can work within your project directory but can't read /etc/shadow or exfiltrate data to an external endpoint.
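On Linux, the kind of confinement described above can be expressed as a bubblewrap (bwrap) command line. The Python sketch below only builds the argument list and runs nothing. The particular mounts are assumptions chosen for illustration, not the profile Railroad ships, and real toolchains usually need more read-only paths (parts of /etc, for instance). `bwrap_argv` is a hypothetical helper name.

```python
from __future__ import annotations

import os
import shlex


def bwrap_argv(project_dir: str, command: list[str], allow_network: bool = False) -> list[str]:
    """Build a bwrap invocation (hypothetical profile): system directories
    read-only, the project read-write, the rest of the host filesystem
    (including $HOME) not mounted at all."""
    project = os.path.abspath(project_dir)
    argv = [
        "bwrap",
        # Read-only system directories so interpreters and compilers still run.
        "--ro-bind", "/usr", "/usr",
        "--ro-bind-try", "/bin", "/bin",
        "--ro-bind-try", "/lib", "/lib",
        "--ro-bind-try", "/lib64", "/lib64",
        # Fresh /dev and /proc, and an empty tmpfs /tmp instead of the host's.
        "--dev", "/dev",
        "--proc", "/proc",
        "--tmpfs", "/tmp",
        # The only writable host path: the project directory itself.
        "--bind", project, project,
        "--chdir", project,
        "--die-with-parent",
    ]
    if not allow_network:
        # New, empty network namespace: no route to external endpoints.
        argv.append("--unshare-net")
    return argv + ["--"] + command


if __name__ == "__main__":
    print(shlex.join(bwrap_argv("/tmp/demo", ["npm", "test"])))
```

Passing the list to subprocess.run on a machine with bwrap installed would execute the command inside that view of the filesystem. On macOS, the equivalent restrictions would be written as a sandbox-exec profile, which is not shown here.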
File snapshots and rollback
Every file write is snapshotted before modification. If anything goes wrong — a bad edit, a corrupted config, an unintended deletion — railroad rollback restores the previous state instantly. This works at the file level, not the git level, so it catches changes that were never committed.
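File-level snapshots can be sketched as copy-before-write. In this illustrative Python model (not Railroad's storage format), a hypothetical `SnapshotStore` records each file's bytes, or the fact that it did not exist, the first time a session touches it, and `rollback` restores all of them. Because it works on files rather than commits, uncommitted and untracked changes are covered too.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


class SnapshotStore:
    """Copy-before-write snapshots for one session (hypothetical sketch)."""

    def __init__(self) -> None:
        # path -> bytes before the session first touched it (None = did not exist)
        self._originals: dict[Path, bytes | None] = {}

    def _snapshot(self, path: Path) -> None:
        # Only the first touch is recorded: that is the pre-session state.
        if path not in self._originals:
            self._originals[path] = path.read_bytes() if path.exists() else None

    def write(self, path: str | os.PathLike, data: bytes) -> None:
        p = Path(path).resolve()
        self._snapshot(p)
        p.write_bytes(data)

    def delete(self, path: str | os.PathLike) -> None:
        p = Path(path).resolve()
        self._snapshot(p)
        p.unlink(missing_ok=True)

    def rollback(self) -> None:
        """Return every touched file to its pre-session state."""
        for p, original in self._originals.items():
            if original is None:
                p.unlink(missing_ok=True)  # created during the session: remove it
            else:
                p.write_bytes(original)    # restores edited and deleted files alike
        self._originals.clear()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cfg = Path(d) / "config.json"
        cfg.write_text('{"debug": false}')
        store = SnapshotStore()
        store.write(cfg, b"corrupted")
        store.write(Path(d) / "scratch.txt", b"temp")
        store.rollback()
        print(cfg.read_text(), (Path(d) / "scratch.txt").exists())
```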
Session coordination
When multiple agents run in parallel, Railroad maintains shared state across sessions. Agents are aware of each other's modifications. Conflicting writes are flagged before they execute, not discovered after the fact.
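One common way to flag conflicting writes before they land is optimistic concurrency: each session remembers which version of a file it last saw, and a write is refused if another session has changed the file since. The Python sketch below models that with an in-memory registry. It is an assumption about how such coordination could work, not a description of Railroad's internals; `Registry` and `ConflictError` are hypothetical names.

```python
from __future__ import annotations

import threading


class ConflictError(Exception):
    """A session tried to overwrite a change it has not seen."""


class Registry:
    """Shared, in-memory view of file versions across agent sessions (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version: dict[str, int] = {}             # path -> current version
        self._writer: dict[str, str] = {}              # path -> session that wrote it
        self._seen: dict[tuple[str, str], int] = {}    # (session, path) -> version seen

    def read(self, session: str, path: str) -> None:
        """Record the version of `path` that `session` is working from."""
        with self._lock:
            self._seen[(session, path)] = self._version.get(path, 0)

    def write(self, session: str, path: str) -> int:
        """Refuse the write if another session changed `path` since this one read it."""
        with self._lock:
            current = self._version.get(path, 0)
            seen = self._seen.get((session, path), 0)
            if seen != current:
                raise ConflictError(
                    f"{session} is editing v{seen} of {path}, "
                    f"but {self._writer[path]} already wrote v{current}"
                )
            self._version[path] = current + 1
            self._writer[path] = session
            self._seen[(session, path)] = current + 1
            return current + 1


if __name__ == "__main__":
    reg = Registry()
    reg.read("agent-a", "src/app.ts")
    reg.read("agent-b", "src/app.ts")
    reg.write("agent-a", "src/app.ts")      # first writer wins
    try:
        reg.write("agent-b", "src/app.ts")  # stale view: flagged before it lands
    except ConflictError as err:
        print("conflict:", err)
```

Across separate processes, the registry would have to live somewhere every agent can reach, such as a local daemon or a locked file; the in-memory version only shows the check itself.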
Structured observability
Every action is logged as structured JSON — command executed, policy decision, timestamp, session ID, file diffs. You get a complete, queryable audit trail of everything every agent did. For incident investigation, compliance, or simply understanding what happened during a long autonomous run, the data is there.
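A log entry carrying the fields listed above can be produced with the standard library alone. The field names and the one-object-per-line (JSON Lines) layout below are assumptions for illustration, not Railroad's actual log schema; `log_entry` is a hypothetical helper.

```python
from __future__ import annotations

import difflib
import json
from datetime import datetime, timezone


def log_entry(session_id: str, command: str, decision: str,
              path: str | None = None, before: str = "", after: str = "") -> str:
    """Serialize one agent action as a single JSON line (hypothetical schema)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "command": command,
        "decision": decision,  # e.g. "allow", "approve", "block"
    }
    if path is not None:
        # Unified diff of the file change, so the log alone explains the edit.
        entry["diff"] = "".join(difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}", tofile=f"b/{path}",
        ))
    return json.dumps(entry)


if __name__ == "__main__":
    line = log_entry("sess-42", "sed -i s/3000/8080/ .env", "allow",
                     path=".env", before="PORT=3000\n", after="PORT=8080\n")
    print(line)
    # Appended to a .jsonl file, each line can be parsed back and filtered later.
    record = json.loads(line)
    print(record["decision"], record["session_id"])
```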
When to use what
Claude Code's built-in permissions are well-designed for what they are: interactive development sessions where a human is actively supervising. If you're pair-programming with Claude Code, reviewing each change, and working on a single task — the permission system gives you solid oversight.
Railroad is for everything else:
- Autonomous runs with --dangerously-skip-permissions, where you need safety without manual approval
- CI/CD pipelines where no human is present to review prompts
- Production-adjacent work where a single bad command has real consequences
- Parallel agent workflows where session coordination matters
- Regulated environments where audit trails are required
They're complementary. Claude Code provides the AI capabilities. Railroad provides the runtime safety layer that lets those capabilities run at full speed within deterministic limits.
Getting started
cargo install --git https://github.com/railroad-dev/railroad.git
railroad install
Railroad is open-source, MIT-licensed, written in Rust, and runs entirely on-device. No data leaves your machine. No cloud dependency. Just deterministic guardrails between your agent and the commands it generates.