Threat model

claude-sandbox exists to answer one question: what can go wrong when a developer runs Claude Code inside a devcontainer, and which of those failures is this tool responsible for preventing? This page explains the reasoning behind the boundary. For the hard, look-it-up tables — the exact defences and the exact exposures — see locked-down defences and deliberately exposed. For how the bwrap primitives fit together to enforce all this, see architecture.

Who and what we defend against

The adversary is not the developer. It is an LLM-driven attack riding on input the developer did not write: a hostile prompt, a hostile file Claude is asked to read, or a hostile tool result. Any of these can attempt to steer Claude’s tools toward four goals:

Credential exfiltration — reading host secrets (env vars, dotfiles, token stores, IPC sockets) and shipping them out.
Driving the host IDE — reaching across the IPC bridges and runtime sockets that connect the devcontainer to the editor and the host desktop.
Privilege escalation inside the devcontainer — setuid tricks, capability abuse, ptrace/kill against neighbouring processes, terminal injection.
Lateral network movement — using the host’s network reach to pivot to internal RFC1918 hosts, the 169.254.169.254 metadata endpoint, or lab devices reachable by bare IP with default credentials (EPICS IOCs, PMAC motion controllers — where a hostile session reaching a PMAC is a safety incident, not just an information one). This is the threat the default egress jail closes.

Each defence in the locked-down table maps one-to-one onto an observed path toward one of those goals, paired with the bwrap primitive that closes it. The design does not chase hypothetical attacks; it closes the concrete routes (environment variables, dotfiles, IPC and runtime sockets, X11, TIOCSTI terminal injection, setuid escalation) that an attacker-controlled Claude can actually reach.

A consequence of taking the developer off the threat list: enforcement targets accidental exposure, not a determined human deliberately dismantling their own sandbox. The integrity guard is built to survive Claude Code’s own self-updates and stray ~/.claude/settings.json edits, not to win a fight against the box’s owner with root.

Why each exposure is in or out of scope

The split between in-scope and out-of-scope is not arbitrary; it follows from what this tool is: a credential-isolation tool, not a general sandbox against arbitrary native code.

A defence is in scope when it closes a credential or host-control path that the sandbox can plug without breaking Claude’s job. Those are exactly the rows of the lockdown table — env scrubbing, the strict-under-/root inversion, dropped capabilities, the PID/IPC/UTS namespaces, the masked IPC and runtime dirs. They cost nothing Claude needs.

A defence is out of scope when plugging it would either break Claude or exceed what a credential-isolation tool can honestly promise. The out-of-scope table records each, but the rationale matters:

Workspace contents are out of scope because Claude has to read your workspace to be useful. This is the one irreducible exposure — see the caveat below.
The container host kernel is out of scope because a bwrap-aware kernel exploit is a different class of problem. This tool isolates credentials; it does not claim to contain arbitrary native code. The devcontainer host is the trust boundary, and keeping the kernel patched is the operator’s job.
Lateral network movement to internal hosts is now in scope and addressed by default: the egress jail (see “15. Jail Claude’s egress in a per-process netns with a routing allowlist”) runs Claude in a per-process network namespace that blackholes the RFC1918 ranges, so a compromised session cannot pivot to internal LANs or lab devices. What remains out of scope is internet-domain and exfiltration filtering: restricting which outbound domains Claude reaches, or stopping a session from POSTing data to a permitted destination. A hostname allowlist is the job of Claude Code’s native sandbox (allowedDomains); a DLP boundary belongs at the devcontainer edge. See the egress jail and the native sandbox below.
Non-standard credential paths are out of scope as a guarantee because the installer can warn about odd mounts it sees at install time but cannot enumerate every custom bind. Auditing your devcontainer’s mounts block is yours.

The throughline: the sandbox promises credential isolation and lateral network isolation against an LLM-driven attacker. Where a promise would still be dishonest — kernel exploits, internet-domain / exfil control, mounts the installer never saw — it is named as out of scope rather than implied.
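The mount audit left to the operator above can be partly mechanised. The following is a minimal sketch, not the installer's actual check: it assumes devcontainer.json has already been parsed into a dict, and both the pattern list and the function names (parse_mount, audit_mounts) are hypothetical choices made for illustration.

```python
import re

# Hypothetical patterns for mount targets that often hold credentials.
# This list is an illustrative assumption, not the installer's rule set.
SUSPICIOUS = re.compile(
    r"(\.ssh|\.aws|\.kube|\.docker|\.netrc|\.gnupg|credential|secret|token)",
    re.IGNORECASE,
)


def parse_mount(spec):
    """Normalise one devcontainer mount entry to a dict.

    An entry is either an object or a Docker-style
    "key=value,key=value" string.
    """
    if isinstance(spec, dict):
        return spec
    fields = {}
    for part in spec.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def audit_mounts(devcontainer):
    """Return the targets of mounts that deserve a manual look."""
    flagged = []
    for spec in devcontainer.get("mounts", []):
        mount = parse_mount(spec)
        target = (
            mount.get("target")
            or mount.get("dst")
            or mount.get("destination", "")
        )
        if SUSPICIOUS.search(target):
            flagged.append(target)
    return flagged
```

A flagged target is only a prompt to look closer: a name-based match cannot tell whether the mount really carries credentials, and a custom bind with an innocuous name passes unnoticed.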

What is deliberately exposed, and why

A handful of paths are reachable from inside Claude on purpose, because locking them down would defeat the tool. The full list with modes lives in deliberately exposed; the rationale is consistent across them:

The workspace is read-write because editing your project is the entire point.
The token stores (gh, glab-cli) are bound read-write so Claude can push code — the single largest deliberate exposure, addressed under PAT hygiene.
Claude’s own state (~/.claude, ~/.claude.json, caches) is bound so settings, skills, and the OAuth token survive across launches instead of being swallowed by the strict-under-/root tmpfs.
The curated gitconfig and the host system gitconfig are exposed read-only, the latter neutralised for git itself via GIT_CONFIG_SYSTEM=/dev/null.
Internet egress (api.anthropic.com, the forges, package registries) is reachable because Claude needs it — but only the internet, DNS, and explicitly allow-ip-listed devices: the egress jail (below) blackholes the internal RFC1918 network by default, so this exposure does not extend to lateral movement onto internal hosts.

The governing principle is minimum necessary exposure with a forward-compatible default: ~/.config/ keeps a strict two-entry allowlist because credentials live there by XDG contract, so a new credentialed tool is masked for free; ~/.local/share/ and ~/.cache/ are bulk-bound because they hold plugin trees and caches, not secrets. That trade-off and its failure mode (a tool that mis-files credentials under ~/.local/share/) are detailed in deliberately exposed.
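The difference between an allowlisted directory and a bulk-bound one can be modelled in a few lines. This is a sketch under stated assumptions: it covers only the three directories discussed here, it assumes the two allowlisted ~/.config entries are the gh and glab-cli token stores named on this page, and is_visible is a hypothetical helper rather than part of the tool.

```python
from pathlib import PurePosixPath

# Assumed for illustration: the two allowlisted ~/.config entries are the
# forge token stores named on this page. The real list lives in the tool.
CONFIG_ALLOWLIST = {"gh", "glab-cli"}
BULK_BOUND = {PurePosixPath(".local/share"), PurePosixPath(".cache")}


def is_visible(rel_path):
    """Decide whether a path relative to $HOME is visible in the sandbox.

    Models only ~/.config, ~/.local/share and ~/.cache; every other
    path is reported as not visible.
    """
    path = PurePosixPath(rel_path)
    parts = path.parts
    if parts[:1] == (".config",):
        # Allowlist: only named children of ~/.config are bound.
        return len(parts) > 1 and parts[1] in CONFIG_ALLOWLIST
    # Bulk binds: everything beneath these directories is visible.
    return any(path == root or root in path.parents for root in BULK_BOUND)
```

Under this model a new tool that stores credentials at .config/newtool/credentials is masked without any change to the lists, while the same file under .local/share/newtool/ is visible, which is the failure mode described above.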

The irreducible workspace-visibility caveat

This is the limitation worth stating plainly, because no amount of bwrap fixes it: workspace contents are visible to Claude, and that is by design. The sandbox protects you against host-credential leaks through env vars, dotfiles, and IPC sockets. It does not, and cannot, hide what you have checked out into the workspace — Claude has to read it to do its job, so anything in the workspace is reachable from Claude’s tools.

The practical consequence is a rule, not a feature:

Keep secrets outside the workspace.

Mount them via your devcontainer’s mounts (for example into ~/.config/, which sits behind the strict allowlist) rather than dropping a .env file full of production credentials at the workspace root and expecting it to be invisible. It will not be. The sandbox draws the credential boundary at the workspace edge; what you place inside that edge, you are choosing to expose.
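One way to act on this rule is to scan the workspace for files that look like secrets before starting a session. The sketch below is illustrative only: the filename patterns are an assumed, incomplete list, find_secret_like_files is a hypothetical helper, and a clean result does not prove the workspace holds no secrets.

```python
import fnmatch
import os

# Illustrative filename patterns only; not a complete secret detector.
SECRET_PATTERNS = [
    ".env", ".env.*", "*.pem", "*.key", "id_rsa", "id_ed25519", ".netrc",
]


def find_secret_like_files(workspace):
    """Return sorted workspace-relative paths whose names look like secrets."""
    hits = []
    for root, dirs, files in os.walk(workspace):
        # Skip VCS internals; pruning dirs in place stops os.walk descending.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            if any(fnmatch.fnmatch(name, pat) for pat in SECRET_PATTERNS):
                full = os.path.join(root, name)
                hits.append(os.path.relpath(full, workspace))
    return sorted(hits)
```

Anything this reports is readable by Claude's tools, so the fix is to move the file out of the workspace rather than to hide it.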

PAT hygiene: the soft underbelly

The deliberate read-write bind of the gh and glab token stores is the sandbox’s softest point, and it is worth being explicit about why. Everything else the lockdown closes — env vars, dotfiles, sockets — is taken away from Claude. The forge tokens are handed to Claude, because pushing code requires them. A compromised session can therefore use those tokens to push to any repository the PAT covers, modify CI workflows, or reach other repos in the same organisation. No bwrap primitive can distinguish a legitimate git push from a malicious one; they use the same token.

Because the sandbox cannot shrink that blast radius, the token must. The mitigation is scope discipline at the source, and the reasoning is blast-radius arithmetic:

A fine-grained, single-repo token means a compromise reaches exactly the repo you are working on — not the org.
A short expiry (7–30 days) means a leaked token dies on its own; re-auth costs seconds.
Omitting workflow scope (unless Claude must edit GitHub Actions) keeps a compromise away from your CI; no admin:* or org-wide write keeps it off everything else.
GitLab gets the equivalent: project-scoped tokens, api only if you need push, otherwise read_repository + write_repository.

The just gh-auth / just glab-auth helpers keep the token out of shell history, but they do not enforce scope — that part is irreducibly yours.
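Since scope discipline is left to the operator, a small check can make the rules above concrete. This is a hedged sketch: the scope strings follow the style of GitHub classic-token scopes, how the scopes and expiry are obtained is left out, and token_findings is a hypothetical helper, not something the just recipes provide.

```python
from datetime import date, timedelta

MAX_LIFETIME = timedelta(days=30)  # upper end of the 7-30 day range above


def token_findings(scopes, expires_on, today, needs_workflow=False):
    """Return human-readable reasons a token is broader than advised."""
    findings = []
    for scope in sorted(scopes):
        if scope.startswith("admin:"):
            findings.append(f"{scope}: admin scope widens the blast radius")
        elif scope == "workflow" and not needs_workflow:
            findings.append("workflow: lets a compromise edit CI")
    if expires_on is None:
        findings.append("no expiry: a leaked token never dies on its own")
    elif expires_on - today > MAX_LIFETIME:
        findings.append("expiry is more than 30 days away")
    return findings
```

An empty list means the token passes these particular checks. It says nothing about whether the token is limited to a single repository, which still has to be confirmed where the token was issued.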

When a session genuinely does not need to push, the right move is to remove the exposure entirely rather than rely on a tight token: CLAUDE_SANDBOX_NO_FORGE=1 in remoteEnv skips the gh/glab binds and strips the credential helpers from the generated gitconfig, so git push fails by design. The exact mechanism and how to set it are in deliberately exposed.
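As a rough illustration of where the variable goes, the sketch below adds it to the remoteEnv object of an already-parsed devcontainer.json and prints the result. The helper name disable_forge and the example "name" value are hypothetical; only the variable name and its placement in remoteEnv come from this page.

```python
import json


def disable_forge(devcontainer):
    """Return a copy of a parsed devcontainer.json with forge access off."""
    updated = dict(devcontainer)
    # Copy remoteEnv so the caller's dict is left unmodified.
    remote_env = dict(updated.get("remoteEnv", {}))
    remote_env["CLAUDE_SANDBOX_NO_FORGE"] = "1"
    updated["remoteEnv"] = remote_env
    return updated


# Hypothetical minimal config, used only to show the resulting shape.
print(json.dumps(disable_forge({"name": "example"}), indent=2))
```

Existing remoteEnv entries are preserved; the container has to be rebuilt or restarted for a changed remoteEnv to take effect.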

The egress jail and the native sandbox

Credential isolation answers the question “what can a compromised session read?” The egress jail answers a second question, “what can it reach?”, and as of 2026-06-18 it is on by default. The threat is lateral movement, not exfiltration: bwrap already hides the credentials, so the asset worth protecting is network reach. Without the jail, a prompt-injected session that shares the host network namespace can probe RFC1918 addresses, hit 169.254.169.254, and reach lab devices that still have default credentials (EPICS IOCs, PMAC), which is the incident that motivates the jail. Reaching a PMAC is a safety incident, not merely an information one.

The jail (see “15. Jail Claude’s egress in a per-process netns with a routing allowlist”) runs only Claude in a per-process network namespace beneath bwrap. The netns is IPv4-only (pasta --ipv4-only), so there is no IPv6 address family to pivot over. A routing allowlist blackholes 10/8, 172.16/12, 192.168/16, the CGNAT range 100.64/10 (Tailscale et al.), every connected subnet, and link-local 169.254/16. That leaves the internet, DNS, and the device IPs you list as allow-ip reachable, so Claude still works while a compromised session has nowhere internal to pivot.

The jail is fail-closed: if /dev/net/tun, pasta, or unshare is unavailable, claude refuses to launch rather than silently fall back to open egress. The escape hatch CLAUDE_SANDBOX_EGRESS_JAIL=0 (env, or egress-jail = 0 in /etc/claude-sandbox.conf) restores the older shared-host-netns behaviour described in “5. Leave network egress open; egress filtering is out of scope”. Normal, non-Claude shells keep host networking untouched. The operational recipe (adding the required --device=/dev/net/tun, allow-listing a device, or turning the jail off) is in Configure the network egress jail; the config keys are in configuration.
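The routing decision described above can be modelled with Python's standard ipaddress module. This is a sketch of the policy, not of the jail's implementation (which uses kernel routes inside the netns): is_reachable is a hypothetical helper, connected subnets are passed in by the caller because they are only known at launch, and the model assumes an allow-ip entry takes precedence over a blackholed range, as the text implies. DNS handling is not modelled.

```python
import ipaddress

# Ranges this page says the jail blackholes. Connected subnets are
# discovered at launch, so the caller supplies them separately.
BLACKHOLED = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "100.64.0.0/10", "169.254.0.0/16",
)]


def is_reachable(dest, allow_ips=(), connected_subnets=()):
    """Model the routing decision for one destination address."""
    addr = ipaddress.ip_address(dest)
    if addr.version != 4:
        return False  # the netns is IPv4-only
    allowed = [ipaddress.ip_network(a) for a in allow_ips]
    if any(addr in net for net in allowed):
        return True  # assumed: an allow-ip entry wins over the blackhole
    blocked = BLACKHOLED + [ipaddress.ip_network(s) for s in connected_subnets]
    return not any(addr in net for net in blocked)
```

For example, is_reachable("169.254.169.254") is False, while is_reachable("192.168.1.50", allow_ips=["192.168.1.50"]) is True because the device was listed explicitly.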

Meshing with Claude Code’s native sandbox

This tool is not an alternative to Claude Code’s own sandbox — they are composable layers covering different surfaces, and the strongest posture runs both.

claude-sandbox (this repo) provides two things: credential protection (the bwrap bind model: env scrubbing, the strict-under-$HOME inversion, masked IPC/runtime sockets) and lateral network isolation (the egress jail blackholes RFC1918 so a compromised session cannot pivot to internal hosts or lab devices). It does not restrict which internet domains Claude reaches.
Claude Code’s native sandbox provides the complementary surface — internet domain isolation (an allowedDomains allowlist enforced by an SNI proxy), restricting outbound HTTPS to named hosts. It does not express bare-IP / UDP / dynamic-port lab-device traffic, and it does not provide this repo’s credential-bind model.

The two mesh: credential isolation plus lateral isolation (this tool) plus internet domain allowlisting (native) is defence in depth across complementary surfaces, giving three layers rather than three alternatives. The cohort framing in “15. Jail Claude’s egress in a per-process netns with a routing allowlist” makes the division concrete. Cohort A needs HTTPS to named hosts; Claude Code’s native allowedDomains fits it (the dual-sandbox work is issue #33, still open). Cohort B, this repo’s users, reaches lab devices by bare RFC1918 IP, over UDP, on dynamic ports (EPICS Channel Access / pvAccess, PMAC); a hostname allowlist cannot express that, which is exactly why this tool’s IP/CIDR egress jail exists.
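The cohort split can be phrased as a predicate on a destination. The sketch below is a deliberate simplification: it treats "HTTPS to a named host" as TCP port 443 to something that is not an IP literal, and expressible_by_hostname_allowlist is a hypothetical name, not part of either sandbox.

```python
import ipaddress


def expressible_by_hostname_allowlist(host, transport="tcp", port=443):
    """True if a hostname allowlist over HTTPS can describe the destination.

    Simplified model: HTTPS is taken to mean TCP on port 443.
    """
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal, so there is a hostname to match
    else:
        return False  # bare IP: no hostname for the allowlist to match
    return transport == "tcp" and port == 443
```

A Cohort A destination such as ("api.anthropic.com", "tcp", 443) satisfies the predicate; a Cohort B destination such as ("192.168.10.20", "udp", 5064), where the address is a made-up example and 5064 is the usual EPICS Channel Access port, does not, so it has to be handled by an IP-based rule instead.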