Threat model#
claude-sandbox exists to answer one question: what can go wrong when a
developer runs Claude Code inside a devcontainer, and which of those failures
is this tool responsible for preventing? This page explains the reasoning
behind the boundary. For the hard, look-it-up tables — the exact defences and
the exact exposures — see locked-down defences
and deliberately exposed. For how the
bwrap primitives fit together to enforce all this, see
architecture.
Who and what we defend against#
The adversary is not the developer. It is an LLM-driven attack riding on input the developer did not write: a hostile prompt, a hostile file Claude is asked to read, or a hostile tool result. Any of these can attempt to steer Claude’s tools toward four goals:
Credential exfiltration — reading host secrets (env vars, dotfiles, token stores, IPC sockets) and shipping them out.
Driving the host IDE — reaching across the IPC bridges and runtime sockets that connect the devcontainer to the editor and the host desktop.
Privilege escalation inside the devcontainer — setuid tricks, capability abuse, ptrace/kill against neighbouring processes, terminal injection.
Lateral network movement — using the host’s network reach to pivot to internal RFC1918 hosts, the
169.254.169.254metadata endpoint, or lab devices reachable by bare IP with default credentials (EPICS IOCs, PMAC motion controllers — where a hostile session reaching a PMAC is a safety incident, not just an information one). This is the threat the default egress jail closes.
Each defence in the locked-down table
maps one-to-one onto an observed path for one of those goals and the bwrap
primitive that closes it. The design does not chase hypothetical attacks; it
closes the concrete exfiltration routes — environment variables, dotfiles, IPC
and runtime sockets, X11, TIOCSTI terminal injection, setuid escalation —
that an attacker-controlled Claude can actually reach.
A consequence of taking the developer off the threat list: enforcement
targets accidental exposure, not a determined human deliberately dismantling
their own sandbox. The integrity guard is built to survive Claude Code’s own
self-updates and stray ~/.claude/settings.json edits, not to win a fight
against the box’s owner with root.
Why each exposure is in or out of scope#
The split between in-scope and out-of-scope is not arbitrary; it follows from what this tool is: a credential-isolation tool, not a general sandbox against arbitrary native code.
A defence is in scope when it closes a credential or host-control path that
the sandbox can plug without breaking Claude’s job. Those are exactly the rows
of the lockdown table — env scrubbing, the strict-under-/root inversion,
dropped capabilities, the PID/IPC/UTS namespaces, the masked IPC and runtime
dirs. They cost nothing Claude needs.
A defence is out of scope when plugging it would either break Claude or exceed what a credential-isolation tool can honestly promise. The out-of-scope table records each, but the rationale matters:
Workspace contents are out of scope because Claude has to read your workspace to be useful. This is the one irreducible exposure — see the caveat below.
The container host kernel is out of scope because a bwrap-aware kernel exploit is a different class of problem. This tool isolates credentials; it does not claim to contain arbitrary native code. The devcontainer host is the trust boundary, and keeping the kernel patched is the operator’s job.
Lateral network movement to internal hosts is now in scope and addressed by default: the egress jail (15. Jail Claude’s egress in a per-process netns with a routing allowlist, on by default) runs Claude in a per-process network namespace that blackholes RFC1918, so a compromised session cannot pivot to internal LANs or lab devices. What remains out of scope is internet-domain / exfil filtering — restricting which outbound domains Claude reaches, or stopping a session POSTing data to a permitted destination. A hostname allowlist is Claude Code’s native sandbox’s job (
allowedDomains); a DLP boundary belongs at the devcontainer edge. See the egress jail and the native sandbox below.Non-standard credential paths are out of scope as a guarantee because the installer can warn about odd mounts it sees at install time but cannot enumerate every custom bind. Auditing your devcontainer’s
mountsblock is yours.
The throughline: the sandbox promises credential isolation and lateral network isolation against an LLM-driven attacker. Where a promise would still be dishonest — kernel exploits, internet-domain / exfil control, mounts the installer never saw — it is named as out of scope rather than implied.
What is deliberately exposed, and why#
A handful of paths are reachable from inside Claude on purpose, because locking them down would defeat the tool. The full list with modes lives in deliberately exposed; the rationale is consistent across them:
The workspace is read-write because editing your project is the entire point.
The token stores (
gh,glab-cli) are bound read-write so Claude can push code — the single largest deliberate exposure, addressed under PAT hygiene.Claude’s own state (
~/.claude,~/.claude.json, caches) is bound so settings, skills, and the OAuth token survive across launches instead of being swallowed by the strict-under-/roottmpfs.The curated gitconfig and the host system gitconfig are exposed read-only, the latter neutralised for
gititself viaGIT_CONFIG_SYSTEM=/dev/null.Internet egress (
api.anthropic.com, the forges, package registries) is reachable because Claude needs it — but only the internet, DNS, and explicitlyallow-ip-listed devices: the egress jail (below) blackholes the internal RFC1918 network by default, so this exposure does not extend to lateral movement onto internal hosts.
The governing principle is minimum necessary exposure with a forward-compatible
default: ~/.config/ keeps a strict two-entry allowlist because credentials
live there by XDG contract, so a new credentialed tool is masked for free;
~/.local/share/ and ~/.cache/ are bulk-bound because they hold plugin trees
and caches, not secrets. That trade-off and its failure mode (a tool that
mis-files credentials under ~/.local/share/) are detailed in
deliberately exposed.
The irreducible workspace-visibility caveat#
This is the limitation worth stating plainly, because no amount of bwrap fixes it: workspace contents are visible to Claude, and that is by design. The sandbox protects you against host-credential leaks through env vars, dotfiles, and IPC sockets. It does not, and cannot, hide what you have checked out into the workspace — Claude has to read it to do its job, so anything in the workspace is reachable from Claude’s tools.
The practical consequence is a rule, not a feature:
Keep secrets outside the workspace.
Mount them via your devcontainer’s mounts (for example into ~/.config/, which
sits behind the strict allowlist) rather than dropping a .env file full of
production credentials at the workspace root and expecting it to be invisible.
It will not be. The sandbox draws the credential boundary at the workspace
edge; what you place inside that edge, you are choosing to expose.
PAT hygiene: the soft underbelly#
The deliberate read-write bind of the gh and glab token stores is the
sandbox’s softest point, and it is worth being explicit about why. Everything
else the lockdown closes — env vars, dotfiles, sockets — is taken away from
Claude. The forge tokens are handed to Claude, because pushing code requires
them. A compromised session can therefore use those tokens to push to any
repository the PAT covers, modify CI workflows, or reach other repos in the
same organisation. No bwrap primitive can distinguish a legitimate git push
from a malicious one; they use the same token.
Because the sandbox cannot shrink that blast radius, the token must. The mitigation is scope discipline at the source, and the reasoning is blast-radius arithmetic:
A fine-grained, single-repo token means a compromise reaches exactly the repo you are working on — not the org.
A short expiry (7–30 days) means a leaked token dies on its own; re-auth costs seconds.
Omitting
workflowscope (unless Claude must edit GitHub Actions) keeps a compromise away from your CI; noadmin:*or org-wide write keeps it off everything else.GitLab gets the equivalent: project-scoped tokens,
apionly if you need push, otherwiseread_repository+write_repository.
The just gh-auth / just glab-auth helpers keep the token out of shell
history, but they do not enforce scope — that part is irreducibly yours.
When a session genuinely does not need to push, the right move is to remove the
exposure entirely rather than rely on a tight token: CLAUDE_SANDBOX_NO_FORGE=1
in remoteEnv skips the gh/glab binds and strips the credential helpers
from the generated gitconfig, so git push fails by design. The exact
mechanism and how to set it are in
deliberately exposed.
The egress jail and the native sandbox#
Credential isolation answers what can a compromised session read? The egress
jail answers a second question — what can it reach? — and as of 2026-06-18 it
is on by default. The threat is lateral movement, not exfiltration: bwrap
already hides the credentials, so the asset worth protecting is network reach.
Without the jail, a prompt-injected session that shares the host network
namespace can probe RFC1918, hit 169.254.169.254, and — the incident that
motivates this — reach lab devices with default credentials (EPICS IOCs, PMAC). Reaching a PMAC is a
safety incident, not merely an information one.
The jail (15. Jail Claude’s egress in a per-process netns with a routing allowlist) runs only Claude in a per-process
network namespace beneath bwrap. A routing allowlist blackholes 10/8,
172.16/12, 192.168/16, the connected subnet, and link-local 169.254/16,
leaving the internet, DNS, and the device IPs you list as allow-ip reachable —
so Claude still works while a compromised session has nowhere internal to pivot.
It is fail-closed: if /dev/net/tun, pasta, or unshare is unavailable,
claude refuses to launch rather than silently fall back to open egress. The
escape hatch CLAUDE_SANDBOX_EGRESS_JAIL=0 (env, or egress-jail = 0 in
/etc/claude-sandbox.conf) restores the older shared-host-netns world of
5. Leave network egress open; egress filtering is out of scope. Normal, non-Claude shells keep host networking
untouched. The operational recipe — adding the required --device=/dev/net/tun,
allow-listing a device, or turning the jail off — is in
Configure the network egress jail; the config
keys are in configuration.
Meshing with Claude Code’s native sandbox#
This tool is not an alternative to Claude Code’s own sandbox — they are composable layers covering different surfaces, and the strongest posture runs both.
claude-sandbox(this repo) provides two things: credential protection (the bwrap bind model — env scrubbing, the strict-under-$HOMEinversion, masked IPC/runtime sockets) and sideways / lateral network isolation (the egress jail blackholes RFC1918 so a compromised session cannot pivot to internal hosts or lab devices). It does not restrict which internet domains Claude reaches.Claude Code’s native sandbox provides the complementary surface — internet domain isolation (an
allowedDomainsallowlist enforced by an SNI proxy), restricting outbound HTTPS to named hosts. It does not express bare-IP / UDP / dynamic-port lab-device traffic, and it does not provide this repo’s credential-bind model.
The two mesh: credential isolation + lateral isolation (this tool) + internet
domain allowlisting (native) is defence in depth across complementary surfaces —
three layers, not three alternatives. The cohort framing in
15. Jail Claude’s egress in a per-process netns with a routing allowlist makes the division concrete. Cohort A needs
HTTPS to named hosts; Claude Code’s native allowedDomains fits it (the
dual-sandbox work is issue #33, still open). Cohort B — this repo’s users —
reach lab devices by bare RFC1918 IP, over UDP, on dynamic ports (EPICS
Channel Access / pvAccess, PMAC); a hostname allowlist cannot express that,
which is exactly why this tool’s IP/CIDR egress jail exists.