15. Jail Claude’s egress in a per-process netns with a routing allowlist
Date: 2026-06-17
Status
Accepted
Layers on top of ADR 5 (“Leave network egress open; egress filtering is out of scope”), with the same mechanism-beneath-bwrap relationship. As of 2026-06-18, however, the jail is the default. That overrides ADR 5’s open-egress default but not its reasoning: CLAUDE_SANDBOX_EGRESS_JAIL=0 restores the open path, and filtering still lives around the tool, not inside it. ADR 5 carries a pointer here.
Context
ADR 5 left network egress open and said so deliberately: filtering “belongs at the devcontainer boundary,” and a future “add network sandboxing” would be “a layered addition on top of credential isolation — record it as its own ADR if adopted; it does not reverse this decision.” This is that ADR.
The threat is lateral movement, not exfil (issue #31, folded into #56). bwrap
already hides credentials and the code is OSS; the asset worth protecting is
network reach. A compromised or prompt-injected session sharing the host
network namespace can probe RFC1918, hit internal HTTP and 169.254.169.254,
and — the incident that motivates this — reach lab devices with default creds
(EPICS IOCs, PMAC). A PMAC reached by a hostile session is a safety incident,
not just an information one.
Two user cohorts need different controls:
Cohort A: HTTPS to named hosts. Claude Code’s native sandbox (allowedDomains / an SNI proxy) fits this; that is issue #33’s scope.
Cohort B (this repo’s users): lab devices addressed by bare RFC1918 IP, over UDP, on dynamic ports (EPICS CA/PVA, PMAC). A hostname allowlist cannot express this, so Cohort B needs IP/CIDR-level control.
The runtime target is rootless Podman (pasta is the Podman 5+ default
outbound path). Consequences that rule out Docker-shaped designs: no host
DOCKER-USER/iptables knob for the unprivileged user, userspace outbound only,
and no CAP_NET_ADMIN on the container. So the control cannot be a host firewall
or a container capability — it has to be built from primitives an unprivileged
user already has.
Feasibility was probed unjailed on a real rootless host. Inside the sandbox, the session reports CapBnd=0, which falsely reads as “impossible”; see ADR 12, “Treat the read-write workspace as untrusted: default to $PWD, source config from /etc”. Three probes passed:
- unprivileged netns creation plus in-netns routing, with no caps;
- pasta building a tap device, given /dev/net/tun;
- live egress through pasta, with the internet reachable and non-allowlisted RFC1918 blackholed.
Decision
Add an egress jail beneath the bwrap wall, scoped to Claude alone. The
container keeps --network=host, so ordinary (non-Claude) shells and EPICS
Channel Access broadcast are untouched. Only the shadow’s launch is jailed:
claude-shadow creates a user + network namespace with unshare -rn (a short-lived holder), and bwrap inherits it. bwrap keeps omitting --unshare-net; it only nests its own userns inside the holder’s. The holder must create the netns, not bwrap and not pasta. The container has no CAP_NET_ADMIN to make a netns without a userns. If pasta creates the namespaces, it also makes a pid + mount ns and cannot give bwrap a usable /proc for it, so bwrap aborts on /proc/<pid> lookups. A user + net-only holder keeps /proc valid, so bwrap nests cleanly. pasta attaches from outside the holder by PID (pasta --config-net <holder-pid>; it backgrounds itself). The egress proxy must run outside the netns because it needs host connectivity to proxy. No container caps and no host-firewall change are needed. Note that --config-net mirrors the host’s L3 config into the netns: the host address, the connected-subnet route, the default gateway, and the DNS resolvers from /etc/resolv.conf.
Routing-as-allowlist (surgical) inside the holder’s netns. A blanket RFC1918 blackhole is wrong for two reasons. On sites where the resolvers and gateway are themselves RFC1918 (e.g. an all-172.23/16 lab network), it kills DNS. And pasta’s mirrored connected-subnet route is more specific than the blackhole, so the whole local subnet stays reachable. Instead, the holder blackholes 10/8, 172.16/12, 192.168/16 and the connected subnet, and marks 169.254/16 unreachable. It then punches back only:
- the gateway (/32, on-link);
- the DNS resolvers (/32 via gw); resolution is not lateral movement, since connections to internal IPs stay blackholed;
- the allow-ip devices (/32 via gw) from /etc/claude-sandbox.conf.
The holder locks these routes down before handing off to bwrap. The ordering (netns created → pasta attached → routes locked → Claude runs) is load-bearing for the boundary. The blackholes are fail-closed: a failed one aborts the launch. The device and DNS punches are fail-soft: a missing one costs reachability but does not open a hole.
Stub-resolver DNS via a pasta forwarder. Punching /32s for the /etc/resolv.conf resolvers only works when those resolvers are routable. On a personal Ubuntu desktop, the sole resolver is a loopback stub (127.0.0.53 from systemd-resolved, or Tailscale MagicDNS). That stub lives in the host netns and answers nothing inside the jail, so every lookup gets ECONNREFUSED and the API looks down (issue #60). The fix: pasta attaches with --dns-forward 192.0.2.53, an RFC 5737 TEST-NET address that is globally non-routable and outside every blackholed range. pasta then listens on that address inside the netns and relays DNS to the host’s real resolvers; because pasta runs in the host netns, it can reach the loopback stub. When claude-shadow detects an all-loopback /etc/resolv.conf, it binds a one-line nameserver 192.0.2.53 over Claude’s /etc/resolv.conf, and the holder routes that /32 via the gateway. Hosts with routable resolvers are unchanged: the forwarder is staged but unused. Resolution stays proxied and internal IPs stay blackholed, so the boundary is intact. If no resolver can be established at all, the jail says so rather than failing silently.
Security rests on userns ownership, not caplessness. Claude is not capless here. Because bwrap nests its userns inside the holder’s unprivileged userns, the new userns gives Claude a full bounding set (CapBnd=…1ffffffffff). That is expected and unavoidable. What contains it is ownership: the netns and its routes are owned by the holder’s userns, an ancestor of Claude’s, so Claude’s caps confer no authority over them. Verified directly: from inside the jail, deleting a blackhole route, punching a route past it, and creating a net device all fail with EPERM, and RFC1918 stays blocked after the attempts.
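The holder’s route lockdown and the loopback-resolver fallback can be sketched in Bash, the language ADR 8 already commits the shadow to. This is a minimal sketch under stated assumptions, not claude-shadow’s code. Several details are assumed:
- the function names (route_closed, route_soft, lock_routes, jail_resolvers) and their argument order;
- treating the gateway pin as fail-soft;
- using ip route replace for the connected subnet, on the guess that pasta has already installed a same-prefix route, so a plain add would likely fail with “File exists”;
- using only the routable entries of a mixed resolv.conf.

The sketch follows the ordering the Consequences describe, pinning the gateway on-link before the RFC1918 blackholes.

```bash
#!/usr/bin/env bash
# Minimal sketch of the holder's route lockdown and resolver choice.
# Hypothetical names and arguments; not the actual claude-shadow code.

# The pasta --dns-forward address described above (RFC 5737 TEST-NET-1).
JAIL_DNS_FORWARD=192.0.2.53

# Fail-closed: a blackhole that does not land must abort the launch.
route_closed() {
  ip route "$@" || { echo "egress jail: required route failed: $*" >&2; return 1; }
}

# Fail-soft: a missing punch only costs reachability, so warn and carry on.
route_soft() {
  ip route "$@" || echo "egress jail: optional route skipped: $*" >&2
  return 0
}

# lock_routes GATEWAY DEVICE SUBNET "RESOLVERS" "ALLOW_IPS"
# Meant to run inside the holder's netns after pasta attaches, before bwrap.
lock_routes() {
  local gw=$1 dev=$2 subnet=$3 resolvers=$4 allow=$5 net host
  # Pin the gateway on-link first so later "via $gw" punches still resolve.
  route_soft add "$gw/32" dev "$dev"
  for net in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
    route_closed add blackhole "$net" || return 1
  done
  # Assumption: pasta mirrored a same-prefix connected route, so replace it.
  route_closed replace blackhole "$subnet" || return 1
  route_closed add unreachable 169.254.0.0/16 || return 1
  # Punch back resolvers and allow-ip devices as /32s via the gateway.
  for host in $resolvers $allow; do
    route_soft add "$host/32" via "$gw"
  done
}

# jail_resolvers RESOLV_CONF: print the resolver IPs the jail should punch.
# An all-loopback file (e.g. systemd-resolved's 127.0.0.53) maps to the
# pasta forwarder; with no nameserver at all, say so and fail.
jail_resolvers() {
  local conf=$1 kw addr seen=0
  local -a routable=()
  while read -r kw addr _ || [ -n "$kw" ]; do
    [ "$kw" = nameserver ] && [ -n "$addr" ] || continue
    seen=1
    case $addr in
      127.*|::1) ;;                 # host-netns stub: dead inside the jail
      *) routable+=("$addr") ;;
    esac
  done < "$conf"
  if [ "${#routable[@]}" -gt 0 ]; then
    printf '%s\n' "${routable[@]}"
  elif [ "$seen" -eq 1 ]; then
    echo "$JAIL_DNS_FORWARD"
  else
    echo "egress jail: no usable resolver in $conf" >&2
    return 1
  fi
}
```

Routing every change through route_closed or route_soft keeps the fail-closed versus fail-soft split visible at each call site, in the same spirit as keeping the literal ip route add blackhole commands in view.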
On by default, fail-closed, with an escape hatch. The jail runs unless
CLAUDE_SANDBOX_EGRESS_JAIL=0 (env, per session) or egress-jail = 0
(/etc/claude-sandbox.conf, per host) disables it. If the jail is on but a
prerequisite is missing (/dev/net/tun, pasta, unshare), the launch fails
closed — claude refuses to start rather than silently dropping back to open
egress — and the error names both the fix and the =0 escape hatch. This is the
secure-by-default choice for the lab threat (a misconfigured host can’t quietly
lose the control); the escape hatch keeps a non-EPICS or deliberately-open host
unblocked.
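The default-on switch and its fail-closed prerequisite check can be sketched as two small functions. Some parts come from this ADR: the egress_jail_enabled name, CLAUDE_SANDBOX_EGRESS_JAIL, the egress-jail = 0 key, and the three prerequisites. Everything else is a hypothetical stand-in: the config-matching regex, the jail_prereqs_or_die name, and the CLAUDE_SANDBOX_CONF and TUN_DEVICE overrides, which exist only to make the sketch testable.

```bash
#!/usr/bin/env bash
# Sketch of the default-on switch and its fail-closed prerequisite check.
# CLAUDE_SANDBOX_CONF and TUN_DEVICE are hypothetical test hooks.
CLAUDE_SANDBOX_CONF=${CLAUDE_SANDBOX_CONF:-/etc/claude-sandbox.conf}

# On unless the session env or the host config says 0.
egress_jail_enabled() {
  if [ "${CLAUDE_SANDBOX_EGRESS_JAIL:-1}" = 0 ]; then
    return 1
  fi
  if [ -r "$CLAUDE_SANDBOX_CONF" ] &&
     grep -Eq '^[[:space:]]*egress-jail[[:space:]]*=[[:space:]]*0[[:space:]]*$' \
       "$CLAUDE_SANDBOX_CONF"; then
    return 1
  fi
  return 0
}

# Refuse to launch, naming the fix and the escape hatch, rather than
# silently falling back to open egress.
jail_prereqs_or_die() {
  local tool
  local -a missing=()
  [ -c "${TUN_DEVICE:-/dev/net/tun}" ] ||
    missing+=("/dev/net/tun (add --device=/dev/net/tun to devcontainer.json runArgs)")
  for tool in pasta unshare; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    printf 'egress jail: missing prerequisite: %s\n' "${missing[@]}" >&2
    echo "egress jail: refusing to start; fix the above, or set CLAUDE_SANDBOX_EGRESS_JAIL=0 (or egress-jail = 0 in $CLAUDE_SANDBOX_CONF) for open egress" >&2
    return 1
  fi
}
```

A launch path would gate on both, for example `if egress_jail_enabled; then jail_prereqs_or_die || exit 1; fi`. The open-egress path is then reachable only through an explicit =0, never through a missing device or binary.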
Two structural choices fix scope:
Bash, inlined in the shadow. The setup is implemented as inlined functions inside claude-shadow, not as a sourced module: netns_launch() orchestrates, netns_holder() runs inside unshare -rn, and an egress_jail_enabled predicate gates both. This preserves the single-file, read-top-to-bottom auditability that ADR 8 (“Bash-only: no Python package, uv, or pytest”) and ADR 14 (“Keep the integrity-check surfaces separate and self-contained”) rest on. netns + routing + pasta is shell orchestration of CLI tools, and the literal ip route add blackhole … commands are the most auditable representation of the boundary, so no higher-level language is warranted. The trigger to revisit is the net code outgrowing the shadow’s readability; extracting it into its own file would then get its own ADR.
Allowlist lives in /etc, not the workspace. allow-ip entries come from /etc/claude-sandbox.conf, outside the sandbox’s rw set, per ADR 12 (“Treat the read-write workspace as untrusted: default to $PWD, source config from /etc”). A per-workspace allowlist would be attacker-writable from inside the jail.
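Reading the allowlist can be sketched the same way, with the path fixed to /etc/claude-sandbox.conf and never derived from the workspace. Several details are assumptions:
- the key = value syntax, inferred from the egress-jail = 0 example above;
- rejecting CIDRs and malformed addresses, so that a typo cannot punch a whole range;
- the conf_allow_ips name itself, which is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: emit the allow-ip entries from the host config, one per line.
# The argument exists for testing; the jail itself would read only /etc.
conf_allow_ips() {
  local conf=${1:-/etc/claude-sandbox.conf} key val octet ok
  [ -r "$conf" ] || return 0            # no host config: nothing punched
  while IFS='=' read -r key val || [ -n "$key" ]; do
    key=${key//[[:space:]]/}
    val=${val//[[:space:]]/}
    [ "$key" = allow-ip ] || continue
    ok=0
    # Accept a bare dotted-quad IPv4 address only (it becomes a /32 punch).
    if [[ $val =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]]; then
      ok=1
      for octet in "${BASH_REMATCH[@]:1}"; do
        (( 10#$octet <= 255 )) || ok=0
      done
    fi
    if [ "$ok" -eq 1 ]; then
      echo "$val"
    else
      echo "egress jail: ignoring allow-ip '$val' (not a single IPv4 address)" >&2
    fi
  done < "$conf"
}
```

Each printed address would become one fail-soft /32 punch via the gateway. A rejected entry is reported but never widened into a range.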
Consequences
Defence in depth, not a stronger wall. This layer sits beneath bwrap: a bwrap escape could re-plumb the netns. It raises the cost of lateral movement for a contained session; it is never stronger than the bwrap boundary above it.
Channel Access broadcast for Claude is gone. Claude’s private netns has no LAN broadcast domain, so CA auto-discovery won’t work for Claude; it must use a unicast EPICS_CA_ADDR_LIST. Normal shells keep host networking and broadcast.
Requires /dev/net/tun in the container (devcontainer.json runArgs: --device=/dev/net/tun). pasta and slirp4netns are both TAP-based, and neither works without it. This is the one hard container-side requirement.
An HTTPS_PROXY-style env proxy is explicitly not the mechanism (issue #31, Option D): a hostile process can unset the env var or open a raw socket. The control is enforced by an ancestor-owned netns plus kernel routing, not by the environment.
Integrity checks still pass in the jail; no jail-aware variant is needed. /verify-sandbox check 06 and sandbox-verify.sh assert CapEff=0 (the effective set), not CapBnd. bwrap’s --cap-drop ALL empties the effective set even inside the nested userns, so CapEff=0 holds and the full 18-check battery passes in a jailed session (verified live). The only difference from the non-jail sandbox is the CapBnd ceiling: …1ffffffffff in the jail versus 0 outside it, a nested-userns artifact. Effective caps are zero, so no capability is active, and route immutability additionally holds through ancestor-userns ownership.
Cap-ceiling diligence, verified 2026-06-18 (probe-network-jail-caps.sh, run unjailed): the higher CapBnd ceiling cannot be used to weaken another bwrap protection. Even after unshare -rUm grants a full effective cap set in a child userns, three attempts all fail with EPERM: mount -o remount,rw /, a bind-mount over a --ro-bind path, and sethostname. bwrap’s locked mounts are immutable from any descendant userns, so the full CapBnd is inert.
Hostname allowlists stay out of scope for Cohort B. Native allowedDomains cannot express bare-IP, UDP, dynamic-port device traffic; Cohort A and the dual-sandbox question remain issue #33.
On by default shifts the dogfood ≈ guest cost. With the jail on by default and fail-closed, consider a host that hasn’t mounted /dev/net/tun (a devcontainer.json runArg an installer can’t add). Its claude refuses to launch until the host either adds the device or sets CLAUDE_SANDBOX_EGRESS_JAIL=0. install.sh installs pasta, so that prerequisite is never the blocker, and the dogfood box mounts the tun device. A plain git clone + ./install guest that wants the default jail must add the one runArg, as the error message says; otherwise it opts out with =0. The deliberate trade is a loud stop over a silent downgrade of the default control.
Verification will follow the same three-surface model as ADR 14 (“Keep the integrity-check surfaces separate and self-contained”). A future, optional jail-aware check (not yet implemented) would assert, when the jail is enabled, that the netns exists and that the RFC1918 blackhole holds with only the configured allow-ip routes punched through. The existing 18-check battery already passes unchanged in a jailed session (check 06 asserts CapEff=0, as above), so no jail-aware variant is required today.
Proven before adoption: the full Design-D chain (holder netns → pasta attach → route lockdown → nested capful bwrap → Claude) works on a real rootless host, and the route-immutability security battery passes (probe-network-jail.sh, run unjailed).
Done since adoption: implemented in claude-shadow (netns_launch / netns_holder / egress_jail_enabled) plus install.sh (which installs passt, the package that provides pasta) and tests. The jail is on by default and fail-closed. It has been validated end-to-end on a rootless --network=host host and in a bridge/NAT container. The gateway-collision and nested-pasta paths are proven: the gateway is pinned on-link before the RFC1918 blackhole, so egress works while RFC1918 and the same subnet stay blocked. The shadow, install.sh, and tests implementation landed on adoption (2026-06-18).
Still optional/open: only the jail-aware /verify-sandbox check described above. Core internet and RFC1918 behaviour is already proven.
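The two verification points above can be sketched as small shell predicates. Neither is the project’s check 06, and neither is the unimplemented jail-aware check; caps_effective_zero and routes_locked are hypothetical names. Reading the route table as ip -4 route show text, where a /32 host route prints without its suffix, is an assumption about how a future check would gather it. The route predicate covers only the blackholes and the set of punched hosts, not netns existence.

```bash
#!/usr/bin/env bash
# Sketch of two jail-side assertions; hypothetical, not /verify-sandbox code.

# caps_effective_zero [STATUS_FILE]: pass only when CapEff is all zeros.
# CapBnd is ignored on purpose: in the jail it reads ...1ffffffffff
# (a nested-userns artifact) while bwrap --cap-drop ALL keeps CapEff at 0.
caps_effective_zero() {
  local status=${1:-/proc/self/status} eff
  eff=$(awk '$1 == "CapEff:" { print $2 }' "$status")
  [[ $eff =~ ^0+$ ]]
}

# routes_locked "PERMITTED_HOSTS" < ROUTE_TABLE
# ROUTE_TABLE is `ip -4 route show` output. Fails if an RFC1918 blackhole
# is missing, or if any host route targets an address not listed in
# PERMITTED_HOSTS (gateway, resolvers, and allow-ip entries).
routes_locked() {
  local permitted=" $1 " table net dst rc=0
  table=$(cat)
  for net in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
    if ! awk -v n="$net" '$1 == "blackhole" && $2 == n { found = 1 }
                          END { exit !found }' <<<"$table"; then
      echo "missing blackhole $net" >&2
      rc=1
    fi
  done
  # Host routes print as a bare address; prefixes and typed routes are skipped.
  while read -r dst _; do
    case $dst in
      ''|default|blackhole|unreachable|prohibit|*/*) continue ;;
    esac
    if [[ $permitted != *" $dst "* ]]; then
      echo "unexpected host route: $dst" >&2
      rc=1
    fi
  done <<<"$table"
  return "$rc"
}
```

Inside a jailed session, a caller might run `ip -4 route show | routes_locked "$gw $resolvers $allow_ips"`, where the three variables are hypothetical. Keying on CapEff rather than CapBnd is what lets the same capability assertion pass both inside and outside the jail.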
Live design and the feasibility probes: issue #56 (refines #31, which is closed; #33 remains open for Cohort A).