(adr-network-egress-jail)=
# 15. Jail Claude's egress in a per-process netns with a routing allowlist

Date: 2026-06-17

## Status

Accepted

Layers on top of {ref}`adr-network-egress-open` (ADR 5), with the same mechanism-beneath-bwrap relationship, but **as of 2026-06-18 the jail is the default**. That overrides ADR 5's *open-egress default* (not its reasoning: `CLAUDE_SANDBOX_EGRESS_JAIL=0` restores the open path, and filtering still lives around the tool, not inside it). ADR 5 carries a pointer here.

## Context

ADR 5 left network egress open and said so deliberately: filtering "belongs at the devcontainer boundary," and a future "add network sandboxing" would be *"a layered addition on top of credential isolation"*, with the instruction to *"record it as its own ADR if adopted; it does not reverse this decision."* This is that ADR.

The threat is **lateral movement, not exfil** (issue #31, folded into #56). bwrap already hides credentials and the code is OSS; the asset worth protecting is *network reach*. A compromised or prompt-injected session sharing the host network namespace can probe RFC1918, hit internal HTTP and `169.254.169.254`, and (the incident that motivates this) reach **lab devices with default creds** (EPICS IOCs, PMAC). A PMAC reached by a hostile session is a *safety* incident, not just an information one.

Two user cohorts need different controls:

- **Cohort A**: HTTPS to *named* hosts. Claude Code's native sandbox (`allowedDomains` / an SNI proxy) fits this; that is issue #33's scope.
- **Cohort B (this repo's users)**: lab devices addressed by **bare RFC1918 IP, over UDP, on dynamic ports** (EPICS CA/PVA, PMAC). A hostname allowlist *cannot express this*. Cohort B needs IP/CIDR-level control.

The runtime target is **rootless Podman** (pasta is the Podman 5+ default outbound path). Consequences that rule out Docker-shaped designs: no host `DOCKER-USER`/iptables knob for the unprivileged user, userspace outbound only, and no `CAP_NET_ADMIN` on the container.
So the control cannot be a host firewall or a container capability: it has to be built from primitives an unprivileged user already has.

Feasibility was probed **unjailed on a real rootless host** (the sandboxed session reports `CapBnd=0`, a false "impossible" reading; see {ref}`adr-untrusted-workspace`): unprivileged netns create + in-netns routing with no caps **passes**, `pasta` builds a tap given `/dev/net/tun` **passes**, and live egress through pasta (internet reachable, non-allowlisted RFC1918 blackholed) **passes**.

## Decision

Add an egress jail **beneath the bwrap wall**, scoped to Claude alone. The container keeps `--network=host`, so ordinary (non-Claude) shells and EPICS Channel Access broadcast are untouched. Only the shadow's launch is jailed:

- `claude-shadow` creates a **user + network namespace** with `unshare -rn` (a short-lived *holder*) and bwrap **inherits** it: bwrap keeps omitting `--unshare-net` and only nests its own userns inside the holder's. (The holder must create the netns, not bwrap and not pasta: the container has no `CAP_NET_ADMIN` to make a netns without a userns, and if *pasta* creates the namespaces it also makes a pid+mount ns for which it cannot give bwrap a usable `/proc`, so bwrap aborts on `/proc/` lookups. A user+net-only holder keeps `/proc` valid so bwrap nests cleanly.)
- `pasta` **attaches from outside** the holder by PID (`pasta --config-net `; it backgrounds itself). The egress proxy *must* run outside the netns because it needs host connectivity to proxy. No container caps, no host-firewall change. Note that `--config-net` **mirrors the host's L3 config** into the netns: the host address, the **connected-subnet route**, the default gateway, and the DNS resolvers from `/etc/resolv.conf`.
- **Routing-as-allowlist (surgical)** inside the holder's netns. A *blanket* RFC1918 blackhole is wrong: on sites where the resolvers and gateway are themselves RFC1918 (e.g.
an all-`172.23/16` lab network) it kills DNS, and pasta's mirrored connected-subnet route is *more specific* than the blackhole, so the whole local subnet stays reachable. Instead the holder adds `blackhole` routes for `10/8`, `172.16/12`, `192.168/16` **and the connected subnet**, plus an `unreachable` route for `169.254/16`; it then punches back only the **gateway** (`/32`, on-link), the **DNS resolvers** (`/32` via gw; resolution is not lateral movement, since connections to internal IPs stay blackholed), and the **`allow-ip` devices** (`/32` via gw) from `/etc/claude-sandbox.conf`. The holder locks these down **before** handing off to bwrap: the ordering (netns created → pasta attached → routes locked → *then* Claude runs) is load-bearing for the boundary. The blackholes are fail-closed (a failed one aborts the launch); the device/DNS punches are fail-soft (a missing one is lost reachability, not an open hole).
- **Stub-resolver DNS via a pasta forwarder.** Punching `/32`s for the `/etc/resolv.conf` resolvers only works when those resolvers are *routable*. On a personal Ubuntu desktop the sole resolver is a **loopback stub** (`127.0.0.53` from systemd-resolved, or Tailscale MagicDNS) that lives in the **host** netns and answers nothing inside the jail, so every lookup gets `ECONNREFUSED` and the API looks down (issue #60). The fix: pasta attaches with `--dns-forward 192.0.2.53` (an RFC 5737 TEST-NET-1 address, globally non-routable and outside every blackholed range), making it listen on that address *inside* the netns and relay DNS to the host's real resolvers (pasta runs in the host netns, so it reaches the loopback stub). When `claude-shadow` detects an all-loopback `/etc/resolv.conf`, it binds a one-line `nameserver 192.0.2.53` over Claude's `/etc/resolv.conf` and the holder routes that `/32` via the gateway. Hosts with routable resolvers are unchanged (the forwarder is staged but unused).
Resolution stays *proxied* and internal IPs stay blackholed, so the boundary is intact; if no resolver can be established at all, the jail says so rather than failing silently.
- **Security rests on userns ownership, not caplessness.** Claude is *not* capless here: because bwrap nests its userns inside the holder's unprivileged userns, the new userns grants Claude a **full** capability set (`CapBnd` = `…1ffffffffff`). That is expected and unavoidable. What contains it: the netns and its routes are owned by the **holder's** userns, an *ancestor* of Claude's, so Claude's caps confer no authority over them. Verified directly: from inside the jail, deleting a blackhole route, punching a route past it, and creating a net device all fail with `EPERM`, and RFC1918 stays blocked after the attempts.

**On by default, fail-closed, with an escape hatch.** The jail runs unless `CLAUDE_SANDBOX_EGRESS_JAIL=0` (env, per session) or `egress-jail = 0` (`/etc/claude-sandbox.conf`, per host) disables it. If the jail is on but a prerequisite is missing (`/dev/net/tun`, pasta, unshare), the launch **fails closed**: `claude` refuses to start rather than silently dropping back to open egress, and the error names both the fix and the `=0` escape hatch. This is the secure-by-default choice for the lab threat (a misconfigured host can't quietly lose the control); the escape hatch keeps a non-EPICS or deliberately-open host unblocked.

Two structural choices fix scope:

- **Bash, inlined in the shadow.** The setup is implemented as inlined functions (`netns_launch()` orchestrating, `netns_holder()` running inside `unshare -rn`, plus an `egress_jail_enabled` predicate) *inside* `claude-shadow`, not a sourced module, preserving the single-file, read-top-to-bottom auditability that {ref}`adr-bash-only` and {ref}`adr-integrity-surfaces` rest on.
netns + routing + pasta *is* shell orchestration of CLI tools; the literal `ip route add blackhole …` commands are the most auditable representation of the boundary, so no higher-level language is warranted. (The trigger to revisit extraction into its own file, which would be its own ADR, is the net code outgrowing the shadow's readability.)
- **Allowlist lives in `/etc`, not the workspace.** `allow-ip` entries come from `/etc/claude-sandbox.conf`, outside the sandbox's rw set, per {ref}`adr-untrusted-workspace`. A per-workspace allowlist would be attacker-writable from inside the jail.

## Consequences

- **Defence in depth, not a stronger wall.** This layer sits *beneath* bwrap: a bwrap *escape* could re-plumb the netns. It raises the cost of lateral movement for a contained session; it is never stronger than the bwrap boundary above it.
- **Channel Access broadcast for Claude is gone.** Claude's private netns has no LAN broadcast domain, so CA auto-discovery won't work *for Claude*; it must use unicast `EPICS_CA_ADDR_LIST`. Normal shells keep host networking and broadcast.
- **Requires `/dev/net/tun` in the container** (`devcontainer.json` runArgs `--device=/dev/net/tun`); pasta and slirp4netns are both TAP-based and neither works without it. This is the one hard container-side requirement.
- **`HTTPS_PROXY`-style env proxy is explicitly not the mechanism** (issue #31 Option D): a hostile process can unset the env var or open a raw socket. The control is enforced by ancestor-owned netns + kernel routing, not by environment.
- **Integrity checks still pass in the jail, with no jail-aware variant needed.** `/verify-sandbox` check 06 (and `sandbox-verify.sh`) assert **`CapEff=0`** (the *effective* set), not `CapBnd`. bwrap's `--cap-drop ALL` empties the effective set even inside the nested userns, so `CapEff=0` holds and the full 18-check battery passes in a jailed session (verified live).
What differs from the non-jail sandbox is only the **`CapBnd` ceiling**: `…1ffffffffff` in the jail vs `0` non-jail, a nested-userns artifact. Effective caps are zero, so nothing is active; route-immutability additionally holds via ancestor-userns ownership. Cap-ceiling diligence, **verified 2026-06-18** (`probe-network-jail-caps.sh`, run unjailed): the higher `CapBnd` ceiling cannot be used to *re-raise* caps and weaken another bwrap protection. Even after `unshare -rUm` grants a full *effective* cap set in a child userns, `mount -o remount,rw /`, a bind-mount over a `--ro-bind` path, and `sethostname` all fail with `EPERM`: bwrap's locked mounts are immutable from any descendant userns. The full `CapBnd` is therefore inert.
- **Hostname allowlists stay out of scope for Cohort B.** Native `allowedDomains` cannot express bare-IP/UDP/dynamic-port device traffic; Cohort A / dual-sandbox remains issue #33.
- **On by default shifts the dogfood ≈ guest cost.** With the jail the default and fail-closed, a host that hasn't mounted `/dev/net/tun` (a `devcontainer.json` runArg an installer can't add) gets a `claude` that refuses to launch until it either adds the device or sets `CLAUDE_SANDBOX_EGRESS_JAIL=0`. `install.sh` installs pasta so that prerequisite is never the blocker; the dogfood box mounts the tun device. A plain `git clone + ./install` guest that wants the default jail must add the one runArg (the error says so) and otherwise opts out with `=0`. The deliberate trade: a loud stop over a silent downgrade of the default control.
- **Verification will follow the same three-surface model** as {ref}`adr-integrity-surfaces`: a future, optional jail-aware check (not yet implemented) would assert, when the jail is enabled, that the netns exists and the RFC1918 blackhole holds with only the configured `allow-ip` routes punched through.
This is a future item: the existing 18-check battery already passes unchanged in a jailed session (check 06 asserts `CapEff=0`; see the bullet above), so no jail-aware variant is required today.
- **Proven before adoption:** the full Design-D chain (holder netns → pasta attach → route lockdown → nested capful bwrap → Claude) works on a real rootless host, *and* the route-immutability security battery passes (`probe-network-jail.sh`, run unjailed).
- **Done since adoption:** implemented in `claude-shadow` (`netns_launch`/`netns_holder`/`egress_jail_enabled`) plus `install.sh` (installs `passt`, which provides pasta) and tests; on by default, fail-closed, and validated end-to-end on a rootless `--network=host` host **and in a bridge/NAT container**. The gateway-collision and nested-pasta paths are proven (the gateway is pinned on-link before the RFC1918 blackhole, so egress works while RFC1918 and the same subnet stay blocked). The shadow/`install.sh`/tests implementation landed on adoption (2026-06-18).
- **Still optional/open:** only the jail-aware `/verify-sandbox` check (a future item, not yet implemented; see the verification bullet above). Core internet/RFC1918 behaviour is already proven.

Live design and the feasibility probes: issue **#56** (refines #31, which is closed; #33 remains open for Cohort A).
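
For illustration, the routing-as-allowlist step from the Decision section can be sketched as a dry-run planner that only prints the `ip route` commands it would issue. This is a minimal sketch, not the code in `claude-shadow`: the function name `plan_routes`, its argument order, the example addresses, and the use of `ip route replace` to override the mirrored connected-subnet route are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the `ip route` commands instead of running them.
# plan_routes and its argument order are hypothetical, not claude-shadow's code.
plan_routes() {
    local gw="$1" dev="$2" subnet="$3" dns_list="$4" allow_list="$5"
    local net ip

    # Pin the gateway on-link first so the later "via" punches can resolve.
    echo "ip route add ${gw}/32 dev ${dev}"

    # Blackhole RFC1918 (fail-closed in the real jail: a failure aborts launch).
    for net in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
        echo "ip route add blackhole ${net}"
    done

    # Assumption: `replace` overrides the connected-subnet route pasta mirrored.
    echo "ip route replace blackhole ${subnet}"
    echo "ip route add unreachable 169.254.0.0/16"

    # Punch back resolvers and allow-ip devices as /32s via the gateway
    # (fail-soft in the real jail: a failure only loses reachability).
    for ip in $dns_list $allow_list; do
        echo "ip route add ${ip}/32 via ${gw}"
    done
}

# Example values are made up for illustration.
plan_routes 172.23.0.1 eth0 172.23.0.0/16 "172.23.0.2" "172.23.40.7"
```

Printing the plan keeps the example free of side effects and makes the ordering easy to audit. A real holder would run each command inside the netns, abort on any failed blackhole, and merely warn on a failed punch.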