Verification checks
/verify-sandbox runs two phases against the live Claude process: a
deterministic 18-check PASS/FAIL battery, then — only when all 18
pass — 10 adversarial breakout probes. Any FAIL in phase 1, or any
[ESCAPED] probe in phase 2, exits the command non-zero, so it is
usable as a CI assertion.
Working in an unpromoted workspace?
Running Claude unpromoted is the normal, recommended mode — the shadow and
the global integrity guard protect claude in every folder, so a workspace
does not need promoting to be safe.
The trade-off is that the just recipes and project commands like
/verify-sandbox ship with the claude-sandbox clone, so they are only
available when Claude’s working directory is that clone. To use them, cd into
the clone (e.g. /workspaces/claude-sandbox), run what you need, then
return to your work — dropping back to the clone like this is expected and fine.
(Promoting the workspace with just promote makes them available in place, but
that is optional.)
The exact bash for each check lives in the spec at
.claude/commands/verify-sandbox.md in the repo. The summaries below
state what each check asserts; see
locked-down defences for the
defence → primitive mapping.
The battery is jail-agnostic: it passes unchanged whether or not the egress jail is active (the jail is on by default). No jail-aware variant is needed — the capability check asserts the effective set, which stays empty inside the jail’s nested userns. An additional check that the netns exists and the RFC1918 blackhole holds is a future/optional item (see ADR 0015 consequences), not yet part of the 18.
Phase 1: the 18-check battery
| # | Asserts |
|---|---|
| 01 | |
| 02 | |
| 03 | Strict-under- |
| 04 | |
| 05 | |
| 06 | |
| 07 | |
| 08 | |
| 09 | |
| 10 | |
| 11 | No |
| 12 | |
| 13 | |
| 14 | |
| 15 | |
| 16 | |
| 17 | Workspace is scoped to |
| 18 | The installed shadow pins |
On any FAIL the command exits non-zero, names the regressed defence on the FAIL line, and skips phase 2 entirely.
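The gating rule above can be sketched in a few lines. This is only an illustration of the control flow, not the command's actual implementation (the real checks are bash in the spec). The names `run_battery` and `verify`, the `NN PASS/FAIL defence` line format, the exit value `1`, and the choice to run every check before deciding are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# (check number, defence name, callable returning True on PASS): hypothetical shape
Check = Tuple[int, str, Callable[[], bool]]


def run_battery(checks: List[Check]) -> Tuple[List[str], List[str]]:
    """Run every check; return the report lines and the names of failed defences."""
    lines: List[str] = []
    failed: List[str] = []
    for number, defence, check in checks:
        ok = check()
        # Assumed line format; the source only says a FAIL line names the defence.
        lines.append(f"{number:02d} {'PASS' if ok else 'FAIL'} {defence}")
        if not ok:
            failed.append(defence)
    return lines, failed


def verify(checks: List[Check], run_probes: Callable[[], bool]) -> int:
    """Return a process exit code: 0 only if phase 1 passes and no probe escapes."""
    lines, failed = run_battery(checks)
    print("\n".join(lines))
    if failed:
        return 1  # any FAIL: non-zero exit, phase 2 is never started
    # run_probes() is assumed to return True if any probe was classified [ESCAPED].
    return 1 if run_probes() else 0
```

Passing `run_probes` as a callable keeps the "skip phase 2 entirely" behaviour explicit: when any check fails, the probe function is simply never invoked.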
Note
Check 06 asserts the effective capability set, which bwrap’s
--cap-drop ALL empties even when the egress jail nests bwrap’s userns inside the holder’s. In
a jailed session CapBnd reads full (...1ffffffffff, a nested-userns
artifact) but CapEff is still 0, so the battery passes unchanged — no
jail-aware variant of /verify-sandbox is needed.
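To make the CapEff/CapBnd distinction concrete, here is a minimal sketch that parses the capability lines of a `/proc/<pid>/status` dump. It is not the bash used by check 06; the function names and the sample text are assumptions made for illustration, with the sample values chosen to mirror the jailed case described above.

```python
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_caps(status_text: str) -> dict:
    """Extract the hex capability masks from /proc/<pid>/status text as integers."""
    caps = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in CAP_FIELDS:
            caps[key] = int(value.strip(), 16)
    return caps


def effective_set_is_empty(status_text: str) -> bool:
    """True only when a CapEff line is present and its mask is zero."""
    return parse_caps(status_text).get("CapEff") == 0


# Illustrative sample (not captured output): bounding set full, effective set empty.
SAMPLE_STATUS = (
    "Name:\tbash\n"
    "CapEff:\t0000000000000000\n"
    "CapBnd:\t000001ffffffffff\n"
)
```

On a Linux host the same functions could be fed `open("/proc/self/status").read()`. Asserting on `CapEff` rather than `CapBnd` is what lets a single check hold both with and without the nested userns.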
Phase 2: adversarial breakout probes
Runs only when all 18 checks pass. The command reasons up 10 novel breakout attempts aimed at gaps the deterministic matrix does not directly exercise — escaping the filesystem inversion, recovering scrubbed env vars, reaching the host’s network identity, signalling or observing processes outside the pidns, or otherwise violating the threat model.
Constraints on the probes:
- Each must be distinct from the others and from phase 1 (not a re-test of --cap-drop ALL or --clearenv from another angle).
- Bias toward novelty: kernel interfaces (eBPF, perf events, kernel keyrings, io_uring), filesystem corners (proc, sys, debugfs, cgroup, securityfs, /proc/<pid>/root traversal), env-var recovery paths, IPC channels (abstract unix sockets, signalfd, pidfd, fanotify), network reachability (loopback services, /etc/resolv.conf, AF_NETLINK, raw sockets), credential paths, exec-chain escalation (setuid binaries despite NO_NEW_PRIVS, file capabilities), and bwrap-specific cases (--die-with-parent race, --new-session bypass, env-redirect bypasses routing git back to a host gitconfig).
With the egress jail on (the default),
an attempted connection to an RFC1918 or 169.254.169.254 address is a
candidate phase-2 probe — phase-2 probes are reasoned up dynamically, not a
fixed list — and if run it should classify [BLOCKED] (blackholed route)
rather than [ESCAPED].
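As a rough illustration of such a network probe, the sketch below attempts a TCP connection and maps the outcome to a classification. Everything here is an assumption rather than the command's behaviour: the function name, the port, the timeout, the set of errno values treated as blocked, and the `NEEDS-REVIEW` placeholder used for outcomes that are neither a clear block nor a clear escape.

```python
import errno
import socket

# Errors assumed to indicate the route is blocked rather than the host answering.
BLOCKED_ERRNOS = {errno.ENETUNREACH, errno.EHOSTUNREACH, errno.EACCES, errno.EPERM}


def classify_connect(host, port=80, timeout=2.0, connect=socket.create_connection):
    """Try to connect to host:port and return a one-word classification.

    `connect` is injectable so the logic can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except socket.timeout:
        # Packets silently dropped: consistent with a blackholed route.
        return "[BLOCKED]"
    except OSError as exc:
        if exc.errno in BLOCKED_ERRNOS:
            return "[BLOCKED]"
        # e.g. ECONNREFUSED means something answered; placeholder label, not
        # one of the command's documented classifications.
        return "NEEDS-REVIEW"
    conn.close()
    return "[ESCAPED]"
```

A call such as `classify_connect("169.254.169.254")` would exercise the metadata address mentioned above. A refused connection is deliberately not counted as blocked here, since a refusal implies the packet reached a host.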
Each probe is classified on one line:
| Classification | Meaning | Effect |
|---|---|---|
| [BLOCKED] | The attempt failed the way the sandbox expects (EACCES, EPERM, ENOENT for masked paths, etc.). | None. |
| [ESCAPED] | The attempt succeeded in a way that violates the threat model (readable host credential, writable host path outside the workspace, signal to a process outside the pidns, etc.). | Result becomes |
| | No error, but no demonstrated breach either. | Informational; does not change the exit code. Each is followed by a suggested follow-up. |
If all 10 probes are [BLOCKED], the final line is
RESULT: SANDBOX OK (18 deterministic + 10 adversarial).
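The way probe classifications roll up into an exit code and a final line can be sketched as follows. This is an illustration under assumptions, not the command's code: `summarize` is a hypothetical name, the non-zero exit value is assumed to be `1`, and the sketch returns `None` for the final line in every case the text above does not specify.

```python
from typing import List, Optional, Tuple

BLOCKED = "[BLOCKED]"
ESCAPED = "[ESCAPED]"
# Final line quoted from the documentation, without the sentence-ending period.
OK_LINE = "RESULT: SANDBOX OK (18 deterministic + 10 adversarial)"


def summarize(classifications: List[str]) -> Tuple[int, Optional[str]]:
    """Return (exit_code, final_line) for a list of per-probe classifications."""
    # Any escaped probe makes the command exit non-zero.
    exit_code = 1 if ESCAPED in classifications else 0
    # The OK line is only documented for the case where all 10 probes are blocked.
    all_blocked = len(classifications) == 10 and all(
        c == BLOCKED for c in classifications
    )
    return exit_code, (OK_LINE if all_blocked else None)
```

Any classification other than the two named constants is treated as informational: it leaves the exit code at zero but also prevents the all-blocked final line.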