Summary
findata_run_code (MCP code-mode) executes arbitrary Python in a child interpreter with full library, filesystem, and network access, and inherits the server process environment. It is a deliberate prototype, gated off by default (FINDATA_MCP_CODE_MODE) and documented as "not a hardened sandbox" in docs/MCP_SURFACE.md. This issue tracks the concrete work needed before the gate can be flipped on anywhere other than a single-user local machine.
Scope: harden the existing tool. Not in scope: removing it, or changing the import-time gating (that the tool is absent from the catalog when the env var is unset is intentional, not a bug).
Current behavior (as implemented)
Source: src/findata/api/mcp_app.py:805-894.
- Gate.
_CODE_MODE_ENABLED = os.getenv("FINDATA_MCP_CODE_MODE") truthy in {1, true, yes, on} (case-insensitive, stripped). When false, the POST /run-code route is never registered, so the tool does not appear in the MCP catalog at all (mcp_app.py:807-812, 869).
- Constants.
_CODE_OUTPUT_CAP = 20_000 chars, _CODE_TIMEOUT_MAX = 120 s. Request default timeout 30 s, clamped to 1..120 (mcp_app.py:813-827, 840).
- Execution.
asyncio.create_subprocess_exec(sys.executable, "-I", "-c", code, stdout=PIPE, stderr=STDOUT, cwd=tempfile.gettempdir(), start_new_session=True) (mcp_app.py:841-850).
-I (isolated): ignores PYTHON* env vars and user site-packages, does not put cwd on sys.path. The installed findata package stays importable (it lives in the interpreter's site-packages).
start_new_session=True: child is its own session/process-group leader.
- Wait / kill.
asyncio.wait_for(proc.communicate(), timeout). On TimeoutError: os.killpg(os.getpgid(proc.pid), SIGKILL) under contextlib.suppress(ProcessLookupError, PermissionError), then await proc.wait(); returns {timed_out: True, exit_code: None, output: "(killed: exceeded Ns)"} (mcp_app.py:851-859).
- Result. decode utf-8 (
errors="replace"), return {timed_out: False, exit_code, truncated: len > cap, output: text[:cap]} (mcp_app.py:860-866).
Why this matters (threat model)
The risk is entirely a function of who can reach /mcp and where the server runs:
- Single-user localhost, trusted agent (the documented intended use): acceptable as-is. The operator can already run arbitrary code on their own box.
- Any shared, multi-tenant, or internet-reachable deployment: unacceptable until the items below land. Code-mode is a remote-code-execution primitive by design; with the current setup it is also a secret-exfiltration and SSRF primitive.
The single most dangerous property is that it combines (a) full read of the server's filesystem and environment with (b) unrestricted outbound network. That is exfiltration in one tool call.
Hardening backlog (ranked)
P0-a. Child inherits the server's full environment (secret leakage)
create_subprocess_exec(...) is called with no env=, so the child inherits the parent's entire os.environ. -I does not clear the environment; it only makes the interpreter ignore PYTHON* vars and user site. A snippet of import os; print(dict(os.environ)) therefore dumps every secret in the server process env (DB creds, API keys, tokens, cloud credentials).
- Fix. Pass a minimal scrubbed allowlist, e.g.
env={"PATH": os.environ.get("PATH", ""), "HOME": <tempdir>, "LANG": "C.UTF-8"}. Drop everything else. Add a unit test asserting a representative secret-shaped var set in the parent is absent from the child's os.environ.
- Effort. Low (a few lines + test). Do this first, it is cheap and high-impact.
P0-b. Unrestricted outbound network (exfiltration / SSRF)
The child can reach any host the server can, including internal services and cloud metadata (169.254.169.254, link-local, internal DNS). Findata itself needs egress to gov APIs, so a blanket block is not viable in-process.
- Fix direction. This is a deployment-layer control, document it as a hard prerequisite: run code-mode only behind an egress allowlist (the known gov/data hosts) enforced by the network namespace / firewall / proxy, plus block link-local and metadata ranges. Optionally support an env-configured allowlist that the server refuses to start code-mode without.
- Effort. Medium, and partly ops, not code. Treat as a gating prerequisite rather than a code patch.
P1-a. Output buffered in full before the cap (parent OOM)
proc.communicate() reads the entire child stdout into memory before text[:_CODE_OUTPUT_CAP] is applied. A snippet that prints unbounded output (print("x" * 10**10), or a tight while True: print(...) loop) can OOM the parent server before the 20k cap or the wall-clock timeout ever fire. This is the limitation already noted in the _execute_code docstring.
- Fix direction. Bounded streaming drain: read from
proc.stdout in chunks up to a hard byte ceiling (cap + small margin), then killpg(SIGKILL) the group and mark truncated. Caution: a previous naive "stop reading" attempt deadlocked (stopped reader + full OS pipe + child blocked on write). The drain must kill the child once the ceiling is hit, not merely stop consuming. Add a test that a high-volume printer is truncated without unbounded parent memory growth.
- Effort. Medium.
P1-b. No resource limits on the child (CPU / memory / disk / FDs / procs)
No resource.setrlimit is applied. The child can allocate unbounded RAM, fill tempdir (RLIMIT_FSIZE), open unlimited FDs, or fork-bomb (the own-process-group helps cleanup but not prevention). The only limit today is wall-clock, not CPU-time.
- Fix direction. Use the
preexec_fn / post-fork hook to set RLIMIT_AS (address space), RLIMIT_CPU, RLIMIT_FSIZE, RLIMIT_NOFILE, RLIMIT_NPROC. Note preexec_fn is not available with asyncio.create_subprocess_exec the same way as subprocess; may require a small wrapper script (sys.executable -I wrapper.py) that calls setrlimit then execs the snippet, or switching to loop.subprocess_exec with a custom protocol. Pick one approach and document the tradeoff.
- Effort. Medium to high (the asyncio API friction is the cost, not the rlimits themselves).
P1-c. No concurrency cap (host exhaustion DoS)
Every call spawns an interpreter; N concurrent MCP calls spawn N interpreters with no semaphore. Combined with P1-b (no per-child memory cap) a handful of concurrent calls can exhaust the host.
- Fix. Module-level
asyncio.Semaphore(_CODE_MAX_CONCURRENCY); return a 429/structured "busy" result when saturated. Add a test.
- Effort. Low.
P1-d. Full filesystem read/write as the server user
cwd is a tempdir but nothing confines the child there. It can read anything the server uid can (source tree, the venv, ~/.aws, mounted secrets) and write anywhere writable.
- Fix direction. Mostly sandbox-layer (container with a read-only rootfs + a small writable tmpfs, or a dedicated low-privilege user, or seccomp).
RLIMIT_FSIZE from P1-b caps write volume but not reach. Document as a deployment prerequisite alongside P0-b.
- Effort. Ops-heavy.
P2. No audit trail
Arbitrary code executes with zero logging of the submitted source, caller, timing, or outcome. Any shared deployment needs an audit log (at least: hash of the code, timestamp, duration, exit/timeout/truncated flags; ideally the full source to a restricted sink).
P3 (notes, not necessarily actionable)
- stderr merged into stdout (
stderr=STDOUT): tracebacks interleave with results. Fine for an agent, but if structured error reporting is wanted later, split the streams.
- POSIX-only kill path (
os.killpg/os.getpgid/start_new_session/signals): documented assumption that the server runs on POSIX. Note it; do not invest in Windows support unless a target needs it.
Already handled, do not regress
- Process-group kill on timeout (
start_new_session=True + killpg(getpgid, SIGKILL)): correctly reaps grandchildren so a snippet that spawns its own subprocess cannot orphan it past the timeout. This was deliberately chosen over a bare proc.kill() and verified (a sleep grandchild died via the group). Keep it.
communicate() over a manual read loop: the manual loop deadlocked on a full pipe; communicate() is the correct primitive. The P1-a fix must preserve deadlock-freedom (drain-and-kill, never stop-and-wait).
- Import-time gating: tool absent from the catalog when the env var is unset is intentional (keeps the surface clean and fail-safe), not a bug.
Recommended minimum bar to flip the gate on
Even for "trusted local/agent" use, land the cheap wins: P0-a (scrub env), P1-a (bounded read), P1-b (rlimits), P1-c (concurrency cap). For any shared or network-reachable host, additionally require the deployment-layer controls: P0-b (egress allowlist + metadata block) and P1-d (filesystem confinement via container/seccomp/dedicated user). Until P0-b and P1-d exist, the docs should keep stating, in stronger terms, "localhost trusted use only".
Acceptance criteria
References
Summary
findata_run_code(MCP code-mode) executes arbitrary Python in a child interpreter with full library, filesystem, and network access, and inherits the server process environment. It is a deliberate prototype, gated off by default (FINDATA_MCP_CODE_MODE) and documented as "not a hardened sandbox" indocs/MCP_SURFACE.md. This issue tracks the concrete work needed before the gate can be flipped on anywhere other than a single-user local machine.Scope: harden the existing tool. Not in scope: removing it, or changing the import-time gating (that the tool is absent from the catalog when the env var is unset is intentional, not a bug).
Current behavior (as implemented)
Source:
src/findata/api/mcp_app.py:805-894._CODE_MODE_ENABLED = os.getenv("FINDATA_MCP_CODE_MODE")truthy in{1, true, yes, on}(case-insensitive, stripped). When false, thePOST /run-coderoute is never registered, so the tool does not appear in the MCP catalog at all (mcp_app.py:807-812, 869)._CODE_OUTPUT_CAP = 20_000chars,_CODE_TIMEOUT_MAX = 120s. Request default timeout 30 s, clamped to1..120(mcp_app.py:813-827, 840).asyncio.create_subprocess_exec(sys.executable, "-I", "-c", code, stdout=PIPE, stderr=STDOUT, cwd=tempfile.gettempdir(), start_new_session=True)(mcp_app.py:841-850).-I(isolated): ignoresPYTHON*env vars and user site-packages, does not put cwd onsys.path. The installedfindatapackage stays importable (it lives in the interpreter's site-packages).start_new_session=True: child is its own session/process-group leader.asyncio.wait_for(proc.communicate(), timeout). OnTimeoutError:os.killpg(os.getpgid(proc.pid), SIGKILL)undercontextlib.suppress(ProcessLookupError, PermissionError), thenawait proc.wait(); returns{timed_out: True, exit_code: None, output: "(killed: exceeded Ns)"}(mcp_app.py:851-859).errors="replace"), return{timed_out: False, exit_code, truncated: len > cap, output: text[:cap]}(mcp_app.py:860-866).Why this matters (threat model)
The risk is entirely a function of who can reach
/mcpand where the server runs:The single most dangerous property is that it combines (a) full read of the server's filesystem and environment with (b) unrestricted outbound network. That is exfiltration in one tool call.
Hardening backlog (ranked)
P0-a. Child inherits the server's full environment (secret leakage)
create_subprocess_exec(...)is called with noenv=, so the child inherits the parent's entireos.environ.-Idoes not clear the environment; it only makes the interpreter ignorePYTHON*vars and user site. A snippet ofimport os; print(dict(os.environ))therefore dumps every secret in the server process env (DB creds, API keys, tokens, cloud credentials).env={"PATH": os.environ.get("PATH", ""), "HOME": <tempdir>, "LANG": "C.UTF-8"}. Drop everything else. Add a unit test asserting a representative secret-shaped var set in the parent is absent from the child'sos.environ.P0-b. Unrestricted outbound network (exfiltration / SSRF)
The child can reach any host the server can, including internal services and cloud metadata (
169.254.169.254, link-local, internal DNS). Findata itself needs egress to gov APIs, so a blanket block is not viable in-process.P1-a. Output buffered in full before the cap (parent OOM)
proc.communicate()reads the entire child stdout into memory beforetext[:_CODE_OUTPUT_CAP]is applied. A snippet that prints unbounded output (print("x" * 10**10), or a tightwhile True: print(...)loop) can OOM the parent server before the 20k cap or the wall-clock timeout ever fire. This is the limitation already noted in the_execute_codedocstring.proc.stdoutin chunks up to a hard byte ceiling (cap + small margin), thenkillpg(SIGKILL)the group and marktruncated. Caution: a previous naive "stop reading" attempt deadlocked (stopped reader + full OS pipe + child blocked on write). The drain must kill the child once the ceiling is hit, not merely stop consuming. Add a test that a high-volume printer is truncated without unbounded parent memory growth.P1-b. No resource limits on the child (CPU / memory / disk / FDs / procs)
No
resource.setrlimitis applied. The child can allocate unbounded RAM, filltempdir(RLIMIT_FSIZE), open unlimited FDs, or fork-bomb (the own-process-group helps cleanup but not prevention). The only limit today is wall-clock, not CPU-time.preexec_fn/ post-fork hook to setRLIMIT_AS(address space),RLIMIT_CPU,RLIMIT_FSIZE,RLIMIT_NOFILE,RLIMIT_NPROC. Notepreexec_fnis not available withasyncio.create_subprocess_execthe same way assubprocess; may require a small wrapper script (sys.executable -I wrapper.py) that callssetrlimitthenexecs the snippet, or switching toloop.subprocess_execwith a custom protocol. Pick one approach and document the tradeoff.P1-c. No concurrency cap (host exhaustion DoS)
Every call spawns an interpreter; N concurrent MCP calls spawn N interpreters with no semaphore. Combined with P1-b (no per-child memory cap) a handful of concurrent calls can exhaust the host.
asyncio.Semaphore(_CODE_MAX_CONCURRENCY); return a429/structured "busy" result when saturated. Add a test.P1-d. Full filesystem read/write as the server user
cwdis a tempdir but nothing confines the child there. It can read anything the server uid can (source tree, the venv,~/.aws, mounted secrets) and write anywhere writable.RLIMIT_FSIZEfrom P1-b caps write volume but not reach. Document as a deployment prerequisite alongside P0-b.P2. No audit trail
Arbitrary code executes with zero logging of the submitted source, caller, timing, or outcome. Any shared deployment needs an audit log (at least: hash of the code, timestamp, duration, exit/timeout/truncated flags; ideally the full source to a restricted sink).
P3 (notes, not necessarily actionable)
stderr=STDOUT): tracebacks interleave with results. Fine for an agent, but if structured error reporting is wanted later, split the streams.os.killpg/os.getpgid/start_new_session/signals): documented assumption that the server runs on POSIX. Note it; do not invest in Windows support unless a target needs it.Already handled, do not regress
start_new_session=True+killpg(getpgid, SIGKILL)): correctly reaps grandchildren so a snippet that spawns its own subprocess cannot orphan it past the timeout. This was deliberately chosen over a bareproc.kill()and verified (asleepgrandchild died via the group). Keep it.communicate()over a manual read loop: the manual loop deadlocked on a full pipe;communicate()is the correct primitive. The P1-a fix must preserve deadlock-freedom (drain-and-kill, never stop-and-wait).Recommended minimum bar to flip the gate on
Even for "trusted local/agent" use, land the cheap wins: P0-a (scrub env), P1-a (bounded read), P1-b (rlimits), P1-c (concurrency cap). For any shared or network-reachable host, additionally require the deployment-layer controls: P0-b (egress allowlist + metadata block) and P1-d (filesystem confinement via container/seccomp/dedicated user). Until P0-b and P1-d exist, the docs should keep stating, in stronger terms, "localhost trusted use only".
Acceptance criteria
RLIMIT_AS,RLIMIT_CPU,RLIMIT_FSIZE,RLIMIT_NOFILE,RLIMIT_NPROCapplied to the child; a snippet exceeding each is killed/limited (tests).docs/MCP_SURFACE.md"Code mode: security" updated with the new in-process limits and the explicit deployment prerequisites (egress allowlist, metadata block, filesystem confinement) before enabling on a shared host.References
src/findata/api/mcp_app.py:805-894docs/MCP_SURFACE.md("Code mode: security")