Fix Sourcepoint HTML asset rewriting by ChristianPavilonis · Pull Request #793 · IABTechLab/trusted-server

ChristianPavilonis · 2026-06-22T15:41:35Z

Summary

Fix Sourcepoint privacy-manager iframe asset loading by rewriting root-relative HTML asset URLs through the first-party Sourcepoint proxy.
Add debug-only Sourcepoint tools to clear local consent state and force hidden banner containers visible during local testing.
Document the new debug flags and first-party proxy behavior.

Changes

File	Change
`crates/trusted-server-core/src/integrations/sourcepoint.rs`	Rewrites Sourcepoint HTML `src`/`href` root-relative assets to `/integrations/sourcepoint/cdn/...`; adds debug-only `debug_clear_state_on_load` and `debug_force_banner_visible` config and injector scripts; adds unit coverage.
`crates/js/lib/src/integrations/sourcepoint/index.ts`	Extends the injected Sourcepoint config type with the debug flags.
`docs/guide/integrations/sourcepoint.md`	Documents the HTML asset rewrite behavior and debug-only Sourcepoint flags.

Closes

Closes #792

Test plan

cargo test --workspace
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo fmt --all -- --check
JS tests: cd crates/js/lib && npx vitest run
JS format: cd crates/js/lib && npm run format
Docs format: cd docs && npm run format
WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
Manual testing via fastly compute serve
Other: verified Sourcepoint iframe assets now return 200 through /integrations/sourcepoint/cdn/...; verified debug flags expose config and force-created banner containers visible locally.

Checklist

Changes follow CLAUDE.md conventions
No unwrap() in production code — use expect("should ...")
Uses tracing macros (not println!)
New code has tests
No secrets or credentials committed

prk-Jr

Summary

Solid, well-tested change. Adds root-relative HTML asset rewriting so Sourcepoint privacy-manager iframes load their CSS/JS through the first-party proxy, plus two default-off, documented debug flags (debug_clear_state_on_load, debug_force_banner_visible). Security hygiene is strong: SSRF-pinned cdn_origin, cookie allowlist, 5 MB body cap, and Accept-Encoding: identity preflight. Every new branch is exercised by a test.

No blocking findings — all comments below are non-blocking observations and follow-up suggestions. Verdict: comment.

None of the findings are expressed as one-click GitHub suggestion blocks: the debug-tooling items are design judgment calls (better left to the author than auto-patched with invented constants), and the one refactor spans two diff hunks so it can't anchor to a single range.

Non-blocking

🤔 thinking / ♻️ refactor / 📝 note

HTML rewrite preflight asymmetry — see inline at crates/trusted-server-core/src/integrations/sourcepoint.rs:998
debug_clear_state matcher misses allowlisted cookies — see inline at crates/trusted-server-core/src/integrations/sourcepoint.rs:654
HTML asset regex also matches inline JS / data-* — see inline at crates/trusted-server-core/src/integrations/sourcepoint.rs:121
debug_force_banner leaves a permanent MutationObserver — see inline at crates/trusted-server-core/src/integrations/sourcepoint.rs:719
New TS type fields are never read — see inline at crates/js/lib/src/integrations/sourcepoint/index.ts:9

Cross-cutting / body-level findings

♻️ Duplicate body-guard logic between the JS and HTML rewrite branches — the content-length cap check + collect_response_bounded + String::from_utf8 decode block at sourcepoint.rs:937-989 and sourcepoint.rs:1002-1054 is ~50 lines of near-identical code. Extracting a read_rewritable_body(response, path, forwarded_cookies) helper that returns either the decoded (parts, String) or the pass-through Response would remove the duplication. Apply manually — it spans two hunks and restructures the early returns, so it can't be a single-range suggestion.
🌱 Compile-time gating for the debug flags — both flags are runtime config, so a misconfigured trusted-server.toml can ship the debug scripts to production despite the doc warning. A #[cfg(debug_assertions)] or env gate would make accidental production enablement impossible. Future consideration, not for this PR.
👍 Praise — SSRF pinning of cdn_origin to the exact host with path/query/fragment rejection (and the rationale comment) is excellent; redirect Location rewriting, cookie-safety cache policy, and Vary: Accept-Encoding stripping are all correct and tested; debug-script ordering (clear-state before force-visible, both relative to the _sp_ trap) is explicitly covered.

CI Status

cargo fmt: PASS
Analyze (rust): PASS
cargo test: PASS
vitest: PASS
format-typescript: PASS
format-docs: PASS
CodeQL: PASS
Analyze (javascript-typescript): PASS
Analyze (actions): PASS
integration tests: PASS
integration tests (EdgeZero entry point): PASS
browser integration tests: PASS
prepare integration artifacts: PASS

prk-Jr · 2026-06-24T10:01:18Z

+        if method == Method::GET
+            && response.status() == StatusCode::OK
+            && self.config.rewrite_sdk
+            && Self::is_html_response(&response)


🤔 thinking — HTML rewrite fires on is_html_response (Content-Type), but the Accept-Encoding: identity preflight that makes the body readable fires on is_likely_html_path (.html / .htm only, see line 862-865). A privacy-manager served from an extensionless directory path (e.g. /us_pm/) that returns text/html would arrive gzip-compressed → String::from_utf8 fails → silent pass-through with no rewrite.

The PR's documented target path (/us_pm/index.html) ends in .html, so it works today; this only bites extensionless entry points. There's no corruption (header + body stay consistent on the utf8-fail path), just a silent no-op. Worth a follow-up — either decompress the body or broaden the path heuristic so the preflight set matches the rewrite set.

prk-Jr · 2026-06-24T10:01:18Z

+            "if(window.console&&console.warn)console.warn(\"Sourcepoint debug_clear_state_on_load enabled; clearing Sourcepoint consent state\");",
+            "function m(n){",
+            "n=String(n||\"\").toLowerCase();",
+            "return n.indexOf(\"_sp\")!==-1||n.indexOf(\"sp_\")!==-1||n.indexOf(\"sourcepoint\")!==-1||n.indexOf(\"consent\")!==-1||n.indexOf(\"gpp\")!==-1||n.indexOf(\"usnat\")!==-1||n.indexOf(\"privacy\")!==-1||n.indexOf(\"ccpa\")!==-1;",


🤔 thinking — This substring matcher (_sp / sp_ / sourcepoint / consent / gpp / usnat / privacy / ccpa) does not match dnsDisplayed, signedLspa, or globalcmpUUID — all three are in SOURCEPOINT_COOKIE_ALLOWLIST. A tool that "clears Sourcepoint consent state" therefore leaves those cookies/keys behind. Conversely it does match Trusted Server's own __gpp / _ts_gpp_src (likely intended for a full reset).

Not prescribing an exact token set — which names count as "Sourcepoint state" is your call — but flagging the drift between this matcher and the allowlist so the debug reset is intentional rather than accidental. Debug-only, so non-blocking.

prk-Jr · 2026-06-24T10:01:18Z

+/// those paths must keep flowing through `/integrations/sourcepoint/cdn` rather
+/// than resolving against the publisher origin root.
+static SP_ROOT_RELATIVE_ASSET_ATTR_PATTERN: LazyLock<Regex> = LazyLock::new(|| {
+    Regex::new(r#"(?i)(\b(?:src|href)\s*=\s*)(['"])(/[^/'"][^'"]*)(['"])"#)


🤔 thinking — This pattern matches any src= / href= in the response body, including inline-script assignments like el.src="/foo" and attributes like data-src="/foo" (the \b lets src match right after data-). For the constrained Sourcepoint privacy-manager HTML that's acceptable, but it's the usual fragility tradeoff of regex-based HTML rewriting — a non-asset data-href or a JS string literal could be rewritten unintentionally.

Protocol-relative (//…) and fragment (#…) values are correctly excluded by (/[^/'"]…), and that's covered by tests. Note only — no change required for this PR.

prk-Jr · 2026-06-24T10:01:18Z

+            "for(var i=0;i<muts.length;i++){",
+            "for(var j=0;j<muts[i].addedNodes.length;j++)scan(muts[i].addedNodes[j]);",
+            "}",
+            "}).observe(document.documentElement,{childList:true,subtree:true});",


📝 note — The polling loop self-terminates after 80 × 250 ms (~20 s), but this MutationObserver on document.documentElement with subtree: true runs for the lifetime of the page. Since it's debug-only that's tolerable, but consider disconnecting it when the polling window ends (e.g. call observer.disconnect() in the same branch as clearInterval(timer)) so the debug flag doesn't leave a permanent whole-document observer running.

prk-Jr · 2026-06-24T10:01:18Z

+    debugForceBannerVisible?: boolean;
+    debugClearStateOnLoad?: boolean;


🤔 thinking — debugForceBannerVisible and debugClearStateOnLoad are added to the __tsjs_sourcepoint window type, but no TypeScript reads them — the entire debug behavior is implemented by the Rust-injected inline scripts. That's fine as config-shape documentation, but worth confirming it's intentional rather than a half-wired feature (e.g. behavior that was meant to live in script_guard.ts but ended up Rust-side). If they're purely documentation, a short comment saying so would prevent a future reader from assuming the TS layer consumes them.

ChristianPavilonis added 2 commits June 22, 2026 10:40

Fix Sourcepoint HTML asset rewriting and debug banner tools

574aa6a

Merge branch 'main' into fix/sourcepoint-html-asset-rewrite

f02cf02

ChristianPavilonis requested review from aram356 and prk-Jr June 22, 2026 15:43

ChristianPavilonis self-assigned this Jun 22, 2026

prk-Jr reviewed Jun 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Sourcepoint HTML asset rewriting#793

Fix Sourcepoint HTML asset rewriting#793
ChristianPavilonis wants to merge 2 commits into
mainfrom
fix/sourcepoint-html-asset-rewrite

ChristianPavilonis commented Jun 22, 2026

Uh oh!

prk-Jr left a comment

Uh oh!

prk-Jr Jun 24, 2026

Uh oh!

prk-Jr Jun 24, 2026

Uh oh!

prk-Jr Jun 24, 2026

Uh oh!

prk-Jr Jun 24, 2026

Uh oh!

prk-Jr Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		debugForceBannerVisible?: boolean;
		debugClearStateOnLoad?: boolean;

Conversation

ChristianPavilonis commented Jun 22, 2026

Summary

Changes

Closes

Test plan

Checklist

Uh oh!

prk-Jr left a comment

Choose a reason for hiding this comment

Summary

Non-blocking

🤔 thinking / ♻️ refactor / 📝 note

Cross-cutting / body-level findings

CI Status

Uh oh!

prk-Jr Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

prk-Jr Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

prk-Jr Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

prk-Jr Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

prk-Jr Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants