Skip to content

Fix Sourcepoint HTML asset rewriting#793

Open
ChristianPavilonis wants to merge 2 commits into
mainfrom
fix/sourcepoint-html-asset-rewrite
Open

Fix Sourcepoint HTML asset rewriting#793
ChristianPavilonis wants to merge 2 commits into
mainfrom
fix/sourcepoint-html-asset-rewrite

Conversation

@ChristianPavilonis

Copy link
Copy Markdown
Collaborator

Summary

  • Fix Sourcepoint privacy-manager iframe asset loading by rewriting root-relative HTML asset URLs through the first-party Sourcepoint proxy.
  • Add debug-only Sourcepoint tools to clear local consent state and force hidden banner containers visible during local testing.
  • Document the new debug flags and first-party proxy behavior.

Changes

File Change
crates/trusted-server-core/src/integrations/sourcepoint.rs Rewrites Sourcepoint HTML src/href root-relative assets to /integrations/sourcepoint/cdn/...; adds debug-only debug_clear_state_on_load and debug_force_banner_visible config and injector scripts; adds unit coverage.
crates/js/lib/src/integrations/sourcepoint/index.ts Extends the injected Sourcepoint config type with the debug flags.
docs/guide/integrations/sourcepoint.md Documents the HTML asset rewrite behavior and debug-only Sourcepoint flags.

Closes

Closes #792

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • WASM build: cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1
  • Manual testing via fastly compute serve
  • Other: verified Sourcepoint iframe assets now return 200 through /integrations/sourcepoint/cdn/...; verified debug flags expose config and force-created banner containers visible locally.

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@prk-Jr prk-Jr left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Solid, well-tested change. Adds root-relative HTML asset rewriting so Sourcepoint privacy-manager iframes load their CSS/JS through the first-party proxy, plus two default-off, documented debug flags (debug_clear_state_on_load, debug_force_banner_visible). Security hygiene is strong: SSRF-pinned cdn_origin, cookie allowlist, 5 MB body cap, and Accept-Encoding: identity preflight. Every new branch is exercised by a test.

No blocking findings — all comments below are non-blocking observations and follow-up suggestions. Verdict: comment.

None of the findings are expressed as one-click GitHub suggestion blocks: the debug-tooling items are design judgment calls (better left to the author than auto-patched with invented constants), and the one refactor spans two diff hunks so it can't anchor to a single range.

Non-blocking

🤔 thinking / ♻️ refactor / 📝 note

  • HTML rewrite preflight asymmetry — see inline at crates/trusted-server-core/src/integrations/sourcepoint.rs:998
  • debug_clear_state matcher misses allowlisted cookies — see inline at crates/trusted-server-core/src/integrations/sourcepoint.rs:654
  • HTML asset regex also matches inline JS / data-* — see inline at crates/trusted-server-core/src/integrations/sourcepoint.rs:121
  • debug_force_banner leaves a permanent MutationObserver — see inline at crates/trusted-server-core/src/integrations/sourcepoint.rs:719
  • New TS type fields are never read — see inline at crates/js/lib/src/integrations/sourcepoint/index.ts:9

Cross-cutting / body-level findings

  • ♻️ Duplicate body-guard logic between the JS and HTML rewrite branches — the content-length cap check + collect_response_bounded + String::from_utf8 decode block at sourcepoint.rs:937-989 and sourcepoint.rs:1002-1054 is ~50 lines of near-identical code. Extracting a read_rewritable_body(response, path, forwarded_cookies) helper that returns either the decoded (parts, String) or the pass-through Response would remove the duplication. Apply manually — it spans two hunks and restructures the early returns, so it can't be a single-range suggestion.
  • 🌱 Compile-time gating for the debug flags — both flags are runtime config, so a misconfigured trusted-server.toml can ship the debug scripts to production despite the doc warning. A #[cfg(debug_assertions)] or env gate would make accidental production enablement impossible. Future consideration, not for this PR.
  • 👍 Praise — SSRF pinning of cdn_origin to the exact host with path/query/fragment rejection (and the rationale comment) is excellent; redirect Location rewriting, cookie-safety cache policy, and Vary: Accept-Encoding stripping are all correct and tested; debug-script ordering (clear-state before force-visible, both relative to the _sp_ trap) is explicitly covered.

CI Status

  • cargo fmt: PASS
  • Analyze (rust): PASS
  • cargo test: PASS
  • vitest: PASS
  • format-typescript: PASS
  • format-docs: PASS
  • CodeQL: PASS
  • Analyze (javascript-typescript): PASS
  • Analyze (actions): PASS
  • integration tests: PASS
  • integration tests (EdgeZero entry point): PASS
  • browser integration tests: PASS
  • prepare integration artifacts: PASS

if method == Method::GET
&& response.status() == StatusCode::OK
&& self.config.rewrite_sdk
&& Self::is_html_response(&response)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — HTML rewrite fires on is_html_response (Content-Type), but the Accept-Encoding: identity preflight that makes the body readable fires on is_likely_html_path (.html / .htm only, see line 862-865). A privacy-manager served from an extensionless directory path (e.g. /us_pm/) that returns text/html would arrive gzip-compressed → String::from_utf8 fails → silent pass-through with no rewrite.

The PR's documented target path (/us_pm/index.html) ends in .html, so it works today; this only bites extensionless entry points. There's no corruption (header + body stay consistent on the utf8-fail path), just a silent no-op. Worth a follow-up — either decompress the body or broaden the path heuristic so the preflight set matches the rewrite set.

"if(window.console&&console.warn)console.warn(\"Sourcepoint debug_clear_state_on_load enabled; clearing Sourcepoint consent state\");",
"function m(n){",
"n=String(n||\"\").toLowerCase();",
"return n.indexOf(\"_sp\")!==-1||n.indexOf(\"sp_\")!==-1||n.indexOf(\"sourcepoint\")!==-1||n.indexOf(\"consent\")!==-1||n.indexOf(\"gpp\")!==-1||n.indexOf(\"usnat\")!==-1||n.indexOf(\"privacy\")!==-1||n.indexOf(\"ccpa\")!==-1;",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — This substring matcher (_sp / sp_ / sourcepoint / consent / gpp / usnat / privacy / ccpa) does not match dnsDisplayed, signedLspa, or globalcmpUUID — all three are in SOURCEPOINT_COOKIE_ALLOWLIST. A tool that "clears Sourcepoint consent state" therefore leaves those cookies/keys behind. Conversely it does match Trusted Server's own __gpp / _ts_gpp_src (likely intended for a full reset).

Not prescribing an exact token set — which names count as "Sourcepoint state" is your call — but flagging the drift between this matcher and the allowlist so the debug reset is intentional rather than accidental. Debug-only, so non-blocking.

/// those paths must keep flowing through `/integrations/sourcepoint/cdn` rather
/// than resolving against the publisher origin root.
static SP_ROOT_RELATIVE_ASSET_ATTR_PATTERN: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r#"(?i)(\b(?:src|href)\s*=\s*)(['"])(/[^/'"][^'"]*)(['"])"#)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinking — This pattern matches any src= / href= in the response body, including inline-script assignments like el.src="/foo" and attributes like data-src="/foo" (the \b lets src match right after data-). For the constrained Sourcepoint privacy-manager HTML that's acceptable, but it's the usual fragility tradeoff of regex-based HTML rewriting — a non-asset data-href or a JS string literal could be rewritten unintentionally.

Protocol-relative (//…) and fragment (#…) values are correctly excluded by (/[^/'"]…), and that's covered by tests. Note only — no change required for this PR.

"for(var i=0;i<muts.length;i++){",
"for(var j=0;j<muts[i].addedNodes.length;j++)scan(muts[i].addedNodes[j]);",
"}",
"}).observe(document.documentElement,{childList:true,subtree:true});",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 note — The polling loop self-terminates after 80 × 250 ms (~20 s), but this MutationObserver on document.documentElement with subtree: true runs for the lifetime of the page. Since it's debug-only that's tolerable, but consider disconnecting it when the polling window ends (e.g. call observer.disconnect() in the same branch as clearInterval(timer)) so the debug flag doesn't leave a permanent whole-document observer running.

Comment on lines +8 to +9
debugForceBannerVisible?: boolean;
debugClearStateOnLoad?: boolean;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingdebugForceBannerVisible and debugClearStateOnLoad are added to the __tsjs_sourcepoint window type, but no TypeScript reads them — the entire debug behavior is implemented by the Rust-injected inline scripts. That's fine as config-shape documentation, but worth confirming it's intentional rather than a half-wired feature (e.g. behavior that was meant to live in script_guard.ts but ended up Rust-side). If they're purely documentation, a short comment saying so would prevent a future reader from assuming the TS layer consumes them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sourcepoint privacy manager assets 404 through first-party proxy

2 participants