Fix Sourcepoint HTML asset rewriting#793
Conversation
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
Solid, well-tested change. Adds root-relative HTML asset rewriting so Sourcepoint privacy-manager iframes load their CSS/JS through the first-party proxy, plus two default-off, documented debug flags (debug_clear_state_on_load, debug_force_banner_visible). Security hygiene is strong: SSRF-pinned cdn_origin, cookie allowlist, 5 MB body cap, and Accept-Encoding: identity preflight. Every new branch is exercised by a test.
No blocking findings — all comments below are non-blocking observations and follow-up suggestions. Verdict: comment.
None of the findings are expressed as one-click GitHub suggestion blocks: the debug-tooling items are design judgment calls (better left to the author than auto-patched with invented constants), and the one refactor spans two diff hunks so it can't anchor to a single range.
Non-blocking
🤔 thinking / ♻️ refactor / 📝 note
- HTML rewrite preflight asymmetry — see inline at
crates/trusted-server-core/src/integrations/sourcepoint.rs:998 debug_clear_statematcher misses allowlisted cookies — see inline atcrates/trusted-server-core/src/integrations/sourcepoint.rs:654- HTML asset regex also matches inline JS /
data-*— see inline atcrates/trusted-server-core/src/integrations/sourcepoint.rs:121 debug_force_bannerleaves a permanent MutationObserver — see inline atcrates/trusted-server-core/src/integrations/sourcepoint.rs:719- New TS type fields are never read — see inline at
crates/js/lib/src/integrations/sourcepoint/index.ts:9
Cross-cutting / body-level findings
- ♻️ Duplicate body-guard logic between the JS and HTML rewrite branches — the content-length cap check +
collect_response_bounded+String::from_utf8decode block atsourcepoint.rs:937-989andsourcepoint.rs:1002-1054is ~50 lines of near-identical code. Extracting aread_rewritable_body(response, path, forwarded_cookies)helper that returns either the decoded(parts, String)or the pass-throughResponsewould remove the duplication. Apply manually — it spans two hunks and restructures the early returns, so it can't be a single-range suggestion. - 🌱 Compile-time gating for the debug flags — both flags are runtime config, so a misconfigured
trusted-server.tomlcan ship the debug scripts to production despite the doc warning. A#[cfg(debug_assertions)]or env gate would make accidental production enablement impossible. Future consideration, not for this PR. - 👍 Praise — SSRF pinning of
cdn_originto the exact host with path/query/fragment rejection (and the rationale comment) is excellent; redirectLocationrewriting, cookie-safety cache policy, andVary: Accept-Encodingstripping are all correct and tested; debug-script ordering (clear-state before force-visible, both relative to the_sp_trap) is explicitly covered.
CI Status
- cargo fmt: PASS
- Analyze (rust): PASS
- cargo test: PASS
- vitest: PASS
- format-typescript: PASS
- format-docs: PASS
- CodeQL: PASS
- Analyze (javascript-typescript): PASS
- Analyze (actions): PASS
- integration tests: PASS
- integration tests (EdgeZero entry point): PASS
- browser integration tests: PASS
- prepare integration artifacts: PASS
| if method == Method::GET | ||
| && response.status() == StatusCode::OK | ||
| && self.config.rewrite_sdk | ||
| && Self::is_html_response(&response) |
There was a problem hiding this comment.
🤔 thinking — HTML rewrite fires on is_html_response (Content-Type), but the Accept-Encoding: identity preflight that makes the body readable fires on is_likely_html_path (.html / .htm only, see line 862-865). A privacy-manager served from an extensionless directory path (e.g. /us_pm/) that returns text/html would arrive gzip-compressed → String::from_utf8 fails → silent pass-through with no rewrite.
The PR's documented target path (/us_pm/index.html) ends in .html, so it works today; this only bites extensionless entry points. There's no corruption (header + body stay consistent on the utf8-fail path), just a silent no-op. Worth a follow-up — either decompress the body or broaden the path heuristic so the preflight set matches the rewrite set.
| "if(window.console&&console.warn)console.warn(\"Sourcepoint debug_clear_state_on_load enabled; clearing Sourcepoint consent state\");", | ||
| "function m(n){", | ||
| "n=String(n||\"\").toLowerCase();", | ||
| "return n.indexOf(\"_sp\")!==-1||n.indexOf(\"sp_\")!==-1||n.indexOf(\"sourcepoint\")!==-1||n.indexOf(\"consent\")!==-1||n.indexOf(\"gpp\")!==-1||n.indexOf(\"usnat\")!==-1||n.indexOf(\"privacy\")!==-1||n.indexOf(\"ccpa\")!==-1;", |
There was a problem hiding this comment.
🤔 thinking — This substring matcher (_sp / sp_ / sourcepoint / consent / gpp / usnat / privacy / ccpa) does not match dnsDisplayed, signedLspa, or globalcmpUUID — all three are in SOURCEPOINT_COOKIE_ALLOWLIST. A tool that "clears Sourcepoint consent state" therefore leaves those cookies/keys behind. Conversely it does match Trusted Server's own __gpp / _ts_gpp_src (likely intended for a full reset).
Not prescribing an exact token set — which names count as "Sourcepoint state" is your call — but flagging the drift between this matcher and the allowlist so the debug reset is intentional rather than accidental. Debug-only, so non-blocking.
| /// those paths must keep flowing through `/integrations/sourcepoint/cdn` rather | ||
| /// than resolving against the publisher origin root. | ||
| static SP_ROOT_RELATIVE_ASSET_ATTR_PATTERN: LazyLock<Regex> = LazyLock::new(|| { | ||
| Regex::new(r#"(?i)(\b(?:src|href)\s*=\s*)(['"])(/[^/'"][^'"]*)(['"])"#) |
There was a problem hiding this comment.
🤔 thinking — This pattern matches any src= / href= in the response body, including inline-script assignments like el.src="/foo" and attributes like data-src="/foo" (the \b lets src match right after data-). For the constrained Sourcepoint privacy-manager HTML that's acceptable, but it's the usual fragility tradeoff of regex-based HTML rewriting — a non-asset data-href or a JS string literal could be rewritten unintentionally.
Protocol-relative (//…) and fragment (#…) values are correctly excluded by (/[^/'"]…), and that's covered by tests. Note only — no change required for this PR.
| "for(var i=0;i<muts.length;i++){", | ||
| "for(var j=0;j<muts[i].addedNodes.length;j++)scan(muts[i].addedNodes[j]);", | ||
| "}", | ||
| "}).observe(document.documentElement,{childList:true,subtree:true});", |
There was a problem hiding this comment.
📝 note — The polling loop self-terminates after 80 × 250 ms (~20 s), but this MutationObserver on document.documentElement with subtree: true runs for the lifetime of the page. Since it's debug-only that's tolerable, but consider disconnecting it when the polling window ends (e.g. call observer.disconnect() in the same branch as clearInterval(timer)) so the debug flag doesn't leave a permanent whole-document observer running.
| debugForceBannerVisible?: boolean; | ||
| debugClearStateOnLoad?: boolean; |
There was a problem hiding this comment.
🤔 thinking — debugForceBannerVisible and debugClearStateOnLoad are added to the __tsjs_sourcepoint window type, but no TypeScript reads them — the entire debug behavior is implemented by the Rust-injected inline scripts. That's fine as config-shape documentation, but worth confirming it's intentional rather than a half-wired feature (e.g. behavior that was meant to live in script_guard.ts but ended up Rust-side). If they're purely documentation, a short comment saying so would prevent a future reader from assuming the TS layer consumes them.
Summary
Changes
crates/trusted-server-core/src/integrations/sourcepoint.rssrc/hrefroot-relative assets to/integrations/sourcepoint/cdn/...; adds debug-onlydebug_clear_state_on_loadanddebug_force_banner_visibleconfig and injector scripts; adds unit coverage.crates/js/lib/src/integrations/sourcepoint/index.tsdocs/guide/integrations/sourcepoint.mdCloses
Closes #792
Test plan
cargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatcargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1fastly compute serve/integrations/sourcepoint/cdn/...; verified debug flags expose config and force-created banner containers visible locally.Checklist
unwrap()in production code — useexpect("should ...")tracingmacros (notprintln!)