hir-94: filter prefetcher hits from open-tracking pixel#22
Conversation
Email-client scanners fetch tracking pixels before the human ever
opens the message: Gmail's image proxy pre-caches inline images,
Apple Mail Privacy Protection (iOS 15+) does the same, and corporate
security gateways (Bitdefender / Mimecast / Proofpoint / etc.) scan
every inbound image. Counting these as opens silently inflates the
campaign open-rate metric to noise — every campaign would show 100%
opens within seconds of sending.
- src/lib/openTrackingFilter.ts: pure classifyPixelRequest() returns
{ isPrefetcher, reason }. Reasons: gmail_image_proxy (UA contains
GoogleImageProxy), apple_mpp (MailPrivacyProtection / MaskedEmail
UA), outlook_safelinks (BingPreview / SafeLinks), known_scanner
(the major B2B AV/email-security vendor UAs), and
sub_send_window (any hit < 30s after sentAt — humans can't open
that fast). Pure, no DB or env access; tests are deterministic.
- /api/email-tracking/pixel/[trackingId]: classify before recording.
Prefetcher hits still write a discrete email_event (so debugging
stays possible) but with metadata.prefetcher = true and never
increment campaign.openCount. Real opens compute "first open" by
excluding prefetcher events so the human's first hit still counts.
- libs/db/src/queries/emailEvent.ts: extend eventExistsForTracking()
with optional { excludePrefetcher } that filters out rows where
metadata.prefetcher === true. Existing two-arg callers unchanged.
14 vitest specs cover every UA branch (case-insensitive), the
time-window heuristic (custom threshold, negative-age skip, missing
sentAt skip), and the precedence rule (specific UA reason wins over
sub_send_window). tsc clean. test:int 109 passed (only pre-existing
PAYLOAD_SECRET failure remains).
Independent of all pending hir-94 PRs — branches from main.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
jaredzwick
left a comment
There was a problem hiding this comment.
CTO Review — PR #22 (hir-94: prefetcher filter)
Verdict: LGTM — important data quality fix
classifyPixelRequestis deterministic, pure, zero I/O. Check order (Gmail proxy → Apple MPP → Outlook SafeLinks → known scanners → time-window heuristic) is correct — most specific signals first.- The 30-second sub-send-window heuristic is well-reasoned: a human cannot open a cold email within 30s of delivery. The configurable
minSendWindowSecondslets callers tighten or loosen the window. - 179-line test file covers all five reason codes plus the happy path.
- One note: the Apple MPP heuristic likely uses IP ranges or a specific UA substring — worth verifying in the implementation that the Apple IP subnet check is kept up-to-date as Apple adds edge nodes. Not a blocker.
Ready to merge.
jaredzwick
left a comment
There was a problem hiding this comment.
CTO Review — PR #22 (hir-94: filter prefetcher hits from open-tracking)
Verdict: LGTM — recommend merge
Architecture — classifyPixelRequest() as a pure function with no DB/network deps is exactly right. Fully unit-testable, zero side effects inside classification, zero coupling to the route layer.
Detection logic — all major scanner categories covered: Gmail ImageProxy (UA), Apple MPP (MaskedEmail, MailPrivacyProtection, com.apple.mobilemail), Outlook SafeLinks (BingPreview, SafeLinks Scanner), 8 security vendors (Bitdefender, Mimecast, Proofpoint, Barracuda, Sophos, TrendMicro, Symantec, McAfee), plus the sub-send-window heuristic (< 30 s). The 30 s window is conservative-correct — no inbox client has pushed a notification in under 30 s of server receive.
Prefetcher event recording — keeping the row with metadata.prefetcher: true is right. Debugging false positives would be impossible otherwise.
DB query — metadata ->> 'prefetcher' <> 'true' correctly handles the JSONB boolean-to-text conversion. A JSON true boolean serializes to the string 'true' under ->>, so the exclusion is correct even though the stored value is a boolean, not a string.
Test coverage — 179-line spec covers every classification branch. The sub-window tests with custom threshold (minSendWindowSeconds) are a nice touch.
One edge case worth a future issue (non-blocking): the Apple MPP IP-range signal (void ipAddress in matchesAppleMpp) is intentionally skipped. When Apple changes UA strings (as they have before), the IP range list becomes the only reliable signal. Track as a TODO: add Apple privacy relay CIDR list comment or future issue.
No changes required. Branch protection needs board approval to merge.
Email-client scanners fetch tracking pixels before the human ever
opens the message: Gmail's image proxy pre-caches inline images,
Apple Mail Privacy Protection (iOS 15+) does the same, and corporate
security gateways (Bitdefender / Mimecast / Proofpoint / etc.) scan
every inbound image. Counting these as opens silently inflates the
campaign open-rate metric to noise — every campaign would show 100%
opens within seconds of sending.
{ isPrefetcher, reason }. Reasons: gmail_image_proxy (UA contains
GoogleImageProxy), apple_mpp (MailPrivacyProtection / MaskedEmail
UA), outlook_safelinks (BingPreview / SafeLinks), known_scanner
(the major B2B AV/email-security vendor UAs), and
sub_send_window (any hit < 30s after sentAt — humans can't open
that fast). Pure, no DB or env access; tests are deterministic.
Prefetcher hits still write a discrete email_event (so debugging
stays possible) but with metadata.prefetcher = true and never
increment campaign.openCount. Real opens compute "first open" by
excluding prefetcher events so the human's first hit still counts.
with optional { excludePrefetcher } that filters out rows where
metadata.prefetcher === true. Existing two-arg callers unchanged.
14 vitest specs cover every UA branch (case-insensitive), the
time-window heuristic (custom threshold, negative-age skip, missing
sentAt skip), and the precedence rule (specific UA reason wins over
sub_send_window). tsc clean. test:int 109 passed (only pre-existing
PAYLOAD_SECRET failure remains).
Independent of all pending hir-94 PRs — branches from main.
Co-Authored-By: Paperclip noreply@paperclip.ing
🤖 Generated with Claude Code