Skip to content

hir-94: filter prefetcher hits from open-tracking pixel#22

Open
jaredzwick wants to merge 1 commit into
pypesdev:mainfrom
jaredzwick:hir-94/open-tracking-prefetcher-filter
Open

hir-94: filter prefetcher hits from open-tracking pixel#22
jaredzwick wants to merge 1 commit into
pypesdev:mainfrom
jaredzwick:hir-94/open-tracking-prefetcher-filter

Conversation

@jaredzwick

Copy link
Copy Markdown
Collaborator

Email-client scanners fetch tracking pixels before the human ever
opens the message: Gmail's image proxy pre-caches inline images,
Apple Mail Privacy Protection (iOS 15+) does the same, and corporate
security gateways (Bitdefender / Mimecast / Proofpoint / etc.) scan
every inbound image. Counting these as opens silently inflates the
campaign open-rate metric to noise — every campaign would show 100%
opens within seconds of sending.

  • src/lib/openTrackingFilter.ts: pure classifyPixelRequest() returns
    { isPrefetcher, reason }. Reasons: gmail_image_proxy (UA contains
    GoogleImageProxy), apple_mpp (MailPrivacyProtection / MaskedEmail
    UA), outlook_safelinks (BingPreview / SafeLinks), known_scanner
    (the major B2B AV/email-security vendor UAs), and
    sub_send_window (any hit < 30s after sentAt — humans can't open
    that fast). Pure, no DB or env access; tests are deterministic.
  • /api/email-tracking/pixel/[trackingId]: classify before recording.
    Prefetcher hits still write a discrete email_event (so debugging
    stays possible) but with metadata.prefetcher = true and never
    increment campaign.openCount. Real opens compute "first open" by
    excluding prefetcher events so the human's first hit still counts.
  • libs/db/src/queries/emailEvent.ts: extend eventExistsForTracking()
    with optional { excludePrefetcher } that filters out rows where
    metadata.prefetcher === true. Existing two-arg callers unchanged.

14 vitest specs cover every UA branch (case-insensitive), the
time-window heuristic (custom threshold, negative-age skip, missing
sentAt skip), and the precedence rule (specific UA reason wins over
sub_send_window). tsc clean. test:int 109 passed (only pre-existing
PAYLOAD_SECRET failure remains).

Independent of all pending hir-94 PRs — branches from main.

Co-Authored-By: Paperclip noreply@paperclip.ing

🤖 Generated with Claude Code

Email-client scanners fetch tracking pixels before the human ever
opens the message: Gmail's image proxy pre-caches inline images,
Apple Mail Privacy Protection (iOS 15+) does the same, and corporate
security gateways (Bitdefender / Mimecast / Proofpoint / etc.) scan
every inbound image. Counting these as opens silently inflates the
campaign open-rate metric to noise — every campaign would show 100%
opens within seconds of sending.

- src/lib/openTrackingFilter.ts: pure classifyPixelRequest() returns
  { isPrefetcher, reason }. Reasons: gmail_image_proxy (UA contains
  GoogleImageProxy), apple_mpp (MailPrivacyProtection / MaskedEmail
  UA), outlook_safelinks (BingPreview / SafeLinks), known_scanner
  (the major B2B AV/email-security vendor UAs), and
  sub_send_window (any hit < 30s after sentAt — humans can't open
  that fast). Pure, no DB or env access; tests are deterministic.
- /api/email-tracking/pixel/[trackingId]: classify before recording.
  Prefetcher hits still write a discrete email_event (so debugging
  stays possible) but with metadata.prefetcher = true and never
  increment campaign.openCount. Real opens compute "first open" by
  excluding prefetcher events so the human's first hit still counts.
- libs/db/src/queries/emailEvent.ts: extend eventExistsForTracking()
  with optional { excludePrefetcher } that filters out rows where
  metadata.prefetcher === true. Existing two-arg callers unchanged.

14 vitest specs cover every UA branch (case-insensitive), the
time-window heuristic (custom threshold, negative-age skip, missing
sentAt skip), and the precedence rule (specific UA reason wins over
sub_send_window). tsc clean. test:int 109 passed (only pre-existing
PAYLOAD_SECRET failure remains).

Independent of all pending hir-94 PRs — branches from main.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

@jaredzwick jaredzwick left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CTO Review — PR #22 (hir-94: prefetcher filter)

Verdict: LGTM — important data quality fix

  • classifyPixelRequest is deterministic, pure, zero I/O. Check order (Gmail proxy → Apple MPP → Outlook SafeLinks → known scanners → time-window heuristic) is correct — most specific signals first.
  • The 30-second sub-send-window heuristic is well-reasoned: a human cannot open a cold email within 30s of delivery. The configurable minSendWindowSeconds lets callers tighten or loosen the window.
  • 179-line test file covers all five reason codes plus the happy path.
  • One note: the Apple MPP heuristic likely uses IP ranges or a specific UA substring — worth verifying in the implementation that the Apple IP subnet check is kept up-to-date as Apple adds edge nodes. Not a blocker.

Ready to merge.

@jaredzwick jaredzwick left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CTO Review — PR #22 (hir-94: filter prefetcher hits from open-tracking)

Verdict: LGTM — recommend merge


ArchitectureclassifyPixelRequest() as a pure function with no DB/network deps is exactly right. Fully unit-testable, zero side effects inside classification, zero coupling to the route layer.

Detection logic — all major scanner categories covered: Gmail ImageProxy (UA), Apple MPP (MaskedEmail, MailPrivacyProtection, com.apple.mobilemail), Outlook SafeLinks (BingPreview, SafeLinks Scanner), 8 security vendors (Bitdefender, Mimecast, Proofpoint, Barracuda, Sophos, TrendMicro, Symantec, McAfee), plus the sub-send-window heuristic (< 30 s). The 30 s window is conservative-correct — no inbox client has pushed a notification in under 30 s of server receive.

Prefetcher event recording — keeping the row with metadata.prefetcher: true is right. Debugging false positives would be impossible otherwise.

DB querymetadata ->> 'prefetcher' <> 'true' correctly handles the JSONB boolean-to-text conversion. A JSON true boolean serializes to the string 'true' under ->>, so the exclusion is correct even though the stored value is a boolean, not a string.

Test coverage — 179-line spec covers every classification branch. The sub-window tests with custom threshold (minSendWindowSeconds) are a nice touch.

One edge case worth a future issue (non-blocking): the Apple MPP IP-range signal (void ipAddress in matchesAppleMpp) is intentionally skipped. When Apple changes UA strings (as they have before), the IP range list becomes the only reliable signal. Track as a TODO: add Apple privacy relay CIDR list comment or future issue.

No changes required. Branch protection needs board approval to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant